Wednesday, February 4, 2015

Data visualizations based on Spotify listens

I've been tracking the music I listen to on Spotify since early November. An IFTTT recipe automatically adds a row to a Google Spreadsheet every time I play a song, recording the song name, artist name, album title, and timestamp.
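For the curious, loading the spreadsheet export into R looks roughly like this. This is a minimal sketch: the file name, column names, and timestamp format are assumptions about what the IFTTT recipe produces, not the exact ones in my spreadsheet.

# Load the exported listening log (file and column names are assumptions)
listens <- read.csv("spotify_listens.csv", stringsAsFactors = FALSE)
# Parse the timestamp; IFTTT-style "February 4, 2015 at 08:15AM" assumed here
listens$timestamp <- as.POSIXct(listens$timestamp, format = "%B %d, %Y at %I:%M%p")
# Derive the fields the charts below need
listens$hour <- as.integer(format(listens$timestamp, "%H"))
listens$weekday <- weekdays(listens$timestamp)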

While other people might be interested in the type and amount of music I listen to, I am likely the main audience for this visualization. Whether the audience is just me, or family members, friends, classmates, or others, some viewers will bring assumptions about what types of music I listen to on a regular basis. People who know I have a daughter will understand the kids' songs in the treemap; those who don't know about my daughter might find the kids' songs strange. The audience probably also has assumptions about how they themselves listen to music throughout the day, and may compare my listening habits to theirs. Someone who needs silence while working will not understand how I listen to so much music. People may also judge the types of music I listen to, and think better or worse of me for it.

My questions are: Which artists do I listen to the most, and do I like a lot of those artists' songs or are they one-hit wonders to me? What day of the week do I listen to the most music? What time of day do I listen to the most music?

There are several other questions I have but don't currently have the data to answer. Those questions are: Do I listen to certain genres of music on specific days? I'm guessing that the music on Mondays is different from the music on Fridays. How does the number of meetings I have during the day affect how many songs, and what types of songs, I listen to? Again, I'm guessing that more meetings might mean fewer songs, and that the genres would change (since many meetings tend to put me in a bad mood, especially when they are poorly run). To make these analyses work, I would need to manually tag each song with a genre, and would need to go through my calendar and record the number of meetings I have each day.

I'm using R for the data preparation and analysis, and Illustrator to fine-tune the images.
For the treemap I used the R package 'portfolio'; Flowingdata.com walked me through the steps needed to make it work. For the bar graph I used base R's barplot function, and for the line graph I used ggplot2.
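Here is a minimal sketch of the three plots, assuming the listens data frame built above; the real code (with the exact arguments I used) is on GitHub.

library(portfolio)
library(ggplot2)

# Treemap: one tile per artist, sized and colored by play count
artist_counts <- as.data.frame(table(listens$artist))
names(artist_counts) <- c("artist", "plays")
artist_counts$artist <- as.character(artist_counts$artist)
map.market(id = artist_counts$artist, area = artist_counts$plays,
           group = artist_counts$artist, color = artist_counts$plays,
           main = "Artists I listen to most")

# Bar graph: songs listened to by day of the week
# (days come out in alphabetical order; reorder factor levels if desired)
barplot(table(listens$weekday), main = "Songs listened to by day")

# Line graph: songs listened to per hour of the day
hour_counts <- as.data.frame(table(listens$hour))
names(hour_counts) <- c("hour", "songs")
hour_counts$hour <- as.integer(as.character(hour_counts$hour))
ggplot(hour_counts, aes(x = hour, y = songs)) + geom_line()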

I'm also including my initial sketches of how I wanted the visualization to look. This helped a lot for planning my attack, especially with regard to the data cleanup.

Visualizations are below; all code is on GitHub.

Sketch of what I wanted:

Treemap of artists I listen to most:
Songs listened to per hour:


Songs listened to by day:


Tuesday, January 20, 2015

State of the Union Address in Wordcloud form

The State of the Union address happened tonight, and the White House released the speech on Medium. I created a wordcloud of the most common words in the prepared remarks. "New" and "America" are the two most frequently used words, and you can see that "Americans" and "American" are very popular, too. I chose to leave all the different variations of those words in the wordcloud so it was that much more apparent how popular certain words are.

Here is the wordcloud:
And here is the code (written in R):
library(tm)
library(wordcloud)

# Read the prepared remarks; quote = "" keeps apostrophes in the speech
# from being treated as quote characters by read.table
SOTU <- read.table('SOTU.txt', sep = "\t", quote = "")

# Build a corpus and clean it: lowercase, then strip numbers, punctuation,
# extra whitespace, a few hand-picked filler words, and English stopwords
mycorpus <- Corpus(VectorSource(SOTU))
mycorpus <- tm_map(mycorpus, content_transformer(tolower))
mycorpus <- tm_map(mycorpus, removeNumbers)
mycorpus <- tm_map(mycorpus, removePunctuation)
mycorpus <- tm_map(mycorpus, stripWhitespace)
mycorpus <- tm_map(mycorpus, removeWords, c("will", "what", "whats", "was", "way", "too", "that"))
mycorpus <- tm_map(mycorpus, removeWords, stopwords("english"))

# Document-term matrix and sorted word frequencies
dtm <- DocumentTermMatrix(mycorpus)
inspect(dtm)
findFreqTerms(dtm, lowfreq = 3)
freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
plot(freq)
wf <- data.frame(word = names(freq), freq = freq)

# Draw the wordcloud; set.seed makes the layout reproducible
set.seed(123)
wordcloud(names(freq), freq, min.freq = 5, max.words = 500,
          scale = c(5, .1), colors = c("red", "blue"))

Saturday, January 3, 2015

World Map built in R

This is a quick map I built in R using the world.cities dataset that comes built into the "maps" package. I plotted the location of every city in the dataset and colored the points orange. Then I subset the data to only cities with populations over 1,000,000, plotted those cities, and colored them black. There are a couple of things I really like about this visualization. First, you can pretty clearly see the borders of the continents. It is clear that cities were able to grow in part because of their proximity to water. It is also interesting to see where the gaps in cities are: the Sahara Desert has very few cities, as does much of western China. Europe's entire land mass is basically covered by cities, while Australia has very few.
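If you want the gist without clicking through, here is a minimal sketch of the approach; the real code is in the repo linked below and may differ in the details.

library(maps)

# world.cities ships with the maps package: name, country, pop, lat, long
data(world.cities)

# Every city in the dataset, as small orange points
plot(world.cities$long, world.cities$lat, pch = ".", col = "orange",
     xlab = "", ylab = "", axes = FALSE)

# Overlay cities with more than 1,000,000 people in black
big <- subset(world.cities, pop > 1000000)
points(big$long, big$lat, pch = 20, col = "black", cex = 0.5)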

None of this is revolutionary, but interesting to look at nonetheless. I've included the simple code on GitHub here: https://github.com/samedelstein/World-Maps-in-R