Tuesday, January 20, 2015

State of the Union Address in Wordcloud form

The State of the Union address happened tonight, and the White House released the speech on Medium. I created a wordcloud of the most frequently used words in the prepared remarks. "New" and "America" are the two most frequent words. You can see that "Americans" and "American" are very common, too. I chose to leave all the variations of those words in the wordcloud to make it that much more apparent how often certain words are used.

Here is the wordcloud:
And here is the code (written in R):
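library(tm)
library(wordcloud)

# Read the prepared remarks as plain text, one element per line.
# readLines avoids read.table's quote handling, which can mangle text containing apostrophes.
SOTU <- readLines('SOTU.txt')
mycorpus <- Corpus(VectorSource(SOTU))

# Normalize: lowercase, then drop numbers, punctuation, and extra whitespace
mycorpus <- tm_map(mycorpus, content_transformer(tolower))
mycorpus <- tm_map(mycorpus, removeNumbers)
mycorpus <- tm_map(mycorpus, removePunctuation)
mycorpus <- tm_map(mycorpus, stripWhitespace)

# Remove a few hand-picked filler words, then the standard English stop words
mycorpus <- tm_map(mycorpus, removeWords, c("will", "what", "whats", "was", "way", "too", "that"))
mycorpus <- tm_map(mycorpus, removeWords, stopwords("english"))

# Build the document-term matrix and total each term's count across documents
dtm <- DocumentTermMatrix(mycorpus)
inspect(dtm)
findFreqTerms(dtm, lowfreq = 3)
freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
plot(freq)
wf <- data.frame(word = names(freq), freq = freq)  # word/count table for inspection

# Draw the wordcloud; set.seed makes the layout reproducible
set.seed(123)
wordcloud(names(freq), freq, min.freq = 5, max.words = 500, scale = c(5, .1), colors = c("red", "blue"))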

Saturday, January 3, 2015

World Map built in R

This is a quick map I built in R using the world.cities dataset that comes built into the "maps" package. I plotted the location of every city in the dataset and colored the points orange. Then I subset the data to include only cities with a population of more than 1,000,000, plotted those cities, and colored them black. There are a couple of things I really like about this visualization. First, you can pretty clearly see the outlines of the continents, which suggests that cities were able to grow in part because of their proximity to water. Second, it is interesting to see where the gaps are. The Sahara Desert has very few cities, as does much of Western China. Europe's entire land mass is basically covered by cities, while Australia has very few.

None of this is revolutionary, but it is interesting to look at nonetheless. I've included the simple code on GitHub here: https://github.com/samedelstein/World-Maps-in-R
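
For anyone who wants to try the idea without leaving this page, here is a minimal sketch of the approach described above. It is not necessarily the code in the repository. The variable name big_cities and the point sizes are my own assumptions, chosen only for illustration.

```r
library(maps)

# world.cities ships with the maps package; its columns include
# name, country.etc, pop, lat, long, and capital
data(world.cities)

# Keep only cities with more than 1,000,000 people
# (big_cities is a hypothetical name used for this sketch)
big_cities <- world.cities[world.cities$pop > 1000000, ]

# Plot every city as a small orange dot, then overlay the large cities in black
plot(world.cities$long, world.cities$lat,
     pch = 20, cex = 0.2, col = "orange",
     xlab = "Longitude", ylab = "Latitude", asp = 1)
points(big_cities$long, big_cities$lat, pch = 20, cex = 0.6, col = "black")
```

Drawing the orange points first and the black points second keeps the large cities visible on top of the dense clusters.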