The State of the Union address happened tonight, and the White House released the speech on Medium. I created a wordcloud of the most frequently used words in the prepared remarks. "New" and "America" are the two most frequent words. "Americans" and "American" also show up a lot. I chose to keep these variations as separate words in the wordcloud, which makes it even more apparent how often certain words come up.
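If you'd rather fold "American" and "Americans" into "America" before counting, one simple option is an explicit mapping. Below is a minimal sketch using a hypothetical token vector (not counts from the actual speech) and a hand-picked `variant_map` that I'm assuming for illustration. tm's `stemDocument()` is another route, though a stemmer may not reduce all three forms to the same stem.

```r
# Hypothetical cleaned tokens standing in for the speech (not real counts)
words <- c("america", "americans", "new", "american", "america", "new", "americans")

# Variants kept separate, which is the approach taken in this post
separate <- table(words)

# Hand-picked mapping (an assumption for illustration) that folds the variants into "america"
variant_map <- c(american = "america", americans = "america")
merged_words <- words
hit <- merged_words %in% names(variant_map)
merged_words[hit] <- unname(variant_map[merged_words[hit]])
merged <- table(merged_words)

print(separate)
print(merged)
```

Keeping the variants separate spreads one idea across several smaller words; merging them gives a single, larger "america" in the cloud.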
Here is the wordcloud:
And here is the code (written in R):
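library(tm)
library(wordcloud)
# Read the speech as plain text and join the lines into one string;
# read.table() is meant for tabular data and can trip over quotes and apostrophes in prose
SOTU <- paste(readLines("SOTU.txt", warn = FALSE), collapse = " ")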
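# Build a corpus from the text and clean it up
mycorpus <- Corpus(VectorSource(SOTU))
mycorpus <- tm_map(mycorpus, content_transformer(tolower))  # lowercase everything
mycorpus <- tm_map(mycorpus, removeNumbers)
mycorpus <- tm_map(mycorpus, removePunctuation)
# Drop some extra filler words, then the standard English stopwords
mycorpus <- tm_map(mycorpus, removeWords, c("will", "what", "whats", "was", "way", "too", "that"))
mycorpus <- tm_map(mycorpus, removeWords, stopwords("english"))
# Collapse the extra spaces left behind by the removals
mycorpus <- tm_map(mycorpus, stripWhitespace)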
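# Document-term matrix: one row per document, one column per term
dtm <- DocumentTermMatrix(mycorpus)
inspect(dtm)
# Terms that appear at least 3 times
findFreqTerms(dtm, lowfreq = 3)
# Total count of each term, sorted from most to least frequent
freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
plot(freq)  # frequency against rank
# Words and counts as a data frame, handy for inspecting or exporting
wf <- data.frame(word = names(freq), freq = freq)
# Fix the random seed so the wordcloud layout is reproducible
set.seed(123)
# Plot up to 500 words that appear at least 5 times; less frequent words are drawn in red, more frequent ones in blue
wordcloud(names(freq), freq, min.freq = 5, max.words = 500, scale = c(5, .1), colors = c("red", "blue"))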
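For anyone curious what the tm steps above boil down to, here is a rough base-R approximation of the same clean-and-count idea. It is a sketch, not tm's actual implementation: `count_words`, the sample sentence, and the tiny `stop_small` list (a stand-in for `stopwords("english")`) are all hypothetical. The 3-character minimum mirrors `DocumentTermMatrix`'s default `wordLengths` setting.

```r
# Rough base-R approximation of the tm cleaning and counting steps (not tm's implementation)
count_words <- function(text, drop_words) {
  text <- tolower(text)                            # like content_transformer(tolower)
  text <- gsub("[[:digit:]]+", "", text)           # like removeNumbers
  text <- gsub("[[:punct:]]+", "", text)           # like removePunctuation
  words <- unlist(strsplit(text, "[[:space:]]+"))  # split on runs of whitespace
  words <- words[!(words %in% drop_words)]         # like removeWords
  words <- words[nchar(words) >= 3]                # DocumentTermMatrix drops terms under 3 characters by default
  sort(table(words), decreasing = TRUE)            # like the sorted colSums of the term matrix
}

# Hypothetical sample text and a tiny stand-in for stopwords("english")
sample_text <- "America is strong. Americans built a new America in 2015!"
stop_small <- c("is", "a", "in", "the", "and")
sample_freq <- count_words(sample_text, stop_small)
print(sample_freq)
```

Note that "america" and "americans" are counted separately here too, just as in the wordcloud.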
