Text mining using R to count frequency of words

Question

I want to count occurrences of the word "uncertainty", but only when "economic policy", "legislation", or other policy-related words appear in the same text. So far I have written R code that counts the frequency of every word in the text, but it cannot tell whether the counted words occur in the right context. Do you have any suggestions for how to fix this?

library(tm) # load the text mining library
setwd('D:/3_MTICorpus') # set R's working directory to where my files are
ae.corpus <- Corpus(DirSource("D:/3_MTICorpus"), readerControl = list(reader = readPlain))
summary(ae.corpus) # check what went in
ae.corpus <- tm_map(ae.corpus, content_transformer(tolower)) # tm >= 0.6 needs content_transformer() around base functions
ae.corpus <- tm_map(ae.corpus, removePunctuation)
ae.corpus <- tm_map(ae.corpus, removeNumbers)
myStopwords <- c(stopwords('english'), "available", "via")
ae.corpus <- tm_map(ae.corpus, removeWords, myStopwords) # this stopword file is at C:\Users\[username]\Documents\R\win-library\2.13\tm\stopwords
#library(SnowballC)
#ae.corpus <- tm_map(ae.corpus, stemDocument)

# Note: despite its name, ae.tdm is a DocumentTermMatrix (documents in rows, terms in columns)
ae.tdm <- DocumentTermMatrix(ae.corpus, control = list(wordLengths = c(3, Inf))) # keep words of 3+ characters
inspect(ae.tdm)
findFreqTerms(ae.tdm, lowfreq = 2)
findAssocs(ae.tdm, "economic", .7)
d <- c("economic", "uncertainty", "policy") # in tm >= 0.6 a dictionary is a plain character vector
inspect(DocumentTermMatrix(ae.corpus, list(dictionary = d)))

Couldn't you update http://stackoverflow.com/questions/20673143/text-mining-counting-word-occurences-in-r ? — Freddy, Dec 19 '13 at 08:12
How do you define 'same text'? A sentence, paragraph, book, file? Also, you could upload a small part of 'D:/correctdirectory' to PasteBin to make the example reproducible, meaning that anyone could run the code and try to find an answer for you: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Mikko, Dec 19 '13 at 08:25
@Freddy Thanks for the tips! "Same text" means the same newspaper article. I will separate the articles from each other by putting each article in its own paragraph. — stochastiq, Dec 19 '13 at 09:19

score 0 · Answer 1 · answered Dec 19 '13 at 08:28

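You can transform your term-document matrix into a matrix of 0/1 values (1 if the term occurs in the document, 0 otherwise):

tdm$v[tdm$v > 0] <- 1

tdm <- as.matrix(tdm)

and then you can easily cross-tabulate the two term rows with table. This assumes tdm is a TermDocumentMatrix, so the terms are the row names, and that 'economic_policy' exists as a single term. With a DocumentTermMatrix, such as ae.tdm in the question, the terms are columns, so index columns instead of rows.

table(tdm['uncertainty', ], tdm['economic_policy', ])

which should produce something like this (rows: 'uncertainty' absent/present; columns: 'economic_policy' absent/present):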

        0    1
  0   105   13
  1     7    5
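
The table above only tells you how many articles contain both terms. If what you need is the number of times "uncertainty" occurs in articles that also mention policy, a minimal base R sketch could look like the following. The sample articles, the context_terms vector, and the count_word helper are all made up for illustration and are not part of tm or of the code in the question; the sketch also assumes one string per article.

```r
# Toy articles, invented for illustration (one string per newspaper article).
articles <- c(
  "Economic policy uncertainty rose; uncertainty over legislation persists.",
  "Uncertainty about the weather delayed the match.",
  "New legislation on trade was passed without debate.",
  "Markets fear uncertainty as economic policy shifts."
)

# Hypothetical list of policy-related context terms; extend as needed.
context_terms <- c("economic policy", "legislation")

# Normalise: lower-case, then replace punctuation with spaces.
clean <- gsub("[[:punct:]]", " ", tolower(articles))

# Hypothetical helper: count whole-word occurrences of `word` in each text.
count_word <- function(word, texts) {
  pattern <- paste0("\\b", word, "\\b")
  hits <- gregexpr(pattern, texts, perl = TRUE)
  # gregexpr() returns -1 when there is no match, so count positive positions only.
  vapply(hits, function(h) sum(h > 0), integer(1))
}

uncertainty_n <- count_word("uncertainty", clean)

# TRUE when an article mentions at least one context term (substring match).
has_context <- Reduce(`|`, lapply(context_terms, function(term) {
  grepl(term, clean, fixed = TRUE)
}))

# Presence of "uncertainty" against presence of policy context, per article.
print(table(uncertainty = uncertainty_n > 0, policy_context = has_context))

# Keep the "uncertainty" counts only for articles with policy context.
conditional_n <- ifelse(has_context, uncertainty_n, 0L)
print(conditional_n)       # per-article conditional counts
print(sum(conditional_n))  # total across the corpus
```

Note that the context check is a plain substring match, so "legislation" would also match "legislations". Whether that is what you want depends on how strictly you define a policy-related word.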
