
I would like to read a txt file and apply some text mining approaches. When I used the tm package in R, I got lots of error messages. For example, when I tried to correlate the most frequent words, I got only NAs. Here is the code I have used so far:

library(tm)

doc <- c("word1 word1 word2 word1 word2 word3 word1 word2 word3 word4 word1 word2 word3 word4 word5")

Corpus <- Corpus(VectorSource(doc))
Corpus <- tm_map(Corpus, stripWhitespace)
Corpus <- tm_map(Corpus, tolower)
Corpus <- tm_map(Corpus, removeWords, stopwords("english"))
Corpus <- tm_map(Corpus, removePunctuation)

tdm <- TermDocumentMatrix(Corpus)

#Plotting correlation of Terms
plot(tdm, terms = findFreqTerms(tdm, lowfreq = 2, Inf)[1:3], CorThreshold = 0.1)

After that, I got the following error message:

Error in if (all(from == t(from))) "undirected" else "directed":
missing value where TRUE/FALSE needed

OK, to investigate, I used the following code, which is a step-by-step version of findAssocs():

terms <- findFreqTerms(tdm, lowfreq = 2)[1:3]
m <- as.matrix(t(tdm[terms,]))
m
cor(m)

However, I got the following output:

          word1 word2 word3
    word1    NA    NA    NA
    word2    NA    NA    NA
    word3    NA    NA    NA

From my point of view, there is something wrong with the text, but I have no explanation for this strange behavior. Does anybody have a solution for this problem? My R (2.15.2) is running on a Mac system (x86_64-apple-darwin9.8.0/x86_64 (64-bit)).

Thanks a lot!

  • See [this](http://stackoverflow.com/questions/13575180/how-to-change-the-language-of-errors-in-r) to change the language of your error messages to English. – agstudy Apr 24 '13 at 07:49

1 Answer


The correlation function cor() returns a matrix of NA values because you have only one observation of each variable: the corpus contains a single document, so each term has just one count. A correlation cannot be computed when each variable has only one observation.

You can check this by looking at your matrix m:

> m
    Terms
Docs word1 word2 word3
   1     5     4     3
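
To illustrate the point, here is a minimal sketch in base R (it does not use tm). The four short documents in `docs` are hypothetical and are not taken from the question. They only show that cor() gives NA for a one-row matrix and real values once there are several documents in which the term counts vary. With tm, the same vector could presumably be passed to Corpus(VectorSource(docs)) as in the question.

```r
# Hypothetical example: four short "documents" instead of one long text.
docs <- c("word1 word1 word2",
          "word1 word2 word2 word3",
          "word1 word1 word1 word3",
          "word2 word3 word3")

terms <- c("word1", "word2", "word3")

# Count how often each term occurs in each document (rows = documents).
tokens <- strsplit(docs, " ", fixed = TRUE)
m <- t(sapply(tokens, function(tok) table(factor(tok, levels = terms))))

# One row only (the situation in the question): cor() returns NA everywhere.
one_doc <- matrix(c(5, 4, 3), nrow = 1, dimnames = list("1", terms))
cor_one <- cor(one_doc)

# Several rows with varying counts: the correlations are defined.
cor_many <- cor(m)
cor_many
```

Note that a term whose count is identical in every document would still produce NA, because its standard deviation is zero.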
Didzis Elferts
  • OK. Does that mean I have to split up my text into several documents, such as one sentence per vector? – user2314393 Apr 24 '13 at 09:36
  • That's it! Again, thank you very much. I thought that a correlation within one text would be possible. Unfortunately, this is not the case for tm, is it? I got the correlation/association. – user2314393 Apr 24 '13 at 10:27