I have the following code:

library(tm)
text<-readLines("anyText.txt")
corpus<-Corpus(VectorSource(text))
corpus<-tm_map(corpus,content_transformer(tolower))
inspect(corpus)
corpus<-tm_map(corpus,removePunctuation)
stopwords<-c(stopwords('english'),"available","via")
corpus<-tm_map(corpus,removeWords,stopwords)
tempCorpus<-corpus
inspect(tempCorpus)
library(ctv)
library(SnowballC)
corpus<-tm_map(corpus,stemDocument)
inspect(corpus)
corpusT<- tm_map(corpus, PlainTextDocument)
corpusT<-tm_map(corpusT,stemCompletion,dictionary=tempCorpus)
dtm<-TermDocumentMatrix(corpusT,control=list(minWordLength=1))

but I got the error:

   Error: inherits(doc, "TextDocument") is not TRUE

I found that when I comment out this line:

corpusT<-tm_map(corpusT,stemCompletion,dictionary=tempCorpus)

the program works fine. As far as I know, though, that line completes the stemmed words using the words in tempCorpus, so I need it.

How can I correct that error?

Little
  • Possible duplicate of [DocumentTermMatrix error on Corpus argument](http://stackoverflow.com/questions/24191728/documenttermmatrix-error-on-corpus-argument) – Hardik Gupta Jan 12 '17 at 12:31

2 Answers

The call `corpusT<-tm_map(corpusT,stemCompletion,dictionary=tempCorpus)` does not return TextDocument objects, which is why you get the error.

Calling `corpusT<- tm_map(corpusT, PlainTextDocument)` after stemCompletion, and before creating dtm, should fix the issue.

The last part of your code should look like this:

inspect(corpus)
corpusT<-tm_map(corpus,stemCompletion,dictionary=tempCorpus)
corpusT<- tm_map(corpusT, PlainTextDocument)
dtm<-TermDocumentMatrix(corpusT,control=list(minWordLength=1))

For more information, see https://stackoverflow.com/a/24206825/3858156
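As a rough sketch of the same idea, one possible alternative is to have the mapped function itself return a text document, so no separate conversion step is needed. The toy documents, the character dictionary `dict`, and the helper `complete_doc` are all hypothetical names made up for this illustration, and the behavior is assumed for tm 0.6 or later with SnowballC installed.

```r
library(tm)
library(SnowballC)  # stemming backend used by stemDocument

# Toy documents, invented for this sketch.
docs <- c("stemming words", "completed stems")
corpus <- VCorpus(VectorSource(docs))

# Character dictionary of unstemmed words taken from the toy documents.
dict <- unique(unlist(strsplit(docs, " ", fixed = TRUE)))

# Hypothetical helper: complete the stems word by word, then rebuild a
# PlainTextDocument so every corpus element is still a TextDocument.
complete_doc <- function(doc, dictionary) {
  stems <- unlist(strsplit(as.character(doc), "\\s+"))
  stems <- stems[nzchar(stems)]
  done <- stemCompletion(stems, dictionary = dictionary)
  PlainTextDocument(paste(done, collapse = " "), id = meta(doc, "id"))
}

stemmed <- tm_map(corpus, stemDocument)
completed <- tm_map(stemmed, complete_doc, dictionary = dict)
tdm <- TermDocumentMatrix(completed)
```

Splitting the text first matters here because stemCompletion expects a vector of individual stems, not a whole sentence in one string.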

Manohar Swamynathan

I have found that I need to wrap all functions passed into tm_map in content_transformer, as you did with tolower. Then the error goes away.
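A minimal sketch of that pattern, assuming tm 0.6 or later. The toy text and the `strip_punct` helper are invented for illustration and are not part of the question's code.

```r
library(tm)

# Toy text, invented for this sketch.
corpus <- VCorpus(VectorSource(c("Hello, World!", "Available VIA R.")))

# Plain character functions are wrapped in content_transformer so that
# each corpus element stays a text document after the mapping.
corpus <- tm_map(corpus, content_transformer(tolower))
strip_punct <- function(x) gsub("[[:punct:]]", "", x)
corpus <- tm_map(corpus, content_transformer(strip_punct))

# Check that every element is still a TextDocument before building the matrix.
is_text_doc <- all(vapply(content(corpus), inherits, logical(1),
                          what = "TextDocument"))
tdm <- TermDocumentMatrix(corpus)
```

content_transformer changes only the text of each document and leaves its metadata in place, which is why the matrix can still be built afterwards.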

dnuttle
  • Not suggesting something like: `corpusT <- tm_map(corpusT, content_transformer(PlainTextDocument))`. That doesn't work. – CodeMonkey Jul 26 '16 at 16:42