TermDocumentMatrix errors in R

Question

I have been working through numerous online examples of the {tm} package in R, attempting to create a TermDocumentMatrix. Creating and cleaning a corpus has been pretty straightforward, but I consistently encounter an error when I attempt to create a matrix. The error is:

Error in UseMethod("meta", x) :
  no applicable method for 'meta' applied to an object of class "character"
In addition: Warning message:
In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered errors in user code

For example, here is code from Jon Starkweather's text mining example. Apologies in advance for the long code, but it does make a reproducible example. Please note that the error occurs at the very end, in the call to TermDocumentMatrix that creates tdm.

#Read in data
policy.HTML.page <- readLines("http://policy.unt.edu/policy/3-5")

#Obtain text and remove mark-up
policy.HTML.page[186:202]
id.1 <- 3 + which(policy.HTML.page == "                    TOTAL UNIVERSITY        </div>")
id.2 <- id.1 + 5
text.data <- policy.HTML.page[id.1:id.2]
td.1 <- gsub(pattern = "<p>", replacement = "", x = text.data, 
     ignore.case = TRUE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

td.2 <- gsub(pattern = "</p>", replacement = "", x = td.1, ignore.case = TRUE,
     perl = FALSE, fixed = FALSE, useBytes = FALSE)

text.d <- td.2; rm(text.data, td.1, td.2)

#Create corpus and clean 
library(tm)
library(SnowballC)
txt <- VectorSource(text.d); rm(text.d)
txt.corpus <- Corpus(txt)
txt.corpus <- tm_map(txt.corpus, tolower)
txt.corpus <- tm_map(txt.corpus, removeNumbers)
txt.corpus <- tm_map(txt.corpus, removePunctuation)
txt.corpus <- tm_map(txt.corpus, removeWords, stopwords("english"))
txt.corpus <- tm_map(txt.corpus, stripWhitespace); #inspect(docs[1])
txt.corpus <- tm_map(txt.corpus, stemDocument)

# NOTE ERROR WHEN CREATING TDM
tdm <- TermDocumentMatrix(txt.corpus)

Your question reminded me of a post I have seen. Have a look at [this link](http://stackoverflow.com/questions/24771165/r-project-no-applicable-method-for-meta-applied-to-an-object-of-class-charact). It may be useful. - jazzurro, Aug 28 '14 at 14:56
@jazzurro Thanks so much for redirecting me to this post! Adding content_transformer to the tolower call in tm_map solved the problem. - Brian P, Aug 28 '14 at 15:02
I actually had the same problem and saw that post. I am glad that your script is working now. - jazzurro, Aug 28 '14 at 15:10

score 26 · Accepted Answer · answered Aug 28 '14 at 15:05


The link provided by jazzurro points to the solution. The following line of code

 txt.corpus <- tm_map(txt.corpus, tolower)

must be changed to

 txt.corpus <- tm_map(txt.corpus, content_transformer(tolower))
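For context, here is a minimal sketch of the whole pipeline with that one change applied. The two sample sentences are made up and only stand in for the scraped policy text, and stemming is left out so the sketch does not depend on SnowballC. It assumes a tm version that provides VCorpus and content_transformer (v0.6 or later).

```r
library(tm)

# Hypothetical sample text standing in for the scraped policy page
docs <- c("The University POLICY applies to all Students.",
          "Students must follow the policy in 2014!")

corpus <- VCorpus(VectorSource(docs))

# tolower is a base R function that returns a plain character vector,
# so it is wrapped to keep each element a text document
corpus <- tm_map(corpus, content_transformer(tolower))

# These transformations come from tm itself and can be passed directly
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Build the matrix: terms in rows, documents in columns
tdm <- TermDocumentMatrix(corpus)
inspect(tdm)
```

The same wrapping should apply to any other base R or user-defined function passed to tm_map, for example a gsub call.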

answered Aug 28 '14 at 15:05

Brian P


score 5 · Answer 2 · edited Apr 16 '15 at 17:01

There are two possible causes of this error in tm v0.6.

1. If you apply a term-level transformation such as tolower directly, tm_map returns a character vector instead of a PlainTextDocument.
   Solution: call tolower through content_transformer, or call tm_map(corpus, PlainTextDocument) immediately after tolower.
2. If the SnowballC package is not installed and you try to stem the documents, the same error can occur.
   Solution: install.packages('SnowballC')
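Both causes can be checked before building the matrix. The sketch below is only an illustration: the helper name all_text_documents is made up here and is not part of tm, and the sample sentences are placeholders. It tests whether every element of the corpus is still a text document, and it stems only when SnowballC is actually available.

```r
library(tm)

# Hypothetical helper (not part of tm): TRUE when every element of the
# corpus is still a text document rather than a bare character vector
all_text_documents <- function(corpus) {
  all(vapply(content(corpus), inherits, logical(1), what = "TextDocument"))
}

# Placeholder documents for illustration
corpus <- VCorpus(VectorSource(c("Some SAMPLE text", "More sample TEXT")))
corpus <- tm_map(corpus, content_transformer(tolower))

# Cause 1: confirm the transformation did not strip the document class
docs_ok <- all_text_documents(corpus)

# Cause 2: stemDocument relies on SnowballC, so check for it first
if (requireNamespace("SnowballC", quietly = TRUE)) {
  corpus <- tm_map(corpus, stemDocument)
} else {
  message("SnowballC is not installed; skipping stemming")
}

tdm <- TermDocumentMatrix(corpus)
```

If the helper returns FALSE, the most recent tm_map call is the likely place to add content_transformer.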

score 2 · Answer 3 · edited Apr 17 '17 at 09:38


There is no need to apply content_transformer.

Create the corpus this way:

trainData_corpus <- Corpus(VectorSource(trainData$Comments))

Try it.

edited Apr 17 '17 at 09:38

zhm


answered Apr 17 '17 at 05:31

Deepika Sharma

