
I am analyzing the Reuters-21578 corpus, a collection of Reuters news articles from 1987, using the "tm" package. After importing the XML files into an R data file, I clean the text (convert to plain text, convert to lower case, remove stop words, etc., as shown below). When I then try to convert the corpus to a document-term matrix, I receive an error message:

Error in UseMethod("Content", x) : no applicable method for 'Content' applied to an object of class "character"

All pre-processing steps run without error up to the DocumentTermMatrix call.

I created a non-random subset of the corpus (4,000 documents), and the DocumentTermMatrix command works fine on it.

My code is below. Thanks for the help.

## Import
library(tm)
file <- "reut-full.xml"
reuters <- Corpus(ReutersSource(file), readerControl = list(reader = readReut21578XML))

## Convert to Plain Text Documents
reuters <- tm_map(reuters, as.PlainTextDocument)

## Convert to Lower Case
reuters <- tm_map(reuters, tolower)

## Remove Stopwords
reuters <- tm_map(reuters, removeWords, stopwords("english"))

## Remove Punctuation
reuters <- tm_map(reuters, removePunctuation)

## Stemming
reuters <- tm_map(reuters, stemDocument)

## Remove Numbers
reuters <- tm_map(reuters, removeNumbers)

## Eliminating Extra White Spaces
reuters <- tm_map(reuters, stripWhitespace)

## Create a document-term matrix
dtm <- DocumentTermMatrix(reuters)

Error in UseMethod("Content", x) : 
  no applicable method for 'Content' applied to an object of class "character"
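
The error says the method was dispatched on an object of class "character", which suggests (this is an inference, not something confirmed) that at least one element of the corpus is a bare character vector rather than a text document. A minimal sketch for locating such elements is below. The helper `find_non_documents` and the toy list are hypothetical and not part of tm; the sketch uses only base R and assumes the corpus can be traversed like a list and that genuine documents inherit from a class named "TextDocument".

```r
# Hypothetical helper (not part of tm): report the positions of elements
# that do not inherit from the "TextDocument" class.
find_non_documents <- function(corpus) {
  is_doc <- vapply(corpus, function(d) inherits(d, "TextDocument"), logical(1))
  which(!is_doc)
}

# Toy stand-in for a corpus: two classed documents and one bare string.
# The class names are an assumption chosen to imitate tm's documents.
make_doc <- function(text) {
  structure(list(content = text), class = c("PlainTextDocument", "TextDocument"))
}
toy <- list(
  make_doc("oil prices rise"),
  "bare character string",
  make_doc("grain exports fall")
)

bad <- find_non_documents(toy)
print(bad)  # position of the bare string in the toy list
```

Calling `find_non_documents(reuters)` after each `tm_map` step might show which step, if any, strips the document class, and whether the affected positions fall outside the 4,000-document subset.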
Dr. Beeblebrox
  • Where did you get the `reut-full.xml` file? – Ben Apr 30 '12 at 01:37
  • Since the code works with `file <- system.file("texts", "reuters-21578.xml", package = "tm")`, there is indeed a problem with your XML file. – Vincent Zoonekynd Apr 30 '12 at 02:36
  • Oh, sorry for leaving out an explanation of that. It was the product of brute force: I copy-pasted the original files into one XML file in a text editor. I knew there must be a more elegant way, but I thought that finding it would take longer than the 5 minutes it took to copy-paste. – Dr. Beeblebrox Apr 30 '12 at 02:46
  • The original data came in 22 XML files, each of the first 21 containing 1,000 articles and the 22nd containing 578 articles. – Dr. Beeblebrox Apr 30 '12 at 02:47
  • I thought that I might have caused some flaw in the XML when I combined the XML files. But when I subsetted to just 4,000 articles, and the dtm command ran, I was simply stumped. – Dr. Beeblebrox Apr 30 '12 at 02:55
  • Vincent, I'll try your suggestion tomorrow when I get back to the lab. Thanks! – Dr. Beeblebrox Apr 30 '12 at 03:03
  • That is only a sample dataset from the `tm` package (10 documents). For the whole dataset, you may want to check: http://www.rinfinance.com/agenda/2010/Theussl+Feinerer+Hornik.pdf – Vincent Zoonekynd Apr 30 '12 at 08:31
  • Did you ever find the solution to your issue? I'm getting the same error. – Shayan Nov 16 '21 at 18:32
  • Also similar question: https://stackoverflow.com/questions/45514472/no-applicable-method-for-tm-map-applied-to-an-object-of-class-character – Shayan Nov 16 '21 at 20:57
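
Since the comments raise the possibility that hand-merging the files damaged the XML, a crude line-based sanity check can be sketched as follows. The helper `check_merged_xml` and the toy input are hypothetical, and the sketch assumes that each article is wrapped in a `<REUTERS ...>` ... `</REUTERS>` element, as in the sample file shipped with tm. It counts matching lines, not occurrences, and is no substitute for a real XML parser.

```r
# Hypothetical helper: count lines containing XML declarations and
# article opening/closing tags in a hand-merged file.
check_merged_xml <- function(lines) {
  c(
    xml_declarations = sum(grepl("<?xml", lines, fixed = TRUE)),
    open_articles    = sum(grepl("<REUTERS[ >]", lines)),
    close_articles   = sum(grepl("</REUTERS>", lines, fixed = TRUE))
  )
}

# Toy input imitating two files pasted together, with the second file's
# XML declaration left in place and one closing tag lost.
toy <- c(
  '<?xml version="1.0"?>',
  '<REUTERS NEWID="1">',
  '<BODY>first article</BODY>',
  '</REUTERS>',
  '<?xml version="1.0"?>',
  '<REUTERS NEWID="2">',
  '<BODY>second article</BODY>'
)

counts <- check_merged_xml(toy)
print(counts)
```

On the real file this could be run as `check_merged_xml(readLines("reut-full.xml"))`. More than one XML declaration, or unequal opening and closing counts, would point to a problem introduced by the copy-paste.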
