
I'm attempting to use `removeWords` from the R tm package with the following code:

    docs <- tm_map(docs, removeWords, stopwords("english"))

and I get the following error message:

    Error in sort(words, decreasing = TRUE) :
      argument "words" is missing, with no default

All of the other transformations I've tried on my corpus (`tolower`, `removeNumbers`, `stripWhitespace`, `removePunctuation`, etc.) have worked as intended, but I cannot get `removeWords` to work, and I can't find anything online about this particular error message.

I'd very much appreciate any insight into what might be causing this error.
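The error message itself points at `sort(words, decreasing = TRUE)`, which suggests that the function doing the word removal was reached without its `words` argument (the stopword list). The sketch below uses a hypothetical `remove_words_sketch` function, a simplified stand-in written for illustration and not tm's actual source, to show how R produces exactly this message when `words` is never passed in.

```r
# Hypothetical, simplified stand-in for a word-removal function (not tm's source):
# it sorts the supplied words and deletes every whole-word match from x.
remove_words_sketch <- function(x, words) {
  pattern <- paste0("\\b(", paste(sort(words, decreasing = TRUE), collapse = "|"), ")\\b")
  gsub(pattern, "", x, perl = TRUE)
}

# With the word list supplied, the call works
ok <- remove_words_sketch("the cat sat on the mat", c("the", "on"))
print(ok)

# Without it, R fails inside sort() because `words` was never supplied
msg <- tryCatch(remove_words_sketch("the cat sat on the mat"),
                error = function(e) conditionMessage(e))
print(msg)
```

If something similar is happening in the real pipeline, the question becomes why `stopwords("english")` is not reaching `removeWords`.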

Edit: My corpus consists of HTML documents, all located in the same folder. The code I'm using to test the `removeWords` transformation is as follows:

    setwd("C:/folder")
    library(RCurl)
    library(XML)
    library(tm)
    library(SnowballC)
    docs <- Corpus(DirSource("C:/folder"))
    docs <- tm_map(docs, removePunctuation)
    docs <- tm_map(docs, tolower)
    docs <- tm_map(docs, removeNumbers)
    docs <- tm_map(docs, removeWords, stopwords("english"))

ChrisB
  • Using the built-in sample data, this seems to work: `data(crude); tm_map(crude, removeWords, stopwords("english"))`. You should provide some sort of [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to make it clear how your situation is different. What you have provided should work. Perhaps provide the version information from `sessionInfo()`. – MrFlick Sep 07 '16 at 16:08
  • Thanks MrFlick - I've edited the original post. – ChrisB Sep 07 '16 at 19:34
  • Well, that doesn't really help with reproducibility, since it relies on files that exist only on your machine. But I'd guess the problem might be with `tolower`. Try `docs <- tm_map(docs, content_transformer(tolower))`. Also, I assume `removePucntuation` is just a typo? – MrFlick Sep 07 '16 at 20:16
  • Still the same error message after using `content_transformer(tolower)`. And yes, that was just a typo. Regarding a reproducible example, the code has worked for me on simple test data, but the issue pops up when I apply it to the corpus of HTML documents. – ChrisB Sep 07 '16 at 20:42
  • Well, then that error doesn't make a lot of sense. Maybe try just one document, the smallest possible one. Unless the error is reproducible, it's not going to be easy to help you. Maybe include the `traceback()` output and verify the value of `class(docs)` before running `removeWords`. Also, I assume you are using `tm_map` and not `tm_maps` as you've typed. It's important that the code you share accurately reflects what you are actually running -- that's the whole point! – MrFlick Sep 07 '16 at 20:46
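Following the suggestions in the comments (a small reproducible example, and checking `class(docs)` before `removeWords`), here is a minimal sketch that runs the same pipeline on two short in-memory documents instead of the HTML files. The sample sentences are made up for illustration, and it assumes a tm version (0.6 or later) where `VCorpus()` and `content_transformer()` are available.

```r
library(tm)

# Made-up stand-in documents; swap in the text of one or two of the real files
texts <- c("The 2 apples are on the table.",
           "This is another short document!")
docs <- VCorpus(VectorSource(texts))

docs <- tm_map(docs, removePunctuation)
docs <- tm_map(docs, content_transformer(tolower))  # keeps each document a PlainTextDocument
docs <- tm_map(docs, removeNumbers)

# Check what the corpus holds before calling removeWords
print(class(docs))
print(class(docs[[1]]))

docs <- tm_map(docs, removeWords, stopwords("english"))
docs <- tm_map(docs, stripWhitespace)
print(content(docs[[1]]))
```

If this runs cleanly but the `DirSource` corpus still fails, comparing the two `class()` outputs may help narrow down where they differ.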

1 Answer


Try adding your own words to the word list you pass to `removeWords`.

Example:

    corpus = tm_map(corpus, removeWords, c("apple", stopwords("english")))
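A minimal sketch of this suggestion, assuming the tm package is installed; the sample sentence and the extra word `"apple"` are only illustrative. `removeWords` also accepts a plain character vector, which makes it easy to check a custom word list before running it over a whole corpus.

```r
library(tm)

# Combine an extra word with the built-in English stopword list
my_stopwords <- c("apple", stopwords("english"))

# Test the list on a plain character vector first
x <- "The apple and the pear are on the table"
cleaned <- removeWords(tolower(x), my_stopwords)  # lowercase first: the stopword list is lowercase
cleaned <- stripWhitespace(cleaned)               # collapse the gaps left by removed words
print(cleaned)
```

The matching appears to be case-sensitive and limited to whole words, so `"apple"` in the list would not remove `"apples"`.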
aminography