I'm having trouble modifying the english.dat stopword file from R's tm package. Any word I add to it is not recognized. I tried adding words at the beginning, in the middle, and at the end of the file, but nothing works: only the file's original contents are recognized. I also tried saving the file as ASCII, UTF, and UTF-8, to no avail.

Any ideas?

thanks

animalcroc
  • Same question as [adding stopwords in the tm package](http://stackoverflow.com/questions/18446408/adding-stopwords-in-r-tm) – rischan Jun 02 '14 at 15:45
  • as i mentioned in another response, that is impractical if you have a large file of stop words to add – animalcroc Jun 02 '14 at 16:33
  • Please explain how this is impractical. What format are the additional words in? `c` works with many vectors so if you use it and your additional words are in a vector it does the job and this is a duplicate question. – Tyler Rinker Jun 02 '14 at 17:06
  • @animalcroc As Rinker said, you can load your stopwords into a vector: read them into a `mystopwords` variable and then run `myCorpus <- tm_map(myCorpus, removeWords, c(stopwords("english"), mystopwords))` – rischan Jun 03 '14 at 00:11
  • I have a list of perhaps 3000 words in a text file... The issue I'm facing here must be a bug in R. Very strange that the TM package can't read text I type in. – animalcroc Jun 03 '14 at 12:40
  • it was easier than i thought to do this. I simply used R's scan() function to read my stopwords file into a vector, which I then concatenated – animalcroc Jun 03 '14 at 15:43

1 Answer

Try adding them this way, as a concatenation to the "english" list:

library(tm)
# Append the extra words to the built-in English stopword list
myStopwords <- c(stopwords("english"), "available", "via")
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)
agstudy
lawyeR
  • thanks, but I have a large list of words to add and this would be impractical – animalcroc Jun 02 '14 at 16:32
  • it was easier than i thought to do this. I simply used R's scan() function to read my stopwords file into a vector, which I then concatenated – animalcroc Jun 03 '14 at 15:42
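
The `scan()` approach described in the last comment might look roughly like the sketch below. It assumes the extra stopwords sit in a plain text file, one word per line; the temporary file, the sample words, and the names `stopword_file`, `extra_stopwords`, `all_stopwords`, and `docs` are all hypothetical and only there so the example runs on its own.

```r
library(tm)

# Hypothetical stopword file, one word per line. It is written to a
# temporary file here only to keep the sketch self-contained.
stopword_file <- tempfile(fileext = ".txt")
writeLines(c("available", "via", "foo"), stopword_file)

# Read the file into a character vector. quote = "" stops scan() from
# treating apostrophes (as in "don't") as the start of a quoted string.
extra_stopwords <- scan(stopword_file, what = character(),
                        quote = "", quiet = TRUE)

# Concatenate with the built-in English list, dropping duplicates
all_stopwords <- unique(c(stopwords("english"), extra_stopwords))

# Apply the combined list to a small example corpus
docs <- VCorpus(VectorSource("the report is available via email"))
docs <- tm_map(docs, removeWords, all_stopwords)

# Collapse the whitespace left behind by the removed words
cleaned <- trimws(gsub("\\s+", " ", content(docs[[1]])))
print(cleaned)
```

By default `scan()` splits on any whitespace, so a file with several words per line is read the same way. If some entries contain spaces, passing `sep = "\n"` should keep each line as a single entry.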