
I followed the instructions from here.

On slide 9, `tolower` causes a problem in tm 0.6 and above, so I used:

myCorpus <- tm_map(myCorpus, content_transformer(tolower))
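A minimal sketch of why the wrapper matters, assuming tm 0.6 or later and a made-up two-document corpus: `content_transformer()` applies `tolower` to each document's text and stores the result back in the document, so every element of the corpus is still a `TextDocument`.

```r
library(tm)

# Toy corpus (made-up text) built from a character vector.
docs <- VCorpus(VectorSource(c("Was Followed The Instruction", "Slide No. 9")))

# tolower() works on plain strings; content_transformer() puts its result
# back into each document, so the elements stay PlainTextDocuments.
docs <- tm_map(docs, content_transformer(tolower))

content(docs[[1]])
```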

This question was marked as a duplicate of this Stack Overflow question, but I still get an error when I run `stemCompletion`:

myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)

Following this instruction, I converted both `myCorpus` and `myCorpusCopy` to `PlainTextDocument`:

corpus <- tm_map(corpus, PlainTextDocument)

After that, I was able to run

myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)

but I got a lot of warnings:

There were 50 or more warnings (use warnings() to see the first 50)

Running `warnings()` shows the same warning over and over:

1: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
2: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
3: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
4: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
5: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
6: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
7: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
8: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
9: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
10: In grep(sprintf("^%s", w), dictionary, value = TRUE) : argument 'pattern' has length > 1 and only the first element will be used
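The warning appears to come from inside `stemCompletion()`, which builds one `^stem` pattern per element of `x` and expects `x` to be a character vector of individual stems. When `tm_map()` passes it a whole document instead, the pattern is no longer a single stem, which is consistent with the `grep` warning above. A minimal sketch of the intended call on a plain character vector, using a made-up dictionary:

```r
library(tm)

# Made-up dictionary of full (unstemmed) words.
dict <- c("example", "call", "java", "code")

# One stem per element; each is matched as a prefix ("^exampl", "^call", ...).
stems <- c("exampl", "call", "java")
completed <- stemCompletion(stems, dictionary = dict)

# A named character vector: names are the stems, values the completions.
completed
```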

I tried ignoring the warnings and creating a `TermDocumentMatrix`:

tdm <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))

but I get this error:

Error: inherits(doc, "TextDocument") is not TRUE
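The error is consistent with how `tm_map()` behaves in tm 0.6 and later: it replaces each document with whatever the function returns, and `stemCompletion()` returns a named character vector rather than a `TextDocument`, so `TermDocumentMatrix()` rejects it. One possible workaround, sketched here under that assumption, is to wrap a small helper in `content_transformer()` that splits each document's text into tokens, completes them, and pastes them back together. The helper name `complete_stems`, the sample text, and the dictionary are hypothetical; falling back to the original stem when nothing matches is a choice of this sketch.

```r
library(tm)

# Hypothetical helper: complete every stemmed token in one document's text.
complete_stems <- function(text, dictionary) {
  tokens <- unlist(strsplit(text, "[[:space:]]+"))
  tokens <- tokens[tokens != ""]
  if (length(tokens) == 0) return(text)
  completed <- stemCompletion(tokens, dictionary = dictionary)
  # stemCompletion() gives NA or "" when nothing in the dictionary matches;
  # keep the original stem in that case.
  no_match <- is.na(completed) | completed == ""
  completed[no_match] <- tokens[no_match]
  paste(completed, collapse = " ")
}

# Made-up stemmed text and a made-up dictionary of full words
# (in the question, the dictionary would be myCorpusCopy).
myCorpus <- VCorpus(VectorSource("exampl call java code r"))
dict <- c("examples", "call", "java", "code", "r")

# content_transformer() stores the completed string back in each document,
# so every element is still a PlainTextDocument.
myCorpus <- tm_map(myCorpus, content_transformer(complete_stems),
                   dictionary = dict)

tdm <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))
Terms(tdm)
```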
wesleylim1993
  • @lukeA `stemCompletion` completes the words: for example, `exampl call java code r` would become `examples call java code r` after `stemCompletion`. – wesleylim1993 May 19 '15 at 10:15
  • @lukeA Sorry, I don't understand where to put `cbind(rownames(tdm), stemCompletion(rownames(tdm), myCorpus))`. – wesleylim1993 May 19 '15 at 12:48
  • Please create a reproducible example as described here: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – lukeA May 19 '15 at 13:23

1 Answer


Here's how you can create a stemmed term-document matrix and re-complete the stemmed tokens afterwards:

library(tm)
txt <- " was followed the instruction from here In slide no. 9 tolower has issue in package tm 0.6 and above I have used "
myCorpus <- Corpus(VectorSource(txt))
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
tdm <- TermDocumentMatrix(myCorpus, control = list(stemming = TRUE)) 
cbind(stems = rownames(tdm), completed = stemCompletion(rownames(tdm), myCorpus))  
#          stems      completed    
# 0.6      "0.6"      "0.6"        
# abov     "abov"     "above"      
# and      "and"      "and"        
# follow   "follow"   "followed"   
# from     "from"     "from"       
# has      "has"      "has"        
# have     "have"     "have"       
# here     "here"     "here"       
# instruct "instruct" "instruction"
# issu     "issu"     "issue"      
# no.      "no."      "no."        
# packag   "packag"   "package"    
# slide    "slide"    "slide"      
# the      "the"      "the"        
# tolow    "tolow"    "tolower"    
# use      "use"      "used"       
# was      "was"      "was"    
lukeA
  • Should I execute these separately or together? That is, run `tdm <- TermDocumentMatrix(myCorpus, control = list(stemming = TRUE))` and then `cbind(stems = rownames(tdm), completed = stemCompletion(rownames(tdm), myCorpus))`, or execute both at once? – wesleylim1993 May 23 '15 at 04:47
  • Obviously, you'll need `tdm` in the 2nd command, which is created in the 1st by `tdm <- ...`. So running the 2nd without having run the 1st would throw an error because there's no `tdm`. – lukeA May 23 '15 at 08:45
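Building on the two commands in the answer (run in that order, since the second needs `tdm`), here is a sketch of one way to use the stem/completion pairs afterwards: copy the term-document matrix into a dense matrix and relabel its rows with the completed words. It assumes tm 0.6 or later with the SnowballC package installed (tm uses it for `stemming = TRUE`); the shortened sample text and the fallback to the raw stem are choices of this sketch, not part of the answer.

```r
library(tm)  # stemming = TRUE also needs the SnowballC package installed

txt <- "was followed the instruction from here in slide no. 9 tolower has issue in package tm 0.6 and above i have used"
myCorpus <- VCorpus(VectorSource(txt))

# Step 1: build the stemmed term-document matrix.
tdm <- TermDocumentMatrix(myCorpus, control = list(stemming = TRUE))

# Step 2: complete each stem against the unstemmed corpus.
stems <- rownames(tdm)
completed <- stemCompletion(stems, dictionary = myCorpus)

# Keep the stem itself where no completion was found.
no_match <- is.na(completed) | completed == ""
completed[no_match] <- stems[no_match]

# Copy into a dense matrix and label its rows with the completed words.
m <- as.matrix(tdm)
rownames(m) <- completed
m
```

`as.matrix()` is fine for a toy example like this, but on a large corpus the dense copy can use a lot of memory.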