stemCompletion error using the R tm package

Question

I'm using the tm package in R. Everything works properly until I add the stemCompletion step, at which point I get the following error:

Error in grep(sprintf("^%s", w), dictionary, value = TRUE) : 
  invalid regular expression

My code is as follows:

path = '~/Interviews/Transcripts/'
file.names <- dir(path, pattern = '.txt')

corpus = lapply(seq_along(file.names), function(index) {
    fileName = file.names[index]
    filePath = paste(path, fileName, sep = '')
    transcript = readChar(filePath, file.info(filePath)$size)
    transcript <- gsub("[’‘^]", '', transcript)

    corpusName = paste('transcript', index, sep = "_")

    c <- Corpus(VectorSource(transcript))
    DublinCore(c[[1]], 'Identifier') <- paste(index, fileName, sep ='_')
    meta(c, type = 'corpus')

    c <- tm_map(c, stripWhitespace)
    c <- tm_map(c, content_transformer(tolower))
    c <- tm_map(c, removeWords, c(stopwords("english"), 'yeah', 'yep'))
    c <- tm_map(c, removePunctuation)
    c <- tm_map(c, stemDocument)
    c <- tm_map(c, stemCompletion, c)
    c <- tm_map(c, PlainTextDocument)
    c
})

This is not reproducible. Good luck finding someone that will go dig into this. [Here are a few tricks](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to make a good example. — Roman Luštrik, May 16 '16 at 09:56

lukeA · Answer 1 · 2016-05-16T11:33:23.153

First, in theory you'd probably want to use tm_map(c, content_transformer(stemCompletion), c), because tm_map(c, stemCompletion, c) passes a PlainTextDocument to the argument x of stemCompletion, which expects a character vector (see ?stemCompletion). Second, there are no stemmed tokens to stem-complete, because you did not do any tokenization (see e.g. ?TermDocumentMatrix), and your dictionary corpus is already stemmed, so what you are trying might not work this way anyway.

(And 3rd, I second @RomanLuštrik: Please edit your post and make it a minimal reproducible example. That way, readers and others who run into this error can follow along easily.)

Here's an example:

content(tm_map(Corpus(VectorSource("stem completion has advantages")), stemDocument)[[1]])
# [1] "stem complet has advantag"

stemCompletion(c("complet", "advantag"), Corpus(VectorSource("stem completion has advantages")))
#      complet     advantag 
# "completion" "advantages"
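
As for the error message itself: it comes from the grep(sprintf("^%s", w), dictionary, value = TRUE) call shown in the traceback, which treats each element of x as a regular expression. The base R sketch below shows one way that call can fail, namely when w contains an unbalanced regex metacharacter. This is an assumption about the kind of input involved, not a diagnosis of your transcripts. The names dictionary, try_complete and safe_complete are made up for illustration and are not part of tm.

```r
# Hypothetical dictionary of unstemmed words
dictionary <- c("stem", "completion", "advantages")

# Same grep() call as in the error message, wrapped so that a failure
# is returned as a message instead of stopping the script
try_complete <- function(w) {
  tryCatch(
    grep(sprintf("^%s", w), dictionary, value = TRUE),
    error = function(e) conditionMessage(e)
  )
}

ok  <- try_complete("complet")   # a plain stem is a valid pattern
bad <- try_complete("complet(")  # unbalanced "(" is not a valid pattern

# Literal prefix matching avoids regex interpretation altogether
# (startsWith() is base R, available since R 3.3.0)
safe_complete <- function(w) dictionary[startsWith(dictionary, w)]
safe <- safe_complete("complet(")  # no match, but no error either

print(ok)
print(bad)
print(safe)
```

If something like this turns out to be the cause, cleaning the tokens before stem completion (or matching prefixes literally, as in safe_complete) should make the error go away.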
