
I'm fairly new to text analytics in R, and I am trying to use stemCompletion.

Here's what I did at first:

library(tm)

# Clean corpus
# 1. Stripping any extra white space:
corpus <- tm_map(corpus, stripWhitespace)
# 2. Transforming everything to lowercase
corpus <- tm_map(corpus, content_transformer(tolower))
# 3. Removing numbers 
corpus <- tm_map(corpus, removeNumbers)
# 4. Removing punctuation
corpus <- tm_map(corpus, removePunctuation, preserve_intra_word_contractions=FALSE)
# 5. Removing stop words
corpus <- tm_map(corpus, removeWords, stopwords("english"))
# 6. Stem words
corpusStem <- tm_map(corpus, stemDocument, language="english")

I then ran this line for stemCompletion, and it didn't actually do anything:

corpusStem <- tm_map(corpusStem, stemCompletion, dictionary=corpus, type="shortest")

I read up on stemCompletion and learned that it needs to be applied to each individual word. I saw this code on another thread (Stack Overflow post 48022087):

stemCompletion_mod <- function(x,dict=dictCorpus) {
  PlainTextDocument(stripWhitespace(paste(stemCompletion(unlist(strsplit(as.character(x)," ")),dictionary=dict, type="shortest"),sep="", collapse=" ")))
}

I edited the above to use my corpus names, but when I ran stemCompletion_mod I got an error:

stemCompletion_mod(corpusStem, corpus)

Error in grep(sprintf("^%s", w), dictionary, value = TRUE) : invalid regular expression, reason 'Missing ')''

What is causing this error? (I also posted on the original thread where I found that code, but it's quite old, so I'm seeing whether anyone else has some insight here!)

Thanks so much!

Here is the data (read from the CSV) that threw the error:

structure(list(Type = c("Example 1", "Example 2"), Comment = c("This    is an example for a corpus. Words like business and charge are not stemming correctly.", 
"Here is another example. Challenge and always also need to have stemCompletion."
 )), class = "data.frame", row.names = c(NA, -2L))
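
In case it helps anyone reproduce this, here is a minimal sketch of how a corpus could be built from that data frame. The corpus construction step is not shown above, so using the Comment column with VCorpus(VectorSource(...)) is an assumption, as is the name df:

```r
library(tm)

# Same two-row data frame as the dput() output above
# (the name `df` is a placeholder for illustration)
df <- data.frame(
  Type = c("Example 1", "Example 2"),
  Comment = c(
    "This    is an example for a corpus. Words like business and charge are not stemming correctly.",
    "Here is another example. Challenge and always also need to have stemCompletion."
  ),
  stringsAsFactors = FALSE
)

# Assumption: one document per row, taken from the Comment column
corpus <- VCorpus(VectorSource(df$Comment))

# Quick check that both comments made it into the corpus
length(corpus)
```

The cleaning steps at the top of the question can then be run on this corpus as written.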


Chris
Sammie
  • Can you give a reproducible example and include all the libraries needed to run the code? – Ronak Shah Apr 07 '20 at 09:32
  • I'm not sure how to add a table here, but I put in a picture; it's just two rows. The packages I have for all my text analytics are: library(tm), library(SentimentAnalysis), library(syuzhet), library(knitr), library(stringr), library(tidytext), library(tidyverse) – Sammie Apr 07 '20 at 10:06
  • I suggest you look at the solution provided by @daroczig: [what might really work](https://stackoverflow.com/questions/25206049/stemcompletion-is-not-working/25391686#25391686) – Chris Apr 08 '20 at 22:48
