A previous post addresses this issue: "Text-mining with the tm-package - word stemming".
However, I am still running into problems with the tm package.
My goal is to stem a large corpus while leaving certain specific words unstemmed.
For instance, I want words such as "indians", "indianspeak", and "indianss" reduced to the root form "indian". However, stemming also transforms words such as "Indianapolis" and "Indiana" to "indian", which I do not want.
The post mentioned above handles this by replacing the words to be retained with unique identifiers, stemming the corpus, and then substituting the original words back in for the identifiers. The approach makes sense, but I still run into problems with the metadata when the stemming transformation is applied to the corpus. From my research, it appears that version 0.6 of the tm package no longer allows transformations to operate on plain character values (see: R-Project no applicable method for 'meta' applied to an object of class "character").
However, the solutions posted there do not resolve the errors I am encountering.
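The substitute/stem/restore idea described above can be sketched in base R on a plain character vector. This is a minimal sketch, not the linked post's code: the helper names `multi_sub`, `protect`, and `restore` and the placeholder tokens are hypothetical, and it assumes the text is already lowercased and that the placeholders never occur in real text. Whole-word regex matching (`\\b`) keeps "indiana" from matching inside "indianapolis" or "indianas".

```r
# Words to keep unstemmed (assumes text has already been lowercased).
retain  <- c("indianapolis", "indiana")
# Hypothetical placeholder tokens: assumed absent from the real text and
# chosen so a Snowball/Porter stemmer has no suffix rule to apply to them.
replace <- c("qqkeepaqq", "qqkeepbqq")

# Hypothetical helper: apply whole-word substitutions one pair at a time.
# Patterns are plain alphabetic words, so no regex escaping is needed.
multi_sub <- function(x, pattern, replacement) {
  stopifnot(length(pattern) == length(replacement))
  for (i in seq_along(pattern)) {
    x <- gsub(paste0("\\b", pattern[i], "\\b"), replacement[i], x, perl = TRUE)
  }
  x
}

protect <- function(x) multi_sub(x, retain, replace)  # words -> placeholders
restore <- function(x) multi_sub(x, replace, retain)  # placeholders -> words

txt <- c("indians in indianapolis", "indiana indianspeak")
protected <- protect(txt)
print(protected)           # "indians in qqkeepaqq" "qqkeepbqq indianspeak"
# ... stemming would be applied to `protected` at this point ...
print(restore(protected))  # "indians in indianapolis" "indiana indianspeak"
```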
Starting from the solution in the first linked post, I still get an error at step 5:
# Step 5: reverse -> sub the identifier keys with the words you want to retain
corpus.temp[seq_len(length(corpus.temp))] <- lapply(corpus.temp, mgsub, pattern=replace, replacement=retain)
This fails with:
Error in UseMethod("meta", x) : no applicable method for 'meta' applied to an object of class "character"
To move forward with my larger, more complex corpus, I would like to understand why this happens and whether there is a solution.
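A likely cause is that `lapply()` returns plain character vectors, so after step 5 the corpus elements are no longer text documents and tm's `meta()` has no method for them. In tm 0.6 and later, functions that work on character data are normally wrapped with `content_transformer()` and applied through `tm_map()`, which returns documents. The sketch below assumes tm >= 0.6 with SnowballC installed; `multi_sub` is a hypothetical base-R stand-in for `mgsub`, and the placeholder tokens are illustrative.

```r
# Sketch assuming tm >= 0.6 and the SnowballC package are installed.
library(tm)

# Hypothetical stand-in for mgsub: ordered whole-word substitution.
multi_sub <- function(x, pattern, replacement) {
  for (i in seq_along(pattern)) {
    x <- gsub(paste0("\\b", pattern[i], "\\b"), replacement[i], x, perl = TRUE)
  }
  x
}

retain  <- c("indianapolis", "indiana")  # words to keep unstemmed
replace <- c("qqkeepaqq", "qqkeepbqq")   # hypothetical placeholder tokens

corpus <- VCorpus(VectorSource(c("Indians in Indianapolis", "Indiana indianspeak")))
corpus <- tm_map(corpus, content_transformer(tolower))

# Protect: content_transformer() applies multi_sub to each document's
# text and hands a document (not a bare character vector) back to tm_map.
corpus.temp <- tm_map(corpus, content_transformer(multi_sub),
                      pattern = retain, replacement = replace)

# Stem everything else (stemDocument calls SnowballC internally).
corpus.temp <- tm_map(corpus.temp, stemDocument)

# Step 5: restore the original words, again through tm_map() and
# content_transformer() rather than lapply() plus `[<-`.
corpus.temp <- tm_map(corpus.temp, content_transformer(multi_sub),
                      pattern = replace, replacement = retain)

out <- vapply(seq_along(corpus.temp),
              function(i) paste(content(corpus.temp[[i]]), collapse = " "),
              character(1))
print(out)
```

Because every step goes through `tm_map()`, each element stays a `PlainTextDocument` with its metadata intact.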