I'm doing some text analysis using the tm package in R. I run the following code (no errors) to produce a document-term matrix of stemmed and otherwise pre-processed words:
corpus = Corpus(VectorSource(textVector))
corpus = tm_map(corpus, tolower)
corpus = tm_map(corpus, PlainTextDocument)
corpus = tm_map(corpus, removePunctuation)
corpus = tm_map(corpus, removeWords, c(stopwords("english")))
corpus = tm_map(corpus, stemDocument, language="english")
dtm = DocumentTermMatrix(corpus)
mostFreqTerms = findFreqTerms(dtm, lowfreq=125)
But when I look at my (stemmed) mostFreqTerms, I see a couple that make me wonder which words were stemmed to produce them. There may also be stems that look sensible at first glance but actually merge words with different meanings.
I'd like to apply the technique described in this SO answer on retaining specific terms during stemming (e.g. keeping "natural" and "naturalized" from becoming the same stemmed term): Text-mining with the tm-package - word stemming
To do that comprehensively, I'd like to see a list of all the distinct words that mapped to each of my most frequent stems. Is there a way to find the words that, when stemmed, produced my list of mostFreqTerms?
EDIT: REPRODUCIBLE EXAMPLE
library(tm)
textVector = c("Trisha Takinawa: Here comes Mayor Adam West
himself. Mr. West do you have any words
for our viewers?Mayor Adam West: Box toaster
aluminum maple syrup... no I take that one
back. Im gonna hold onto that one.
Now MaxPower is adding adamant
so this example works")
corpus = Corpus(VectorSource(textVector))
corpus = tm_map(corpus, tolower)
corpus = tm_map(corpus, PlainTextDocument)
corpus = tm_map(corpus, removePunctuation)
corpus = tm_map(corpus, removeWords, c(stopwords("english")))
corpus = tm_map(corpus, stemDocument, language="english")
dtm = DocumentTermMatrix(corpus)
mostFreqTerms = findFreqTerms(dtm, lowfreq=2)
mostFreqTerms
The above mostFreqTerms outputs:
[1] "adam" "one" "west"
I'm looking for a programmatic way to determine that the stem word "adam" came from original words "adam" and "adamant".
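To make the goal concrete, here is a minimal sketch of the kind of mapping I have in mind; it is not a confirmed solution. It assumes that SnowballC::wordStem applies the same stemmer that stemDocument uses, and the helper name stem_sources and the hand-written word vector are hypothetical, for illustration only. In practice the words might instead come from Terms() on a document-term matrix built before the stemming step.

```r
library(SnowballC)  # assumption: the stemmer behind tm::stemDocument

# Hypothetical helper: group the distinct pre-stem words by the stem
# each one produces, returning a named list keyed by stem.
stem_sources <- function(words, language = "english") {
  words <- unique(words)
  split(words, wordStem(words, language = language))
}

# Illustrative input: a few words as they would look after lower-casing
# and punctuation removal, but before stemming.
words <- c("adam", "adamant", "west", "one", "mayor")

mapping <- stem_sources(words)

# Look up the original words behind one frequent stem.
mapping[["adam"]]
```

For the example above, the result I'm hoping for is that the entry for "adam" lists both "adam" and "adamant".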