I have some simple code to perform text analytics. Before creating the DTM, I am applying stemCompletion. However, I don't understand the output: I can't tell whether I am doing something wrong or whether this is simply how the function behaves.
I referred to this link for help: text-mining-with-the-tm-package-word-stemming
The issue I see is that after stem completion, my DTM shrinks and doesn't contain the tokens at all (it only contains the terms 'content' and 'meta').
My code and outputs:
library(tm)
texts <- c("i am member of the XYZ association",
"apply for our open associate position",
"xyz memorial lecture takes place on wednesday",
"vote for the most popular lecturer")
myCorpus <- Corpus(VectorSource(texts))
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
myCorpus <- tm_map(myCorpus, removePunctuation)
myCorpus <- tm_map(myCorpus, removeNumbers)
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)
myCorpus <- tm_map(myCorpus, content_transformer(removeURL)) #??
myCorpusCopy <- myCorpus
myCorpus <- tm_map(myCorpus, stemDocument)
for (i in 1:4) {
cat(paste("[[", i, "]] ", sep = ""))
writeLines(as.character(myCorpus[[i]]))
}
Output:
[[1]] i am member of the xyz associ
[[2]] appli for our open associ posit
[[3]] xyz memori lectur take place on wednesday
[[4]] vote for the most popular lectur
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)
for (i in 1:4) {
cat(paste("[[", i, "]] ", sep = ""))
writeLines(as.character(myCorpus[[i]]))
}
Output:
[[1]] content
meta
[[2]] content
meta
[[3]] content
meta
[[4]] content
meta
myCorpus <- tm_map(myCorpus, PlainTextDocument)
dtm <- DocumentTermMatrix(myCorpus, control = list(weighting = weightTf))
dtm
inspect(dtm)
Output:
> inspect(dtm)
<<DocumentTermMatrix (documents: 4, terms: 2)>>
Non-/sparse entries: 8/0
Sparsity           : 0%
Maximal term length: 7
Weighting          : term frequency (tf)

              Terms
Docs           content meta
  character(0)       1    1
  character(0)       1    1
  character(0)       1    1
  character(0)       1    1
Expected output: both stemming steps (stemDocument and stemCompletion) should run successfully, so that the DTM contains the actual tokens. I am using version 0.6 of the tm package.
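
For reference, here is a minimal sketch of a workaround I have seen suggested. It rests on an assumption, not something I have confirmed: that stemCompletion expects a character vector of individual stems, whereas tm_map hands it a whole document object, which would explain the 'content' and 'meta' output. The helper completeStems is a hypothetical name of my own, not part of tm. It splits each document into stems, completes them against the unstemmed copy, and falls back to the stem itself when no completion is found.

```r
library(tm)  # stemDocument additionally needs the SnowballC package installed

texts <- c("i am member of the XYZ association",
           "apply for our open associate position",
           "xyz memorial lecture takes place on wednesday",
           "vote for the most popular lecturer")

corpus <- VCorpus(VectorSource(texts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpusCopy <- corpus                     # unstemmed copy, used as the dictionary
corpus <- tm_map(corpus, stemDocument)

# Hypothetical helper (not part of tm): complete the stems of each document
# one word at a time and return one string per document.
completeStems <- function(x, dictionary) {
  vapply(x, function(doc) {
    stems <- unlist(strsplit(doc, "[[:space:]]+"))
    stems <- stems[nzchar(stems)]
    if (length(stems) == 0) return("")
    completed <- unname(stemCompletion(stems, dictionary = dictionary))
    # Assumption: an unmatched stem comes back as NA or "", so keep the stem
    unmatched <- is.na(completed) | !nzchar(completed)
    completed[unmatched] <- stems[unmatched]
    paste(completed, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

# content_transformer keeps each element a text document, so no extra
# PlainTextDocument conversion should be needed afterwards
corpus <- tm_map(corpus, content_transformer(completeStems),
                 dictionary = corpusCopy)

dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTf))
```

If this assumption holds, inspect(dtm) should list real words as terms instead of 'content' and 'meta'. Note that DocumentTermMatrix drops words shorter than three characters by default, so short tokens such as "i" or "am" would still be missing.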