I am doing text mining on a large data set. I was able to create a TDM and a DTM and to perform my analysis using TF and IDF. Is it possible to create a Term Document Matrix or Document Term Matrix for bigrams in R? I know a similar facility is available in Mahout, but I am looking for a way to do this in R.
1 Answer
The following code worked for me:
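library(tm)  # provides TermDocumentMatrix(); the RWeka package must also be installed
# Tokenizer that emits only two-word sequences (min = max = 2)
BigramTokenizer <- function(x) RWeka::NGramTokenizer(x, RWeka::Weka_control(min = 2, max = 2))
myTdm <- TermDocumentMatrix(myCorpus, control = list(tokenize = BigramTokenizer))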
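For a Document Term Matrix the same `tokenize` control can be passed to `DocumentTermMatrix()`. The following is a minimal end-to-end sketch, not the answer's exact method: it swaps RWeka for a tokenizer built on `ngrams()` and `words()` from the NLP package (attached with tm), which avoids the Java dependency. The two example documents are made up for illustration. It builds the corpus with `VCorpus()` on the assumption that, in recent tm versions, a custom tokenizer may be ignored for a `SimpleCorpus`.

```r
library(tm)  # attaches NLP, which provides words() and ngrams()

# Hypothetical two-document corpus, for illustration only
docs <- c("text mining is fun", "text mining with bigrams")
myCorpus <- VCorpus(VectorSource(docs))

# Bigram tokenizer based on NLP::ngrams(); an alternative to RWeka::NGramTokenizer
BigramTokenizer <- function(x) {
  unlist(lapply(ngrams(words(x), 2), paste, collapse = " "), use.names = FALSE)
}

# Same control list as for TermDocumentMatrix(), but documents become rows
myDtm <- DocumentTermMatrix(myCorpus, control = list(tokenize = BigramTokenizer))

# Terms() lists the column names, which should now be two-word phrases
bigrams <- Terms(myDtm)
print(bigrams)
```

Calling `inspect(myDtm)` shows the counts per document, and `weightTfIdf(myDtm)` can be applied afterwards if TF-IDF weighting is needed on the bigram matrix.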