I am doing text mining on a large data set. I was able to create a TDM and a DTM and to perform my analysis using TF-IDF. Can we also create a Term Document Matrix or a Document Term Matrix for bigrams in R? I know a similar facility is available in Mahout, but I am looking for a way to do this in R.

Zong
Tanveer

1 Answer

The following code worked for me:

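library(tm)  # provides TermDocumentMatrix; RWeka must also be installed (it needs Java)
# Tokenizer that emits only bigrams (min = max = 2)
BigramTokenizer <- function(x) RWeka::NGramTokenizer(x, RWeka::Weka_control(min = 2, max = 2))
myTdm <- TermDocumentMatrix(myCorpus, control = list(tokenize = BigramTokenizer))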
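
If RWeka or Java is not available, the same idea can be sketched with a tokenizer built on ngrams() and words() from the NLP package, which tm loads. This is only a minimal sketch under some assumptions: the docs vector is made-up toy data, and the corpus is built with VCorpus because, as far as I know, recent tm versions can ignore a custom tokenizer when the corpus is a SimpleCorpus.

```r
library(tm)  # also loads NLP, which provides ngrams() and words()

# Toy documents, invented purely for illustration
docs <- c("text mining in R is fun", "text mining with bigrams in R")
myCorpus <- VCorpus(VectorSource(docs))

# Bigram tokenizer: split into words, form consecutive pairs, join with a space
BigramTokenizer <- function(x) {
  unlist(lapply(NLP::ngrams(NLP::words(x), 2L), paste, collapse = " "),
         use.names = FALSE)
}

# Terms are now bigrams instead of single words
myTdm <- TermDocumentMatrix(myCorpus, control = list(tokenize = BigramTokenizer))
myDtm <- DocumentTermMatrix(myCorpus, control = list(tokenize = BigramTokenizer))

inspect(myTdm)
```

To carry the TF-IDF analysis from the question over to bigrams, adding weighting = weightTfIdf to the control list should work in the same way as for single terms.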
mbatchkarov
Tanveer