
I'm attempting to do some topic modeling with the R package topicmodels.

I've done my pre-processing with the 'tm' package, following the instructions in "R text file and text mining...how to load data".

However, when I run my correlated topic model (CTM) with topicmodels, I get the following error:

 Error in CTM...DocumentTermMatrix needs to have a term frequency weighting

I've triple-checked the structure of my DocumentTermMatrix, and it does show a frequency weighting:

 A document-term matrix (26 documents, 413 terms)

 Non-/sparse entries: 4804/5934
 Sparsity           : 55%
 Maximal term length: 13 
 Weighting          : term frequency - inverse document frequency (normalized) (tf-idf)
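A minimal sketch, assuming only the tm package and three made-up toy documents, of where that Weighting line comes from: tm stores the weighting as a "weighting" attribute (a name plus an acronym) on the matrix. A tf weighting keeps integer counts, while the tf-idf weighting shown above stores non-integer scores and carries the acronym "tf-idf" rather than "tf".

```r
# Sketch: compare the "weighting" attribute of a tf and a tf-idf matrix.
# The toy documents are hypothetical example data.
library(tm)

docs <- c("topic models describe documents as mixtures of topics",
          "correlated topic models allow topics to be correlated",
          "term frequency counts how often each term appears")
corpus <- VCorpus(VectorSource(docs))

# Plain term frequency (tm's default): entries are raw counts
dtm_tf <- DocumentTermMatrix(corpus, control = list(weighting = weightTf))

# tf-idf weighting, as in the question: entries are real-valued scores
dtm_tfidf <- DocumentTermMatrix(corpus, control = list(weighting = weightTfIdf))

print(attr(dtm_tf, "weighting"))     # second element should be "tf"
print(attr(dtm_tfidf, "weighting"))  # second element should be "tf-idf"
```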

Any suggestions on how to get this working would be appreciated!

– R_Queery

  • Please provide a reproducible example. – agstudy Feb 04 '13 at 22:55
  • My experience with this sort of question is that the questioners are often confusing TermDocumentMatrices with DocumentTermMatrices. Your question certainly suggests that confusion. – IRTFM Feb 05 '13 at 03:30
  • @Dwin Apologies for the nomenclature flub: it is indeed a DocumentTermMatrix, NOT a TermDocumentMatrix. – R_Queery Feb 05 '13 at 13:36
  • The DTM in 'topicmodels' does not accept a weighting that uses TF-IDF; the workaround was to use plain term-frequency weighting instead. Not ideal, but Blei et al. (2003) suggest that TF-IDF is not necessary for LDA. – R_Queery Feb 12 '13 at 17:07
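The workaround described in the last comment can be sketched roughly as follows, assuming the tm and topicmodels packages are installed. The four toy documents, k = 2 and the seed are hypothetical choices for illustration only; the point is that the matrix is built with weightTf (raw counts) before it is passed to CTM().

```r
# Sketch of the workaround: keep plain term-frequency weighting, then fit a CTM.
# Documents, k and seed are hypothetical example values.
library(tm)
library(topicmodels)

docs <- c("topic models describe documents as mixtures of topics",
          "correlated topic models allow topics to be correlated",
          "term frequency counts how often each term appears in documents",
          "documents with similar terms tend to share topics")
corpus <- VCorpus(VectorSource(docs))

# weightTf is tm's default; it is spelled out here to make the choice explicit
dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTf))

# Fit a correlated topic model with 2 topics; the seed only makes runs repeatable
ctm <- CTM(dtm, k = 2, control = list(seed = 1))

# Top 3 terms for each topic (a 3 x 2 character matrix)
terms(ctm, 3)
```

Because the models in topicmodels are defined over word counts, this trades the tf-idf down-weighting of common words for a valid input; stop-word removal during pre-processing is the usual way to recover some of that effect.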

1 Answer


If you build the matrix with the slam package first, you need to set the weighting argument to weightTf when converting it:

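library(slam); library(tm)
m <- as.simple_triplet_matrix(mm)  # mm: your existing count matrix (documents as rows)
# weighting = weightTf marks the result as term-frequency weighted
dtm <- as.DocumentTermMatrix(m, weighting = weightTf)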
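A self-contained sketch of the same conversion, assuming the slam and tm packages; the 2 x 3 count matrix and its document/term names are hypothetical stand-ins for the answer's mm.

```r
# Sketch: base matrix of counts -> slam triplet matrix -> tf-weighted DTM.
# The counts and dimnames are hypothetical example data.
library(slam)
library(tm)

counts <- matrix(c(2, 0, 1,
                   0, 3, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"), c("cat", "dog", "mat")))

stm <- as.simple_triplet_matrix(counts)
dtm <- as.DocumentTermMatrix(stm, weighting = weightTf)

attr(dtm, "weighting")  # should be "term frequency" "tf"
c(nDocs(dtm), nTerms(dtm))  # 2 documents, 3 terms
```

Note that, as far as I can tell, weightTf only attaches the "tf" label and leaves the values untouched, so the source matrix should already hold raw counts; relabelling a matrix of tf-idf scores this way would not turn them back into counts.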
– pudding