I'm trying to use the LDA model from the topicmodels package in R. To measure the method's instability, I have generated true parameters from Dirichlet distributions for w = 3000 words, t = 8 topics and d = 50 documents, with approximately 60 words in each document:
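library(gtools)  # rdirichlet(); MCMCpack provides an equivalent
library(e1071)   # rdiscrete()
w = 3000; t = 8; d = 50
# alpha (length t) and beta (length w) are the Dirichlet hyperparameters
Theta = t(rdirichlet(d, alpha))  # t x d: column i holds the topic proportions of document i
Phi = t(rdirichlet(t, beta))     # w x t: column k holds the word probabilities of topic k
docs = matrix(0, nrow = d, ncol = w)
for (i in 1:d) {
  curn = round(rnorm(1, mean = 60, sd = 10))  # length of document i
  for (j in 1:curn) {
    curt = rdiscrete(1, Theta[, i], 1:t)   # topic of this token (column i, not d)
    curw = rdiscrete(1, Phi[, curt], 1:w)  # word drawn from that topic
    docs[i, curw] = docs[i, curw] + 1
  }
}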
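For comparison, here is a minimal sketch of the same generative process in base R only, assuming no external packages: Dirichlet vectors are drawn by normalising independent gamma variates, each token's topic and word are drawn with sample.int(), and the counts are collected with tabulate(). The helper name rdirichlet_base(), the seed, and the symmetric hyperparameters alpha = 0.1 and beta = 0.01 are hypothetical choices for illustration; the values actually used above are not stated.

```r
set.seed(1)               # hypothetical seed, for reproducibility only
w = 3000; k = 8; d = 50   # k plays the role of t, avoiding a clash with t()
alpha = rep(0.1, k)       # hypothetical symmetric hyperparameters
beta = rep(0.01, w)

# Draw n vectors from Dirichlet(a) by normalising independent Gamma(a_j, 1) draws
rdirichlet_base = function(n, a) {
  g = matrix(rgamma(n * length(a), shape = a), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

Theta = t(rdirichlet_base(d, alpha))  # k x d: topic proportions per document
Phi = t(rdirichlet_base(k, beta))     # w x k: word probabilities per topic

docs = matrix(0L, nrow = d, ncol = w)
for (i in seq_len(d)) {
  n_i = max(1L, as.integer(round(rnorm(1, mean = 60, sd = 10))))    # document length
  z = sample.int(k, n_i, replace = TRUE, prob = Theta[, i])         # topic of each token
  words = vapply(z, function(topic) sample.int(w, 1, prob = Phi[, topic]), integer(1))
  docs[i, ] = tabulate(words, nbins = w)                            # word counts for document i
}

# Sanity checks on the simulated matrix
stopifnot(all(rowSums(docs) > 0))  # no empty documents
mean(docs == 0)                    # fraction of zero cells
table(docs[docs > 0])              # distribution of the non-zero counts
```

The last lines check the sparsity claim directly and also guard against empty documents, which (as far as I know) topicmodels refuses to fit.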
So my docs matrix is a sparse d × w matrix in which almost all elements are 0 or 1.
Next, I need my docs matrix to be an object of the DocumentTermMatrix class (from the tm package) so that I can use it in topicmodels::lda():
library(tm)  # as.DocumentTermMatrix(), weightTf
docs = as.DocumentTermMatrix(docs, weighting = weightTf)
I need to use the Gibbs sampling method, so I write:
ldafitmodel <- lda(docs, t, method = "Gibbs")
And then I get:
Error in lda.default(docs, t, method = "Gibbs") : nrow(x) and length(grouping) are different
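For reference, a minimal sketch of a topicmodels call, on the assumption that the intended function is the package's LDA() (upper case). The lda() that produced the error dispatches to lda.default, which is the linear discriminant analysis from MASS; its second argument is a grouping factor with one entry per row of x, not a number of topics, so passing t = 8 gives exactly this length mismatch. The sketch runs on the AssociatedPress data bundled with topicmodels so that it is self-contained; the seed and iteration count are illustrative assumptions.

```r
library(topicmodels)

# AssociatedPress is a DocumentTermMatrix shipped with topicmodels; its first
# 50 documents stand in for the simulated docs matrix here.
data("AssociatedPress", package = "topicmodels")
dtm = AssociatedPress[1:50, ]

# Upper-case LDA() from topicmodels; the number of topics is the k argument.
fit = topicmodels::LDA(dtm, k = 8, method = "Gibbs",
                       control = list(seed = 1, iter = 500))  # illustrative settings

post = topicmodels::posterior(fit)
theta_hat = post$topics  # 50 x 8: estimated topic proportions per document
phi_hat = post$terms     # 8 x (number of terms): estimated word probabilities per topic
```

With the simulated data, the corresponding call would be topicmodels::LDA(docs, k = t, method = "Gibbs"). Comparing theta_hat and phi_hat with the true Theta and Phi requires matching the topics first, because topic labels in a fitted model are arbitrary.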
I guess the topicmodels package uses the MASS package, but then the grouping parameter is something I can't control explicitly, can I? Or am I doing something wrong with my data?
Please help me!
BR, Maria