I'm using LDA for topic modeling:
dtm <- DocumentTermMatrix(docs)
However, some rows of dtm contain only zeros, so I followed the instructions here:
ui = unique(dtm$i)
dtm.new = dtm[ui,]
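To make the intent of that filter concrete, here is a minimal base-R sketch on a toy dense matrix. The matrix `counts` and its document and term names are made up for illustration. For a real sparse matrix from tm, I believe `slam::row_sums(dtm) > 0` plays the same role as `rowSums` below.

```r
# Toy dense matrix standing in for a document-term matrix (hypothetical data).
counts <- matrix(
  c(2, 0, 1,
    0, 0, 0,
    0, 3, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("apple", "pear", "plum"))
)

# Keep only documents that contain at least one term.
keep <- rowSums(counts) > 0
counts_new <- counts[keep, , drop = FALSE]

# Names of the documents that were dropped; the corpus would need
# the same documents removed to stay aligned with the matrix.
dropped <- rownames(counts)[!keep]
```

Filtering with a logical vector keeps the remaining documents in their original order, and `dropped` records which documents no longer have a row.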
After that, LDA works and I get the topics. Next, I tried to use LDAvis as recommended here. Source code:
topicmodels_json_ldavis <- function(fitted, corpus, doc_term){
    # Required packages
    library(topicmodels)
    library(dplyr)
    library(stringi)
    library(tm)
    library(LDAvis)

    # Find required quantities
    phi <- posterior(fitted)$terms %>% as.matrix
    theta <- posterior(fitted)$topics %>% as.matrix
    vocab <- colnames(phi)
    doc_length <- vector()
    for (i in 1:length(corpus)) {
        temp <- paste(corpus[[i]]$content, collapse = ' ')
        doc_length <- c(doc_length, stri_count(temp, regex = '\\S+'))
    }
    temp_frequency <- inspect(doc_term)
    freq_matrix <- data.frame(ST = colnames(temp_frequency),
                              Freq = colSums(temp_frequency))
    rm(temp_frequency)

    # Convert to json
    json_lda <- LDAvis::createJSON(phi = phi, theta = theta,
                                   vocab = vocab,
                                   doc.length = doc_length,
                                   term.frequency = freq_matrix$Freq)

    return(json_lda)
}
When I call the topicmodels_json_ldavis function, I receive this error:
Length of doc.length not equal to the number of rows in theta;
both should be equal to the number of documents in the data.
I checked the number of rows in theta and the length of doc.length, and they are different. I assume this is because I pass the original corpus (docs), which produces a dtm with at least one all-zero row. To make the corpus match the document-term matrix, I decided to build a new corpus from dtm.new, as suggested here. Source code:
dtm2list <- apply(dtm, 1, function(x) {
    paste(rep(names(x), x), collapse=" ")
})
myCorp <- VCorpus(VectorSource(dtm2list))
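To illustrate what this row-to-text conversion produces, here is a small base-R sketch on a toy dense matrix. The matrix `counts` and the helper name `row_to_text` are hypothetical, not part of tm. The point it shows is that the number of reconstructed texts equals the number of rows of whichever matrix is passed to `apply`, and that an all-zero row becomes an empty string that still counts as a document.

```r
# Same conversion as above, wrapped in a named helper (hypothetical name).
row_to_text <- function(x) paste(rep(names(x), x), collapse = " ")

# Toy counts: doc2 has no terms at all (hypothetical data).
counts <- matrix(
  c(2, 0, 1,
    0, 0, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("apple", "pear", "plum"))
)

# One text per row of the input matrix, including the all-zero row.
texts <- apply(counts, 1, row_to_text)

# Documents whose reconstructed text is empty.
empty <- names(texts)[texts == ""]
```

Here `texts` has two elements even though `doc2` contributes no words, so a corpus built from it would still contain two documents.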
I even fitted a new LDA model with dtm.new and passed the following arguments to topicmodels_json_ldavis: ldaOut22, myCorp, dtm.new. I still receive the error message that theta and doc.length must have the same length.
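A small sanity check along the following lines could be run on the two quantities named in the error before they reach createJSON. This is only a sketch: the helper `check_doc_alignment` is a hypothetical name, and `theta_toy` and `doc_length_toy` are made-up stand-ins for the real theta and doc_length.

```r
# Hypothetical helper: compare the number of documents implied by theta
# (one row per document) with the number of document lengths.
check_doc_alignment <- function(theta, doc_length) {
  list(
    n_theta      = nrow(theta),
    n_doc_length = length(doc_length),
    aligned      = nrow(theta) == length(doc_length)
  )
}

# Toy inputs: theta for 2 documents and 3 topics, but 3 document lengths,
# as would happen if one empty document is still present in the corpus.
theta_toy <- matrix(1 / 3, nrow = 2, ncol = 3)
doc_length_toy <- c(3, 0, 3)

result <- check_doc_alignment(theta_toy, doc_length_toy)
```

In this toy case `result$aligned` is FALSE, and comparing `n_theta` with `n_doc_length` shows how many documents differ between the two inputs.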