I am working on a LDA model with textmineR
, have calculated coherence, log-likelihood measures and optimized my model.
As a last step I would like to see how well the model predicts topics on unseen data. Thus, I am using the predict()
function from the textminer package in combination with GIBBS sampling on my testset-sample.
This results in predicted "Theta" values for each document in my testset-sample.
While I have read in another post that perplexity-calculations are not available with the texminer
package (See this post here: How do i measure perplexity scores on a LDA model made with the textmineR package in R?), I am now wondering what the purpose of the prediction function is then for? Especially with a large dataset of over 100.000 Documents it is hard to just visually assess whether the prediction has performed well or not.
I do not want to use perplexity for model selection (I am using coherence/log-likelihood instead), but as far as I understand, perplexity would help me to understand how well the prediction is and how "surprised" the model is with new, previously unseen data.
Since this does not seem to be available for textmineR, I am not sure how to assess the model prediction. Is there anything else that I could use to measure the prediction quality of my textminer
model?
Thank you!