I'm using LDA from the Spark MLlib framework. To determine the number of topics, I tried the following: train the LDA model with an increasing number of topics, then pick the number of topics that gives the maximum log-likelihood. But if I run the same procedure again on the same input data, I get a different number of topics. Could you help me with the questions below?
Which value should I use to determine the number of topics: logLikelihood or logPrior?
Why do the same LDA parameters and input data generate different topics every time?
And how do I stabilize the topic generation?
Thank you very much.
Edit: I found a solution: set the seed before running LDA, using:
DistributedLDAModel.setSeed(long value)
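For anyone landing here, a minimal sketch of the seeded version of the procedure described above. It assumes the RDD-based API (org.apache.spark.mllib.clustering), where the seed is set on the LDA object with setSeed before calling run, and where the EM optimizer returns a DistributedLDAModel that exposes logLikelihood. The toy corpus, the seed value 42, the range of k, and the object name SeededLdaSweep are all made up for illustration.

```scala
import org.apache.spark.mllib.clustering.{DistributedLDAModel, LDA}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object SeededLdaSweep {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SeededLdaSweep")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical toy corpus: term-count vectors keyed by document id.
    val corpus: RDD[(Long, Vector)] = sc.parallelize(Seq(
      Vectors.dense(2.0, 1.0, 0.0, 0.0),
      Vectors.dense(3.0, 0.0, 1.0, 0.0),
      Vectors.dense(0.0, 0.0, 2.0, 3.0),
      Vectors.dense(0.0, 1.0, 1.0, 4.0)
    )).zipWithIndex().map(_.swap).cache()

    // Assumption: any fixed value works; what matters is reusing it across runs.
    val fixedSeed = 42L

    // Train one model per candidate k with the same seed and record its log-likelihood.
    val results = (2 to 4).map { k =>
      val model = new LDA()
        .setK(k)
        .setMaxIterations(20)
        .setOptimizer("em")
        .setSeed(fixedSeed)
        .run(corpus)

      val logLikelihood = model match {
        case distributed: DistributedLDAModel => distributed.logLikelihood
        case other => sys.error(s"Unexpected model type: ${other.getClass.getName}")
      }
      (k, logLikelihood)
    }

    results.foreach { case (k, ll) => println(s"k=$k logLikelihood=$ll") }

    // Pick the k with the highest log-likelihood, as in the question.
    val (bestK, _) = results.maxBy(_._2)
    println(s"best k by logLikelihood: $bestK")

    spark.stop()
  }
}
```

With the seed fixed, repeated runs on the same data and the same partitioning should give the same log-likelihood per k, so the selected number of topics should no longer change from run to run.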