
I'm using LDA with the Spark MLlib framework. To determine the number of topics, I tried the following: run the LDA model with an increasing number of topics, then pick the number of topics that gives the maximum log-likelihood. But if I run it again in the same way on the same input data, I get a different number of topics. Can you help me with the two questions below?

Which value should I use to determine the number of topics: logLikelihood or logPrior?

Why do the same LDA parameters and input data generate different topics every time?

And how do I stabilize the topic generation?

Thank you very much.

Edit: I found a solution by setting the seed before running LDA, using:

DistributedLDAModel.setSeed(long value)
  • Can you please show the code you're using to fit your model? In particular, I'd like to know whether you're using `EMLDAOptimizer` or `OnlineLDAOptimizer`. – Jason Scott Lenderman Feb 01 '16 at 16:01
  • Currently, I'm using the logLikelihood value to determine the best value for the `number of topics`. – Thanh Thai Nguyen Aug 30 '16 at 09:06
  • Check this link: https://stackoverflow.com/questions/15067734/lda-model-generates-different-topics-everytime-i-train-on-the-same-corpus?rq=1 – lil-wolf Feb 11 '19 at 05:35

1 Answer


You see this because LDA uses randomness in both training and inference steps. Try setting the same seed every time.
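
To illustrate why the seed matters, here is a minimal sketch in plain Python. It is not Spark code and not the algorithm Spark MLlib uses (Spark's optimizers are EM and online variational Bayes); it is a toy collapsed Gibbs sampler, and the function name `fit_toy_lda`, the tiny corpus, and the hyperparameter values are all made up for the example. The point is only that every random draw, both the initial topic assignments and the resampling during training, comes from one generator, so fixing its seed makes two runs on the same data identical.

```python
import random

def fit_toy_lda(docs, num_topics, vocab_size, iterations=50, seed=None,
                alpha=0.1, beta=0.01):
    """Toy collapsed Gibbs sampler (hypothetical helper, not Spark's API).

    docs is a list of documents, each a list of integer word ids.
    Returns the topic-word count matrix after sampling.
    """
    rng = random.Random(seed)  # every random draw below uses this generator
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_total = [0] * num_topics
    assignments = []

    # Random initial topic per token: the first source of run-to-run variation.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            z = rng.randrange(num_topics)
            row.append(z)
            topic_word[z][w] += 1
            doc_topic[d][z] += 1
            topic_total[z] += 1
        assignments.append(row)

    # Resampling each token's topic: the second source of variation.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                topic_word[z][w] -= 1
                doc_topic[d][z] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                topic_word[z][w] += 1
                doc_topic[d][z] += 1
                topic_total[z] += 1
    return topic_word

# Made-up corpus: word ids 0-2 and 3-5 tend to appear in separate documents.
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 1, 2, 0], [3, 5, 4, 4]]
run_a = fit_toy_lda(docs, num_topics=2, vocab_size=6, seed=42)
run_b = fit_toy_lda(docs, num_topics=2, vocab_size=6, seed=42)
print(run_a == run_b)  # True: same seed and same data give the same counts
```

With `seed=None` the generator is seeded from the system, so repeated runs may give different topic-word counts. The same reasoning applies when comparing log-likelihood across different numbers of topics: unless the seed is held fixed, part of the difference between runs is just sampling noise.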

Havnar