The same LDA parameters and input data, but I get different topics every time?

Question

I'm using LDA from the Spark MLlib framework. To determine the number of topics, I run the LDA model with an increasing number of topics and pick the number of topics that gives the maximum log-likelihood. But if I repeat the same procedure on the same input data, I get a different best number of topics. Can you help me with the questions below?

Which value should I use to determine the number of topics: logLikelihood or logPrior?

Why do the same LDA parameters and input data generate different topics every time?

And how do I stabilize the topic generation?

Thank you very much.

Edit: I found a solution: set the seed before running LDA, using:

DistributedLDAModel.setSeed(long value)

Can you please show the code you're using to fit your model? In particular, I'd like to know whether you're using `EMLDAOptimizer` or `OnlineLDAOptimizer`. — Jason Scott Lenderman, Feb 01 '16 at 16:01
Currently, I'm using the logLikelihood value to determine the best value for the `number of topics`. — Thanh Thai Nguyen, Aug 30 '16 at 09:06
Check this link: https://stackoverflow.com/questions/15067734/lda-model-generates-different-topics-everytime-i-train-on-the-same-corpus?rq=1 — lil-wolf, Feb 11 '19 at 05:35

score 0 · Accepted Answer · answered Jul 18 '17 at 09:53

You see this because LDA uses randomness in both training and inference steps. Try setting the same seed every time.
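To illustrate why the seed matters, here is a minimal sketch of a toy collapsed Gibbs sampler for LDA in plain Python. It is not Spark code and not the algorithm Spark uses internally; the function `fit_lda`, the corpus `DOCS`, and the hyperparameter defaults are all made up for this example. The only point is that every random draw comes from one seeded generator, so the same seed on the same data reproduces the same topic-word counts.

```python
import random

# Hypothetical toy corpus: each document is a list of tokens.
DOCS = [
    ["spark", "cluster", "rdd", "spark", "executor"],
    ["topic", "model", "lda", "topic", "word"],
    ["rdd", "executor", "cluster", "spark"],
    ["lda", "word", "topic", "model", "model"],
]


def fit_lda(docs, k, seed, iters=50, alpha=0.1, beta=0.01):
    """Toy collapsed Gibbs sampler; returns topic-word count rows."""
    rng = random.Random(seed)  # every random draw below uses this generator
    vocab = sorted({w for doc in docs for w in doc})  # sorted for a stable order
    word_id = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)

    doc_topic = [[0] * k for _ in docs]
    topic_word = [[0] * v for _ in range(k)]
    topic_total = [0] * k
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(k)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assign.append(zs)

    # Resample each token's topic given all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + v * beta)
                    for t in range(k)
                ]
                z = rng.choices(range(k), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    return topic_word


if __name__ == "__main__":
    first = fit_lda(DOCS, k=2, seed=7)
    second = fit_lda(DOCS, k=2, seed=7)
    print("same seed, identical result:", first == second)
    other = fit_lda(DOCS, k=2, seed=8)
    print("different seed, identical result:", first == other)
```

With a different seed the counts may or may not differ on such a tiny corpus, and even when the topics are similar their order can change. In Spark the corresponding knob is the seed setter mentioned in the question's edit; with a fixed seed, repeated runs on the same data should be comparable when sweeping the number of topics.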

Havnar
