I'm using Spark's LDA implementation, following the example code here. I want consistent topics/topic distributions for my training data: I'm training on two machines and would like the output from both to be identical.
I understand that LDA uses a random component during training/inference, as explained in this SO post. In Python's gensim, it looks like consistent results can be achieved by setting the seed value manually. I've tried the same in Spark, but I'm still getting slight variance in the output topic distributions.
val ldaParams: LDA = new LDA().setK(10).setMaxIterations(60).setSeed(10L)
val distributedLDAModel: DistributedLDAModel = ldaParams.run(corpusInfo.docTermVectors).asInstanceOf[DistributedLDAModel]
val topicDistributions: Map[Long, Vector] = distributedLDAModel.topicDistributions.collect.toMap // produces different results on each run
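For reference, here is a sketch of what I've been trying, assuming the RDD-based spark.mllib API. Beyond the seed, I've read that run-to-run differences in how the corpus RDD is partitioned can change the shuffle ordering and thus the result, so this pins the partitioning too (the partition count 8 is an arbitrary choice, and `corpusInfo.docTermVectors` is my own corpus from above):

```scala
import org.apache.spark.mllib.clustering.{DistributedLDAModel, LDA}

// Repartition deterministically so both machines see the same layout;
// the count 8 is arbitrary, chosen only for illustration.
val corpus = corpusInfo.docTermVectors.repartition(8).cache()

val lda = new LDA()
  .setK(10)
  .setMaxIterations(60)
  .setSeed(10L)          // fix the random seed
  .setOptimizer("em")    // EM optimizer, explicitly, rather than the default

val model = lda.run(corpus).asInstanceOf[DistributedLDAModel]
val topicDistributions = model.topicDistributions.collect.toMap
```

Even with this, I'm not certain whether identical results across two separate machines are achievable, or whether some nondeterminism remains in the distributed execution.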
Is there a way to get consistent topic distributions for my training data across runs and machines?