
I've created a dictionary with the document-topic probabilities from a Gensim LDA model. Each time I iterate over it, the values come out slightly different, even though the code is exactly the same (I simply copy and paste it into another Jupyter cell and run it again). Why is this?

for r in doc_topics[:2]:
    print(r)

First time produces:

[(5, 0.46771166), (8, 0.09964698), (12, 0.08084056), (55, 0.16801219), (58, 0.07947531), (97, 0.04642806)]
[(8, 0.7273078), (69, 0.06939292), (78, 0.062151615), (101, 0.119957164)]

Second run produces:

[(5, 0.47463417), (8, 0.105600394), (12, 0.06531593), (55, 0.16066092), (58, 0.06662597), (97, 0.054465853)]
[(8, 0.7306167), (69, 0.054978732), (78, 0.06831972), (84, 0.025588958), (101, 0.10244013)]

Third:

[(5, 0.4771855), (8, 0.09988891), (12, 0.088423), (55, 0.15682992), (58, 0.058175407), (97, 0.053951494)]
[(8, 0.75193375), (69, 0.059308972), (78, 0.0622621), (84, 0.020040851), (101, 0.09659243)]

And so on...

Dror M
  • For reproducibility you must specify a random seed in your LDA model. That way, using the same seed always yields the same results. – Massifox Sep 29 '19 at 08:18
  • How is `doc_topics` created? What's `type(doc_topics)`? Are you sure no other code is being run between two runs of your code? What if you try `print(r); print(r)` instead of one print, or if you repeat your code twice inside a single cell? (You may want to expand your question with these details, for more formatting control, rather than answering in a comment.) – gojomo Sep 30 '19 at 03:57
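
gojomo's question about `type(doc_topics)` hints at one possible cause, which the question does not confirm: if `doc_topics` is a lazy wrapper (in Gensim, `lda[corpus]` returns such an object) rather than a plain list or dict, every loop over it re-runs inference, and inference involves a random initialization. The sketch below does not use Gensim at all; `LazyTopics` and its noise term are hypothetical stand-ins that only illustrate why re-iterating a lazy object can change the numbers while a materialized list cannot.

```python
import random


class LazyTopics:
    """Hypothetical stand-in for a lazy wrapper: recomputes on every iteration."""

    def __init__(self, docs, seed=0):
        self.docs = docs
        self.rng = random.Random(seed)

    def _infer(self, doc):
        # Stand-in for stochastic inference: the result depends on the RNG
        # state, which advances with every call.
        return [(topic, weight + self.rng.uniform(-0.01, 0.01))
                for topic, weight in doc]

    def __iter__(self):
        for doc in self.docs:
            yield self._infer(doc)


# Made-up document-topic weights, loosely shaped like the output in the question.
docs = [[(5, 0.47), (8, 0.10)], [(8, 0.73), (69, 0.06)]]

lazy = LazyTopics(docs)
first_pass = list(lazy)    # runs the "inference" once
second_pass = list(lazy)   # runs it again, with a different RNG state

frozen = list(lazy)        # materialize once and reuse
for r in frozen[:2]:
    print(r)
for r in frozen[:2]:
    print(r)               # identical to the loop above
```

If this is what is happening, storing the results once (for example `doc_topics = list(doc_topics)`) and iterating over that list should give the same values on every run.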

2 Answers


Because almost every ML algorithm involves a bit of randomness in both the training and inference steps.

This question has been asked before, so next time you can search for it and find an answer quickly (:

LDA model generates different topics everytime i train on the same corpus

Yoel Nisanov
  • Hi, let me explain - I am *not* reproducing the LDA, simply the final lines of code pasted above (ie. AFTER creating a document-topic dictionary, simply running through it, without regenerating it) – Dror M Sep 29 '19 at 12:12
  • You're not regenerating the document-topic dictionary every time? – Yoel Nisanov Sep 29 '19 at 13:08
  • No - simply iterating through the-already-generated dictionary – Dror M Sep 30 '19 at 16:55

To achieve reproducibility, you need to specify the random_state argument to the LdaModel constructor:

https://radimrehurek.com/gensim/models/ldamodel.html
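
A minimal sketch of what that could look like. The helper name `train_lda`, the seed value and `num_topics=2` are assumptions made for illustration; `random_state` is the constructor argument mentioned above.

```python
def train_lda(texts, seed=42, num_topics=2):
    """Train a small LDA model with a fixed seed (hypothetical helper)."""
    # Imported inside the function so that gensim is only needed when it is called.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    dictionary = Dictionary(texts)                          # token -> id mapping
    corpus = [dictionary.doc2bow(text) for text in texts]   # bag-of-words vectors
    lda = LdaModel(
        corpus=corpus,
        id2word=dictionary,
        num_topics=num_topics,
        random_state=seed,   # fixed seed for reproducible training
    )
    return lda, corpus
```

Calling `train_lda(texts)` with a list of tokenized documents returns the model and the bag-of-words corpus. Note that the seed makes training repeatable; as far as I know, each inference call on an already-trained model still draws from the model's random state, so querying the same model twice may still give slightly different probabilities.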

NPE
  • Answer above - Hi, let me explain - I am not reproducing the LDA, simply the final lines of code pasted above (ie. AFTER creating a document-topic dictionary, simply running through it, without regenerating it) – Dror M Sep 29 '19 at 12:12