TL;DR: coherence is not "stable" (i.e. reproducible between runs) in this case because of fundamental LDA properties. You can make LDA reproducible by setting random seeds and PYTHONHASHSEED=0. You can also take other steps to improve your results.
Long Version:
This is not a bug, it's a feature.
It is less a question of trusting the library than of understanding the methods involved. The scikit-learn library also has an LDA implementation, and theirs will also give you different results on each run. By its very nature, LDA is a generative probabilistic method. Simplifying a little, each time you use it, many Dirichlet distributions are generated, followed by inference steps. Both the distribution generation and the inference steps depend on random number generators, which produce different values on every unseeded run, so each model is slightly different. Calculating the coherence of these models will therefore give you different results every time.
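To make the role of the random number generator concrete, here is a minimal sketch using only the standard library. It is not how gensim or scikit-learn sample internally; the helper sample_dirichlet and the prior values are made up for illustration. It draws a Dirichlet sample by normalizing Gamma draws and shows that the result depends entirely on the seed.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

# Hypothetical symmetric prior over three topics (illustrative values only).
alpha = [0.5, 0.5, 0.5]

# Differently seeded generators give different "topic mixtures".
mix_a = sample_dirichlet(alpha, random.Random(1))
mix_b = sample_dirichlet(alpha, random.Random(2))

# Re-using seed 1 reproduces the first draw exactly.
mix_c = sample_dirichlet(alpha, random.Random(1))

print("different seeds agree:", mix_a == mix_b)
print("same seed agrees:     ", mix_a == mix_c)
```

An LDA run chains many such draws, so any difference in the generator state carries through to the final topics and to the coherence computed from them.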
But that doesn't mean the library is worthless. It is a very powerful library that is used by many companies (Amazon and Cisco, for example) and academics (NIH, countless researchers) - to quote from gensim's About page:
By now, Gensim is—to my knowledge—the most robust, efficient and hassle-free piece of software to realize unsupervised semantic modelling from plain text.
If that is what you want, gensim is the way to go - certainly not the only way to go (tmtoolkit or sklearn also have LDA), but a pretty good choice. That being said, there are ways to ensure reproducibility between model runs.
Gensim Reproducibility
Set PYTHONHASHSEED=0
From the Python documentation: "On Python 3.3 and greater, hash randomization is turned on by default."
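One detail worth checking for yourself: the variable has to be in the environment before the interpreter starts, so assigning os.environ["PYTHONHASHSEED"] inside an already running script does not change that script's hashing. The sketch below starts fresh interpreters with and without a fixed hash seed and compares the hash of a string. The helper name string_hash_in_fresh_interpreter is made up for this illustration.

```python
import os
import subprocess
import sys

def string_hash_in_fresh_interpreter(hash_seed):
    """Start a new Python process and return hash('gensim') as computed there."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('gensim'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)

# With PYTHONHASHSEED=0, hash randomization is disabled in both processes.
fixed_1 = string_hash_in_fresh_interpreter("0")
fixed_2 = string_hash_in_fresh_interpreter("0")

# With PYTHONHASHSEED=random, each process picks its own seed.
random_1 = string_hash_in_fresh_interpreter("random")
random_2 = string_hash_in_fresh_interpreter("random")

print("fixed seed, hashes equal: ", fixed_1 == fixed_2)
print("random seed, hashes equal:", random_1 == random_2)
```

In practice that means setting the variable in the shell, the job script, or the notebook kernel configuration that launches Python, rather than in the code that trains the model.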
Use random_state in your model specification
As far as I know, all of the gensim models have a way of specifying the random seed to be used. Pick a fixed integer rather than leaving the parameter at its default (None in the LdaModel signature, which does not fix the seed), and use the same number for each rerun. This ensures that the same input to the random number generators always results in the same output (see the gensim ldamodel documentation).
Use ldamodel.save() and ldamodel.load() for model persistence
This is also a very useful, timesaving step that keeps you from having to re-run your models every time you start (very important for long-running models).
Optimize your models and data
This doesn't technically make your models perfectly reproducible, but even without the random seed settings, you will see your model perform better (at the cost of computation time) if you increase iterations or passes. Preprocessing also makes a big difference and is an art unto itself: do you choose to lemmatize or to stem, and why? All of this can have important effects on the outputs and on your interpretations.
Caveat: you must use one core only
Multicore methods (LdaMulticore and the distributed versions) are never 100% reproducible, because of the way the operating system handles multiprocessing.