I can't seem to find how to do this, or perhaps my knowledge of statistics and its terminology is the problem, but I want to produce something similar to the graph at the bottom of the PyPI page for the LDA lib and observe the uniformity/convergence of the lines. How can I achieve this with Gensim LDA?
1 Answer
You are right to want to plot the convergence of your model fitting. Unfortunately, Gensim does not seem to make this very straightforward.
Run the model in such a way that you can analyze the output of the model fitting function. I like to set up a log file.
    import logging
    logging.basicConfig(filename='gensim.log', format="%(asctime)s:%(levelname)s:%(message)s", level=logging.INFO)
Set the eval_every parameter in LdaModel. The lower this value is, the better the resolution of your plot will be. However, computing the perplexity can slow down your fit a lot!
    from gensim.models import LdaModel
    # Note: the keyword is `passes`; `pass` is a reserved word in Python.
    lda_model = LdaModel(corpus=corpus, id2word=id2word, num_topics=30, eval_every=10, passes=40, iterations=5000)
Parse the log file and make your plot.
    import re
    import matplotlib.pyplot as plt

    # Capture the per-word bound and the perplexity from each evaluation line.
    p = re.compile(r"(-*\d+\.\d+) per-word .* (\d+\.\d+) perplexity")
    with open('gensim.log') as f:
        matches = [p.findall(l) for l in f]
    matches = [m for m in matches if len(m) > 0]
    tuples = [t[0] for t in matches]
    perplexity = [float(t[1]) for t in tuples]
    likelihood = [float(t[0]) for t in tuples]
    # One point per logged evaluation; the step of 10 matches eval_every=10 above.
    iterations = list(range(0, len(tuples) * 10, 10))
    plt.plot(iterations, likelihood, c="black")
    plt.ylabel("log likelihood")
    plt.xlabel("iteration")
    plt.title("Topic Model Convergence")
    plt.grid()
    plt.savefig("convergence_likelihood.pdf")
    plt.close()
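If you want to check the parsing step on its own before plotting, a minimal sketch along the following lines may help. It reuses the regular expression from the answer and needs only the standard library. The function name `parse_perplexity_log` and its `eval_every` argument are hypothetical helpers, and the sample lines are hand-written stand-ins shaped to match that regex, not real Gensim output; the actual log wording may differ between versions.

```python
import re

# Same pattern as in the answer above, written as a raw string.
PATTERN = re.compile(r"(-*\d+\.\d+) per-word .* (\d+\.\d+) perplexity")


def parse_perplexity_log(lines, eval_every=10):
    """Return (xs, bounds, perplexities) for the log lines that match PATTERN.

    xs holds one x-axis value per matching line, spaced by eval_every,
    mirroring the range(0, n * 10, 10) used in the answer.
    """
    bounds, perplexities = [], []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            bounds.append(float(m.group(1)))
            perplexities.append(float(m.group(2)))
    xs = [i * eval_every for i in range(len(bounds))]
    return xs, bounds, perplexities


# Hand-written stand-in lines (an assumption, NOT captured from Gensim).
sample = [
    "2020-01-01 00:00:00,000:INFO:-9.500 per-word bound, 724.1 perplexity estimate",
    "2020-01-01 00:00:01,000:INFO:some unrelated message",
    "2020-01-01 00:00:02,000:INFO:-8.250 per-word bound, 304.4 perplexity estimate",
]

xs, bounds, perplexities = parse_perplexity_log(sample)
print(xs, bounds, perplexities)
```

To use it on a real run, pass the open log file instead of `sample`, for example `with open('gensim.log') as f: xs, bounds, perplexities = parse_perplexity_log(f)`, and then plot `bounds` or `perplexities` against `xs`.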

groceryheist
- Does the plot help to determine the number of passes? What's the difference between passes and iterations? Thanks! – Victor Wang Jul 01 '19 at 04:52
- @VictorWang maybe this helps: "passes controls how often we train the model on the entire corpus. Another word for passes might be “epochs”. iterations is somewhat technical, but essentially it controls how often we repeat a particular loop over each document. It is important to set the number of “passes” and “iterations” high enough" ref: https://radimrehurek.com/gensim/auto_examples/tutorials/run_lda.html – Ferran Jun 18 '20 at 10:32
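To make the quoted distinction concrete, here is a toy sketch of the loop nesting it describes: an outer loop over passes (epochs), a loop over documents, and an inner per-document loop capped at `iterations`. This is only an illustration of the structure, not Gensim's code; the function `count_inner_updates` and the `converges_after` argument (a stand-in for a document converging early) are hypothetical.

```python
def count_inner_updates(n_docs, passes, iterations, converges_after=None):
    """Count per-document update steps in a toy nested training loop."""
    updates = 0
    for _ in range(passes):          # one pass = one sweep over the whole corpus
        for _ in range(n_docs):      # every document is visited once per pass
            for step in range(1, iterations + 1):  # inner loop, capped at `iterations`
                updates += 1
                # Hypothetical early stop: pretend the document has converged.
                if converges_after is not None and step >= converges_after:
                    break
    return updates


worst_case = count_inner_updates(n_docs=100, passes=40, iterations=50)
early_stop = count_inner_updates(n_docs=100, passes=40, iterations=50, converges_after=5)
print(worst_case, early_stop)
```

In this toy model, `iterations` only sets an upper bound on the work done per document, while `passes` multiplies the work over the whole corpus.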