I want to find the best number of topics (k) to pass to gensim for LDA. I found an answer on StackOverflow, but when I run its code I get the error shown below.
The suggested approach comes from this question:
What is the best way to obtain the optimal number of topics for a LDA-Model using Gensim?
# import modules
import seaborn as sns
import matplotlib.pyplot as plt
from gensim.models import LdaModel, CoherenceModel
from gensim import corpora
# build one LDA model for each candidate number of topics k
dirichlet_dict = corpora.Dictionary(corpus)
bow_corpus = [dirichlet_dict.doc2bow(text) for text in corpus]
# Consider 1 to 15 topics (range() excludes its end value)
num_topics = list(range(1, 16))
num_keywords = 15
LDA_models = {}
LDA_topics = {}
for i in num_topics:
    LDA_models[i] = LdaModel(corpus=bow_corpus,
                             id2word=dirichlet_dict,
                             num_topics=i,
                             update_every=1,
                             chunksize=len(bow_corpus),
                             passes=20,
                             alpha='auto',
                             random_state=42)
    shown_topics = LDA_models[i].show_topics(num_topics=num_topics,
                                             num_words=num_keywords,
                                             formatted=False)
    LDA_topics[i] = [[word[0] for word in topic[1]] for topic in shown_topics]
When I try to run the code, I get this error:
-> 1145 if num_topics < 0 or num_topics >= self.num_topics:
1146 num_topics = self.num_topics
1147 chosen_topics = range(num_topics)
TypeError: '<' not supported between instances of 'list' and 'int'
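Judging from the traceback alone, the comparison seems to fail because `show_topics` receives the whole `num_topics` list instead of a single integer. Here is a minimal sketch that reproduces the same message without gensim. Note that `choose_topics` is a hypothetical stand-in I wrote to mirror the three lines shown in the traceback; it is not gensim's actual function.

```python
def choose_topics(num_topics, model_num_topics):
    # Hypothetical stand-in mirroring the check visible in the traceback;
    # this is NOT gensim code, only an illustration of the comparison.
    if num_topics < 0 or num_topics >= model_num_topics:
        num_topics = model_num_topics
    return list(range(num_topics))


candidate_counts = list(range(1, 16))  # same values as num_topics in the question

# Passing the list, as the loop in the question does, fails on `list < int`.
try:
    choose_topics(candidate_counts, 5)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Passing a single integer (the loop variable) works.
chosen = choose_topics(5, 5)

print(error_message)
print(chosen)
```

If that reading is right, passing the loop variable in the call, i.e. `show_topics(num_topics=i, ...)`, should avoid the error, though I have not verified this against every gensim version.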