I applied LDA from the gensim package to my corpus, and I get each term together with its probability. How can I get only the terms, without their probabilities? Here is my code:

K = ldamodel.num_topics
topicWordProbMat = ldamodel.print_topics(K)
for t, topic_dist in enumerate(topicWordProbMat):
    print('Topic #', t, topic_dist)

Example output:

Topic # 0 '0.181*things + 0.181*amazon + 0.181*good
Topic # 1 '0.031*nokia + 0.031*microsoft + 0.031*apple  

and I want it like this:

Topic # 0 things amazon good
Topic # 1 nokia microsoft apple

Any idea how? Thanks in advance.

Yousra Gad

1 Answer

Gensim has a built-in show_topic method that returns the n most probable words for a given topic, each paired with its probability. The following builds a dict that maps each topic label to a list of that topic's top 10 words, with the probabilities dropped.

topn_words = {'Topic_' + str(i): [word for word, prob in ldamodel.show_topic(i, topn=10)] for i in range(ldamodel.num_topics)}
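If you would rather keep the strings you already get from print_topics, here is a minimal sketch that strips the weights from them. It assumes each string has the form "weight*term + weight*term", as in the question's output; newer gensim releases wrap each term in double quotes, which the sketch also removes. The helper name terms_only is hypothetical, not part of gensim. Also note that in newer releases print_topics returns (topic_id, string) pairs, so you would pass the second element of each pair.

```python
def terms_only(topic_string):
    """Return the terms from a 'weight*term + weight*term' string, weights dropped."""
    terms = []
    for item in topic_string.split(' + '):
        # Keep whatever follows the first '*', i.e. the term itself.
        _, _, term = item.partition('*')
        # Remove surrounding whitespace and any quotes around the term.
        terms.append(term.strip().strip('"\''))
    return terms


# Hand-written sample strings in the assumed format (not real model output).
sample = [
    '0.181*things + 0.181*amazon + 0.181*good',
    '0.031*"nokia" + 0.031*"microsoft" + 0.031*"apple"',
]

for t, topic_string in enumerate(sample):
    print('Topic #', t, ' '.join(terms_only(topic_string)))
```

String parsing like this depends on the exact output format, so the show_topic approach above is probably the safer choice when you have the model object at hand.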

Similar question here: How to generate word clouds from LDA models in Python?

Kenneth Orton