
For reference, I already looked at the following questions:

  1. Gensim LDA for text classification
  2. Python Gensim LDA Model show_topics funciton

I would like the LDA model I trained with Gensim to classify a sentence under one of the topics that the model creates. Something along the lines of

lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, passes=20)
lda.print_topics()
for line in document: # where each line in the document is its own sentence for simplicity
    print('Sentence: ', line)
    topic = lda.parse(line) # where the classification would occur
    print('Topic: ', topic)

I know gensim does not have a parse function, but how would one go about accomplishing this? Here is the documentation that I've been following but I haven't gotten anywhere with it:

https://radimrehurek.com/gensim/auto_examples/core/run_topics_and_transformations.html#sphx-glr-auto-examples-core-run-topics-and-transformations-py

Thanks in advance.

Edit: more documentation: https://radimrehurek.com/gensim/models/ldamodel.html

sophros
Q.H.

1 Answer


Let me make sure I understand your problem: you want to train an LDA model on some documents and retrieve 7 topics. Then you want to classify new documents into one (or more?) of these topics, meaning you want to infer topic distributions for new, unseen documents.

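If so, the gensim documentation covers this: convert each new sentence to a bag-of-words vector with `id2word.doc2bow` and index the trained model with that vector to get its topic distribution.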

from gensim import models

lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=7, passes=20)
lda.print_topics()
for count, line in enumerate(document, start=1):  # each line of the document is treated as one sentence
    print('\nSentence: ', line)
    line_bow = id2word.doc2bow(line.split())  # bag-of-words vector for the sentence
    doc_lda = lda[line_bow]  # list of (topic_id, probability) pairs
    # Pick the pair with the highest probability, not the highest topic id
    topic_id, prob = max(doc_lda, key=lambda pair: pair[1])
    print('\nLine ' + str(count) + ' assigned to Topic ' + str(topic_id) + ' with ' + str(round(prob * 100, 2)) + '% probability!')
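
To make the topic-selection step concrete without training a model, here is a minimal sketch in plain Python. It assumes the topic distribution is a list of (topic_id, probability) pairs, which is the shape `lda[line_bow]` returns. The helper name `dominant_topic` and the sample numbers are hypothetical and only there for illustration. It also guards against an empty list, since gensim drops topics whose probability falls below the model's `minimum_probability`.

```python
from typing import List, Optional, Tuple


def dominant_topic(doc_topics: List[Tuple[int, float]]) -> Optional[Tuple[int, float]]:
    """Return the (topic_id, probability) pair with the highest probability.

    Returns None when the list is empty.
    """
    if not doc_topics:
        return None
    # Compare on the probability (second element), not on the topic id
    return max(doc_topics, key=lambda pair: pair[1])


# Hypothetical distribution, shaped like the output of lda[line_bow]
sample = [(0, 0.05), (2, 0.81), (6, 0.14)]

naive = max(sample)            # tuples compare on topic id first
best = dominant_topic(sample)  # compares on probability

print('max() without key:', naive)
print('dominant topic:   ', best)
```

Without a key, `max` compares the tuples element by element, so it returns the pair with the largest topic id. Passing the probability as the key selects the most likely topic instead.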
Nils_Denter
  • `max(doc_lda)` will just give the topic with highest index, because `doc_lda` returns a tuple, with first value being index. This code will assign the same topic to every document – Hamza Zubair Jun 02 '22 at 07:24
  • Need to add `, key=lambda k: k[1]` to the max function. `str(max(doc_lda, key=lambda k: k[1])[0])`. – Brooks B Apr 13 '23 at 01:38