I have an LDA model and obtain the per-document topic probabilities as shown below:

doc_lda = lda_model[corpus]

How do I extract the topic ID with the largest probability for each document? I can convert doc_lda into a list or DataFrame, but I'm stuck after that.

After converting it to a list, it looks like a list of lists of tuples?

1 Answer

Based on a few people's code here and here:

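import gensim
import pandas as pd

# minimum_probability=0.0 so that low-probability topics are not filtered out
all_topics = lda_model.get_document_topics(corpus, minimum_probability=0.0)
all_topics_csr = gensim.matutils.corpus2csc(all_topics)
# corpus2csc gives topics x documents; transpose to documents x topics
all_topics_numpy = all_topics_csr.T.toarray()
all_topics_df = pd.DataFrame(all_topics_numpy)
all_topics_df.idxmax(axis=1)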
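
If you only need the topic IDs and not the full matrix, a lighter option is to take the maximum directly from the list of tuples. This is a minimal sketch, assuming each document's entry is a list of (topic_id, probability) tuples as described in the question. The helper name dominant_topics and the sample data are made up for illustration; in practice you would pass list(doc_lda).

```python
from operator import itemgetter


def dominant_topics(doc_topics):
    """Return the topic ID with the highest probability for each document.

    doc_topics: iterable of lists of (topic_id, probability) tuples.
    Documents with no topics at all map to None.
    """
    result = []
    for topics in doc_topics:
        if not topics:
            result.append(None)
            continue
        # Compare tuples by their probability (index 1), keep the topic ID
        topic_id, _prob = max(topics, key=itemgetter(1))
        result.append(topic_id)
    return result


# Hypothetical stand-in for list(doc_lda)
sample_doc_lda = [
    [(0, 0.10), (1, 0.85), (2, 0.05)],
    [(0, 0.60), (2, 0.40)],
    [(3, 0.99)],
]

print(dominant_topics(sample_doc_lda))  # [1, 0, 3]
```

This skips the sparse-matrix and DataFrame conversion, but it only returns the winning ID per document. If you also want the probabilities in tabular form, the DataFrame approach above is the better fit.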