I use the following code to run topic modeling (NMF on TF-IDF features) on my documents:
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# tokenize is a custom tokenizer function not shown here
tfidf_vectorizer = TfidfVectorizer(tokenizer=tokenize, max_df=0.85, min_df=3, ngram_range=(1, 5))
tfidf = tfidf_vectorizer.fit_transform(docs)
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
tfidf_feature_names = tfidf_vectorizer.get_feature_names_out()

no_topics = 50
%time nmf = NMF(n_components=no_topics, random_state=11, init='nndsvd').fit(tfidf)
# document-topic weight matrix, shape (n_documents, no_topics)
topic_pr = nmf.transform(tfidf)
I thought topic_pr would give me the probability distribution over topics for each document. In other words, I expected each row of topic_pr to contain the probabilities that the corresponding document belongs to each of the 50 topics in the model. However, the values in a row do not add up to 1. Are these really probabilities? If not, is there a way to convert them to probabilities?
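To make the question concrete, here is a minimal sketch of the check I am describing, together with the simplest conversion I can think of: dividing each row by its sum. The matrix `topic_pr_example` and the helper `normalize_rows` are hypothetical stand-ins I made up for illustration, not real output from my model. As far as I understand, NMF is not a probabilistic model, so I assume the normalized values would be relative topic weights that happen to sum to 1 rather than true probabilities.

```python
import numpy as np


def normalize_rows(weights):
    """Scale each row of a non-negative matrix so that it sums to 1.

    Rows that are entirely zero are left as zeros to avoid dividing by zero.
    """
    weights = np.asarray(weights, dtype=float)
    row_sums = weights.sum(axis=1, keepdims=True)
    return np.divide(
        weights, row_sums, out=np.zeros_like(weights), where=row_sums > 0
    )


# Hypothetical stand-in for topic_pr (3 documents, 4 topics); not real output.
topic_pr_example = np.array([
    [0.02, 0.00, 0.11, 0.07],
    [0.00, 0.30, 0.00, 0.10],
    [0.00, 0.00, 0.00, 0.00],  # a document with no weight on any topic
])

# The raw NMF weights are non-negative, but rows need not sum to 1.
print(topic_pr_example.sum(axis=1))

# After row normalization, every non-zero row sums to 1.
topic_share = normalize_rows(topic_pr_example)
print(topic_share.sum(axis=1))
```

With my real data this would be called as `normalize_rows(topic_pr)`. I am not sure whether this kind of rescaling is a legitimate way to read the output, which is part of what I am asking.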
Thanks