I'm analyzing the main topics in a set of sentences, working in PyCharm. These sentences are already classified into 8 different categories, but I want to be more specific: I want to detect the topics of the sentences within each category.
I have a dataframe called data whose columns are 'tokenized_sentence' and 'category'. I loop through each category and use pyLDAvis (via a function named 'lda_vis') to display the results like this:
import pyLDAvis
import pyLDAvis.sklearn
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer

def lda_vis(text, nb_of_topics):
    tfidf_vectorizer = TfidfVectorizer(max_df=0.95, min_df=2, max_features=1000)
    tfidf = tfidf_vectorizer.fit_transform(text)
    lda = LatentDirichletAllocation(n_components=nb_of_topics, max_iter=5,
                                    learning_method='online',
                                    learning_offset=10,
                                    random_state=0)
    lda.fit(tfidf)
    visualisation = pyLDAvis.sklearn.prepare(lda, tfidf, tfidf_vectorizer)
    return pyLDAvis.show(visualisation)

for category in data['category'].unique():  # list with all 8 categories
    df_by_category = data.loc[data['category'] == category]
    lda_vis(df_by_category['tokenized_sentence'], nb_of_topics=4)
The problem: I only get the result for the first iteration of the loop. The lda_vis display opens in a new browser tab for the first category, but the following ones never appear. Once I stop the code manually, I get some "Error: process 'number x' cannot be found" messages in the Run console.
How can I perform this analysis correctly?
I am using pyLDAvis version 2.1.2.
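One possible workaround, sketched under the assumption that the loop stalls because pyLDAvis.show serves the page from a local web server and does not return until it is interrupted: write each category's visualisation to its own HTML file with pyLDAvis.save_html instead. The names report_path, lda_save, save_all and the output folder "lda_reports" are hypothetical and not part of pyLDAvis. The pyLDAvis.sklearn module is assumed to be available, as it is in version 2.1.2.

```python
import re
from pathlib import Path


def report_path(category, out_dir="lda_reports"):
    """Return a file-system-safe HTML path for one category (hypothetical helper)."""
    # Collapse every run of characters other than letters, digits, "_" and "-".
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", str(category)).strip("_") or "category"
    return Path(out_dir) / f"lda_{slug}.html"


def lda_save(text, nb_of_topics, path):
    """Fit the same model as lda_vis, but write the result to an HTML file."""
    # Imported lazily so report_path can be used without these packages installed.
    import pyLDAvis
    import pyLDAvis.sklearn
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import TfidfVectorizer

    tfidf_vectorizer = TfidfVectorizer(max_df=0.95, min_df=2, max_features=1000)
    tfidf = tfidf_vectorizer.fit_transform(text)
    lda = LatentDirichletAllocation(n_components=nb_of_topics, max_iter=5,
                                    learning_method='online',
                                    learning_offset=10,
                                    random_state=0)
    lda.fit(tfidf)
    visualisation = pyLDAvis.sklearn.prepare(lda, tfidf, tfidf_vectorizer)

    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # save_html writes the file and returns, so the calling loop can continue.
    pyLDAvis.save_html(visualisation, str(path))
    return path


def save_all(data, nb_of_topics=4, out_dir="lda_reports"):
    """Write one HTML report per category and return the list of file paths."""
    paths = []
    for category in data['category'].unique():
        subset = data.loc[data['category'] == category, 'tokenized_sentence']
        paths.append(lda_save(subset, nb_of_topics, report_path(category, out_dir)))
    return paths
```

Calling save_all(data) would then leave one file per category in the output folder, and each file can be opened in a browser separately.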
Thanks a lot!