
I'm analyzing the main topics in a group of sentences, working in PyCharm. These sentences are already classified into 8 different categories, but I want to be more specific: I want to detect the topics for the sentences in each category.

I have a dataframe called data whose columns are 'tokenized_sentence' and 'category'. I loop through each category and use pyLDAvis (via a function named 'lda_vis') to display the results like this:

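import pyLDAvis
import pyLDAvis.sklearn
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import TfidfVectorizer


def lda_vis(text, nb_of_topics):
    # Build a TF-IDF matrix from the sentences of one category
    tfidf_vectorizer = TfidfVectorizer(max_df=0.95, min_df=2, max_features=1000)
    tfidf = tfidf_vectorizer.fit_transform(text)
    lda = LatentDirichletAllocation(n_components=nb_of_topics, max_iter=5,
                                    learning_method='online',
                                    learning_offset=10,
                                    random_state=0)
    lda.fit(tfidf)
    visualisation = pyLDAvis.sklearn.prepare(lda, tfidf, tfidf_vectorizer)
    # Opens the visualisation in a new browser tab
    return pyLDAvis.show(visualisation)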


for category in data['category'].unique():  # list with all 8 categories
    df_by_category = data.loc[data['category'] == category]
    lda_vis(df_by_category['tokenized_sentence'], nb_of_topics=4)

The problem: I only get the result for the first iteration of the loop. The lda_vis display opens in a new browser tab for the first category, but the following ones never appear. Once I stop the script manually, I get some "Error: process 'number x' cannot be found" messages in the Run console.

How can I perform this analysis correctly?

I'm using pyLDAvis version 2.1.2.
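One direction that might work, as far as I understand pyLDAvis 2.1.2: pyLDAvis.show appears to start a local web server and keep running until it is interrupted, which would explain why the loop never reaches the second category. The sketch below writes one HTML file per category with pyLDAvis.save_html instead of serving it. The names report_path, lda_to_html, run_all and the lda_reports folder are made up for illustration; the modelling code is the same as in lda_vis above.

```python
from pathlib import Path


def report_path(category, out_dir="lda_reports"):
    """Build an output path per category (hypothetical naming scheme)."""
    # Replace characters that are awkward in file names with underscores.
    safe = "".join(c if c.isalnum() else "_" for c in str(category))
    return Path(out_dir) / f"lda_{safe}.html"


def lda_to_html(text, nb_of_topics, out_file):
    """Same TF-IDF + LDA pipeline as lda_vis, but saved to disk, not served."""
    # Imported here so report_path stays usable without the modelling stack.
    import pyLDAvis
    import pyLDAvis.sklearn
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import TfidfVectorizer

    tfidf_vectorizer = TfidfVectorizer(max_df=0.95, min_df=2, max_features=1000)
    tfidf = tfidf_vectorizer.fit_transform(text)
    lda = LatentDirichletAllocation(n_components=nb_of_topics, max_iter=5,
                                    learning_method='online',
                                    learning_offset=10,
                                    random_state=0)
    lda.fit(tfidf)
    visualisation = pyLDAvis.sklearn.prepare(lda, tfidf, tfidf_vectorizer)

    # Make sure the target folder exists, then write the report.
    # Assumption: save_html returns right away, so the caller's loop continues.
    Path(out_file).parent.mkdir(parents=True, exist_ok=True)
    pyLDAvis.save_html(visualisation, str(out_file))


def run_all(data, nb_of_topics=4):
    """Write one report per category; `data` is the dataframe described above."""
    for category in data['category'].unique():
        df_by_category = data.loc[data['category'] == category]
        lda_to_html(df_by_category['tokenized_sentence'], nb_of_topics,
                    report_path(category))
```

Calling run_all(data) should then leave 8 HTML files in lda_reports, one per category, which can be opened in the browser one at a time.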

Thanks a lot!

Perrupi