def format_topics_sentences(corpus, texts, ldamodel=None):
    # Init output
    sent_topics_df = pd.DataFrame()

    # Get main topic in each document
    for i, row_list in enumerate(ldamodel[corpus]):
        row = row_list[0] if ldamodel.per_word_topics else row_list            
        # print(row)
        row = sorted(row, key=lambda x: (x[1]), reverse=True)
        # Get the Dominant topic, Perc Contribution and Keywords for each document
        for j, (topic_num, prop_topic) in enumerate(row):
            if j == 0:  # => dominant topic
                wp = ldamodel.show_topic(topic_num)
                topic_keywords = ", ".join([word for word, prop in wp])
                sent_topics_df = sent_topics_df.append(pd.Series([int(topic_num), round(prop_topic,4), topic_keywords]), ignore_index=True)
                #sent_topics_df = sent_topics_df.concat([int(topic_num), round(prop_topic,4), topic_keywords], ignore_index=True)
            else:
                break
    sent_topics_df.columns = ['Dominant_Topic', 'Perc_Contribution', 'Topic_Keywords']

    # Add original text to the end of the output
    contents = pd.Series(texts)
    sent_topics_df = pd.concat([sent_topics_df, contents], axis=1)
    return(sent_topics_df)

I am trying to generate the dominant topic and percent contribution for each document, but I am not able to do it. When I call the function with `df_topic_sents_keywords_I1 = format_topics_sentences(ldamodel=lda_model_I1, corpus=corpus_I1, texts=data_ready_I1)`, I get the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-44-9e6e39c2cce9> in <cell line: 1>()
----> 1 df_topic_sents_keywords_I1 = format_topics_sentences(ldamodel=lda_model_I1, corpus=corpus_I1, texts=data_ready_I1)

AttributeError: 'DataFrame' object has no attribute 'append'

I have tried both `concat` and `append`, but neither works.

Trenton McKinney
  • Given `tips = sns.load_dataset('tips')` and `s = tips.smoker`, which are a DataFrame and Series, respectively, `pd.concat([tips, s], axis=1)` works without issue. I suggest restarting your environment, and running the code again. `pd.concat` shouldn't have that error. The issue is not reproducible with a complete [mre] with data. – Trenton McKinney May 30 '23 at 23:34

1 Answer


`DataFrame.append` was removed in pandas 2.0.0 (see here), and IIUC you can collect the rows in a plain list and build the DataFrame once at the end:

# I removed the original comments to make the block shorter
import pandas as pd

def format_topics_sentences(corpus, texts, ldamodel=None):
    data = []  # one [topic, contribution, keywords] row per document

    for i, row_list in enumerate(ldamodel[corpus]):
        row = row_list[0] if ldamodel.per_word_topics else row_list
        row = sorted(row, key=lambda x: x[1], reverse=True)

        for j, (topic_num, prop_topic) in enumerate(row):
            if j == 0:  # dominant topic only
                wp = ldamodel.show_topic(topic_num)
                topic_keywords = ", ".join([word for word, prop in wp])
                data.append([int(topic_num), round(prop_topic, 4), topic_keywords])
            else:
                break

    # Build the DataFrame once instead of appending row by row
    sent_topics_df = (
        pd.DataFrame(data, columns=[
            'Dominant_Topic', 'Perc_Contribution', 'Topic_Keywords'])
        .assign(Text=texts)  # or use Text=pd.Series(texts)
    )

    return sent_topics_df
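
To see the pattern without a trained model, here is a minimal sketch that runs on toy data. The `doc_topics` list is a hypothetical stand-in for the `(topic_num, prop_topic)` pairs that `ldamodel[corpus]` would yield, and `texts` is made up for illustration. It also shows `pd.concat` with a one-row DataFrame, which is one way to replace a single `df.append(...)` call if you really need to add rows one at a time.

```python
import pandas as pd

# Hypothetical stand-in for what ldamodel[corpus] yields per document:
# a list of (topic_num, prop_topic) pairs
doc_topics = [
    [(0, 0.10), (1, 0.85), (2, 0.05)],
    [(0, 0.60), (1, 0.30), (2, 0.10)],
]
texts = ["first document", "second document"]  # made-up texts

# Collect plain Python rows, then build the DataFrame once
rows = []
for row in doc_topics:
    # Pick the pair with the highest proportion (the dominant topic)
    topic_num, prop_topic = max(row, key=lambda x: x[1])
    rows.append([int(topic_num), round(prop_topic, 4)])

df = pd.DataFrame(rows, columns=["Dominant_Topic", "Perc_Contribution"]).assign(Text=texts)

# Replacement for a single df.append(...): concatenate a one-row DataFrame
new_row = pd.DataFrame([[2, 0.5, "third document"]], columns=df.columns)
df_extended = pd.concat([df, new_row], ignore_index=True)

print(df_extended)
```

Building the frame once from a list is usually preferable inside a loop, since each `pd.concat` call copies the existing data.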
Timeless