Pandas .at not working and the dataframe doesn't change

Question

I have a large DataFrame of text, and I want to first train an LDA model on it. So I do:

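from gensim import corpora
from gensim.models import LdaMulticore

# Each row of 'tweet_tokenized' holds a list of tokens
doc_clean = df['tweet_tokenized'].tolist()
dictionary = corpora.Dictionary(doc_clean)
doc_term_matrix = [dictionary.doc2bow(doc) for doc in doc_clean]
lda = LdaMulticore(doc_term_matrix, id2word=dictionary, num_topics=50)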

Now that I have my trained lda, I want to iterate through df row by row and write the probability of each row belonging to a given topic into the corresponding column. So, first I create 50 columns of zeros:

for i in range(50):
    col_name = 'tweet_topic_'+str(i)
    df[col_name] = 0

Then I iterate through the rows using iterrows() and update the values using the at method:

for row_index, row in df.iterrows():
    new_doc = dictionary.doc2bow(row['tweet_tokenized'])
    lda_result = lda[new_doc]
    for topic in lda_result:
        col_name = 'tweet_topic_'+(str(topic[0]))
        df.at[row_index,col_name] = topic[1]

But it doesn't work properly: the values of the 50 columns above don't change and remain zero.

Any idea how I should resolve this?

UPDATE: I added row = row.copy() and replaced at with loc and it works well now.

So here is the working code:

for row_index, row in df.iterrows():
    row = row.copy()
    new_doc = dictionary.doc2bow(row['tweet_tokenized'])
    lda_result = lda[new_doc]
    for topic in lda_result:
        col_name = 'tweet_topic_'+(str(topic[0]))
        df.loc[row_index,col_name] = topic[1]

Can you clarify what you mean by "it doesn't work properly?" — Evan, Dec 03 '18 at 20:22
What do the values for `'tweet_topic_'+(str(topic[0]))` look like if you print them out? — Evan, Dec 03 '18 at 20:29
@Evan by not working properly I mean it doesn't get updated. All values remain zeros, as initially set to. — msmazh, Dec 03 '18 at 20:30
@Evan I did the print('tweet_topic_'+str(topic[0])) and it works well. It'll give: tweet_topic_1, tweet_topic_2, tweet_topic_3, etc. — msmazh, Dec 03 '18 at 20:32
Can you post or link to some sample data? Are there 50 topics in each `lda_result`? — Evan, Dec 03 '18 at 21:16
@Evan lda_result will be a list of a few tuples (mostly one tuple). For example, it'll be: [(1, 0.45), (4, 0.37)], meaning the text in this specific row belongs to topic 1 with probability 0.45 and to topic 4 with probability 0.37. — msmazh, Dec 03 '18 at 21:21
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/184649/discussion-between-evan-and-msmazh). — Evan, Dec 03 '18 at 21:35
@Evan I resolved the issue. Please see the update in the post. Many thanks. — msmazh, Dec 03 '18 at 21:50

score 2 · Answer 1 · answered Dec 03 '18 at 21:51

Using the instructions in the following post, I was able to resolve it:

Updating value in iterrow for pandas

for row_index, row in df.iterrows():
    row = row.copy()
    new_doc = dictionary.doc2bow(row['tweet_tokenized'])
    lda_result = lda[new_doc]
    for topic in lda_result:
        col_name = 'tweet_topic_'+(str(topic[0]))
        df.loc[row_index,col_name] = topic[1]
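
One plausible cause, not confirmed anywhere in this thread, is the dtype of the new columns: `df[col_name] = 0` creates integer columns, and depending on the pandas version, writing a float into an integer column with `.at` may not store the fractional value, whereas `.loc` can upcast the column. Under that assumption, a minimal sketch is to create the columns as floats so that `.at` can hold probabilities directly. The toy data and the `fake_lda_results` dict are hypothetical stand-ins for the real DataFrame and for `lda[new_doc]`.

```python
import pandas as pd

# Toy stand-in for the real data; the token lists are made up.
df = pd.DataFrame({"tweet_tokenized": [["good", "day"], ["bad", "traffic"]]})

# Hypothetical stand-in for lda[new_doc]:
# a list of (topic_id, probability) pairs for each row index.
fake_lda_results = {0: [(1, 0.45), (4, 0.37)], 1: [(2, 0.80)]}

num_topics = 5  # the question uses 50

# Create the columns as floats (0.0, not 0) so they can hold probabilities.
for i in range(num_topics):
    df["tweet_topic_" + str(i)] = 0.0

# Write each topic probability into its column with .at.
for row_index in df.index:
    for topic_id, prob in fake_lda_results[row_index]:
        df.at[row_index, "tweet_topic_" + str(topic_id)] = prob

print(df.dtypes["tweet_topic_1"])
print(df.filter(like="tweet_topic_"))
```

If this is indeed the cause, the `row = row.copy()` line is not what fixed it, since `row` is only read inside the loop and every write goes to `df`.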

answered Dec 03 '18 at 21:51 by msmazh

Perhaps you could accept this answer as the correct answer, as it has worked for you? – Antimony Nov 27 '20 at 17:28
