I have a large DataFrame of text, and I want to first train an LDA model on it. So I do:
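from gensim import corpora
from gensim.models import LdaMulticore

# Each entry of 'tweet_tokenized' is a list of tokens
doc_clean = df['tweet_tokenized'].tolist()
dictionary = corpora.Dictionary(doc_clean)
doc_term_matrix = [dictionary.doc2bow(doc) for doc in doc_clean]
lda = LdaMulticore(doc_term_matrix, id2word=dictionary, num_topics=50)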
Now that I have my trained lda, I want to iterate through df row by row and put the probability of each row belonging to a given topic into its corresponding column. So, first I create 50 columns of zeros:
for i in range(50):
    col_name = 'tweet_topic_'+str(i)
    df[col_name] = 0
Then I iterate through the rows using iterrows() and update the values using the at method:
for row_index, row in df.iterrows():
    new_doc = dictionary.doc2bow(row['tweet_tokenized'])
    lda_result = lda[new_doc]
    for topic in lda_result:
        col_name = 'tweet_topic_'+(str(topic[0]))
        df.at[row_index,col_name] = topic[1]
But it doesn't work properly: the values of the 50 columns above don't change and remain zero.
Any idea how I should resolve this?
UPDATE:
I added row = row.copy() and replaced at with loc, and it works well now.
So here is the working code:
for row_index, row in df.iterrows():
    row = row.copy()
    new_doc = dictionary.doc2bow(row['tweet_tokenized'])
    lda_result = lda[new_doc]
    for topic in lda_result:
        col_name = 'tweet_topic_'+(str(topic[0]))
        df.loc[row_index,col_name] = topic[1]
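
For what it's worth, one possible reason the at version left zeros (I have not verified this, and it may depend on the pandas version) is that the columns were created with the integer 0, so small float probabilities could be cast to int on assignment. A way to sidestep per-cell writes altogether is to expand each sparse result into a fixed-length list of floats first. Below is a minimal sketch of that idea: topics_to_dense and example_results are hypothetical names and made-up values for illustration, standing in for what lda[new_doc] returns.

```python
def topics_to_dense(sparse_topics, num_topics):
    """Expand sparse (topic_id, probability) pairs into a fixed-length list of floats."""
    dense = [0.0] * num_topics  # floats, so no integer columns are created later
    for topic_id, prob in sparse_topics:
        dense[topic_id] = float(prob)
    return dense

# Hypothetical stand-in for the (topic_id, probability) lists returned by lda[new_doc];
# topics missing from a list simply keep the default 0.0.
example_results = [
    [(3, 0.62), (17, 0.35)],
    [(0, 0.98)],
]

# One dense row of 50 probabilities per document
rows = [topics_to_dense(result, num_topics=50) for result in example_results]
```

With the real model, rows could presumably be built from lda[dictionary.doc2bow(doc)] for each tweet and then turned into the 50 tweet_topic_ columns in a single step, for example by constructing a DataFrame from rows with df's index and joining it to df.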