
I've used gensim for text summarization in Python. I want the summarized output to be stored in a new column of the same dataframe.

I've used this code:

for n, row in df_data_1.iterrows():
        text=df_data_1['Event Description (SAP)']
        print(text)
        df_data_1['Summary']=summarize(text)
print(df_data_1['Summary'])

The error is raised on line 4 of this code: TypeError: expected string or bytes-like object.

How can I store the processed text in the pandas dataframe?

ᴀʀᴍᴀɴ
Data_miner
  • Possible duplicate of [How to iterate over pandas dataframe and create new column](https://stackoverflow.com/questions/39873995/how-to-iterate-over-pandas-dataframe-and-create-new-column) – Tom Rijntjes Jun 27 '18 at 12:36
  • Does the 'Summary' column already exist before this piece of code runs? If not, take a look at this related question: https://stackoverflow.com/questions/39873995/how-to-iterate-over-pandas-dataframe-and-create-new-column – Tom Rijntjes Jun 27 '18 at 12:38
  • Even if this did not raise an error, I think the output of the summarize function from the last loop iteration would end up in every row of the Summary column, because assigning a single value to a pandas column sets all rows at once. In line 4 you are simply saying that every row of the Summary column should hold the same single string output of the function. See the answer below that uses .apply to apply the function row-wise instead. – James Allen-Robertson Jun 27 '18 at 14:50

1 Answer


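If it's not a string or bytes-like object, what is it? In the loop from the question, text is the whole column (a pandas Series) rather than one row's string, which is why summarize rejects it. You could check the type of the value you pass to summarize, and of what it returns, and move forward from there.

test_text = df_data_1['Event Description (SAP)'].iloc[0]
print(type(test_text))             # type of a single cell, i.e. the input
print(type(summarize(test_text)))  # type of the function's output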

Another remark: you'd typically want to avoid looping over a dataframe (see discussion). To apply a function to every value in a column, call .apply() on that column as follows:

df_data_1['Summary'] = df_data_1['Event Description (SAP)'].apply(summarize)
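
For reference, here is a minimal, self-contained sketch of the same row-wise pattern. It makes a few assumptions that are not in the question: the sample rows are invented, summarize_stub is a hypothetical stand-in that just returns the first sentence (so the example runs without gensim), and safe_summary is a hypothetical wrapper that skips cells that are not strings, such as missing values, since those would also trigger the TypeError.

```python
import pandas as pd


def summarize_stub(text):
    """Hypothetical stand-in for gensim's summarize: keep the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def safe_summary(value):
    # Non-string cells (e.g. None/NaN for missing text) get an empty summary
    # instead of being passed on to the summarizer.
    if not isinstance(value, str):
        return ""
    return summarize_stub(value)


# Invented sample data, for illustration only
df_data_1 = pd.DataFrame({
    "Event Description (SAP)": [
        "Pump failed during startup. Technician replaced the seal. Unit restarted.",
        None,
        "Valve leak detected. Line was isolated.",
    ]
})

# .apply calls safe_summary once per cell, so each row gets its own summary
df_data_1["Summary"] = df_data_1["Event Description (SAP)"].apply(safe_summary)
print(df_data_1["Summary"])
```

With real data you would swap summarize_stub for your actual summarize function; the wrapper and the .apply call stay the same.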
Tom Rijntjes