I'm using Python 3 with pandas 0.19.2.
I have a pandas df as follows:
    chat_id  line
    1        'Hi.'
    1        'Hi, how are you?.'
    1        'I'm well, thanks.'
    2        'Is it going to rain?.'
    2        'No, I don't think so.'
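For reproducibility, here is a minimal sketch that builds this frame. It assumes the quotes shown above are only string delimiters and not part of the data, and that chat_id is an integer column.

```python
import pandas as pd

# Values copied from the table above; the explicit `columns` argument
# just pins the column order.
df = pd.DataFrame(
    {
        "chat_id": [1, 1, 1, 2, 2],
        "line": [
            "Hi.",
            "Hi, how are you?.",
            "I'm well, thanks.",
            "Is it going to rain?.",
            "No, I don't think so.",
        ],
    },
    columns=["chat_id", "line"],
)
```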
I want to group by 'chat_id', then do something like a rolling (cumulative) sum on 'line', joining each line onto the earlier lines of the same chat with a space, to get the following:
    chat_id  line                     conversation
    1        'Hi.'                    'Hi.'
    1        'Hi, how are you?.'      'Hi. Hi, how are you?.'
    1        'I'm well, thanks.'      'Hi. Hi, how are you?. I'm well, thanks.'
    2        'Is it going to rain?.'  'Is it going to rain?.'
    2        'No, I don't think so.'  'Is it going to rain?. No, I don't think so.'
I believe df.groupby('chat_id')['line'].cumsum() would only work on a numeric column.
I have also tried df.groupby(by=['chat_id'], as_index=False)['line'].apply(list), which gives a list of all the lines in each full conversation, but I can't figure out how to unpack that list to create the 'rolling sum' style conversation column.
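One direction that might work, sketched here rather than offered as a definitive answer: build the running text within each group using itertools.accumulate and write it back with groupby(...).transform. The helper name running_join and the single-space separator are assumptions for illustration, and the frame is rebuilt inline so the snippet stands on its own.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame(
    {
        "chat_id": [1, 1, 1, 2, 2],
        "line": [
            "Hi.",
            "Hi, how are you?.",
            "I'm well, thanks.",
            "Is it going to rain?.",
            "No, I don't think so.",
        ],
    },
    columns=["chat_id", "line"],
)


def running_join(lines, sep=" "):
    """Hypothetical helper: cumulative string concatenation within one group."""
    # accumulate yields the first line unchanged, then each later line
    # appended to everything accumulated so far.
    joined = list(accumulate(lines, lambda acc, nxt: acc + sep + nxt))
    # Keep the group's original index so the result aligns with df.
    return pd.Series(joined, index=lines.index)


# transform calls running_join once per chat_id and aligns the result to df.
df["conversation"] = df.groupby("chat_id")["line"].transform(running_join)
```

Because the accumulation happens in a plain Python loop per group, this is likely to be slow on very large frames, and it relies on the rows of each chat already being in conversation order.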