Context
I have a DataFrame that contains chat transcripts. Each row has a customer ID, a transcript line, and a timestamp, and each customer ID can have multiple correspondences within a day (or across a span of days).
Example Code Below!
What I have:
# What I'm starting with. The df is ordered by CustomerID and Timestamp
pd.DataFrame({'AgentID': 0, 'CustomerID': 1, 'Date': ['2018-01-21', '2018-01-21', '2018-01-22', '2018-01-22'], 'Timestamp': ['2018-01-21 16:28:54', '2018-01-21 16:48:54', '2018-01-22 12:18:54', '2018-01-22 12:22:54'], 'Transcript_Line':['How can I help you?', 'I need help with this pandas problem...', 'Did you get that problem resolved?', 'Nope I still suck at pandas']})
What I need:
#This is the final result
pd.DataFrame({'AgentID': 0, 'CustomerID': 1, 'Date': ['2018-01-21', '2018-01-22'], 'Transcript_Line': ['How can I help you?\nI need help with this pandas problem...', 'Did you get that problem resolved?\nNope I still suck at pandas']})
I need to combine all transcript lines (the strings in each row) that belong to the same customer and the same day, keeping them in timestamp order.
This is what I have tried so far. The issue is in this function:
def concatConvos(x):
    if len(set(x.Date)) == 1:
        return pd.Series({'Email': x['CustomerID'].values[0],
                          'Date': x['Date'].values[0],
                          'Conversation': '\n'.join(x['Transcript_Line'])})
    else:
        rows = []
        for date in set(x.Date):
            rows.append(pd.Series({'Email': x['CustomerID'].values[0],
                                   'Date': date,
                                   'Conversation': '\n'.join(x[x.Date == date].Transcript_Line)}))
        return tuple(rows)

data3 = data2.groupby('CustomerID').apply(concatConvos)
This works when a customer has only one date of correspondence (the first branch, meaning they did not reach out on multiple days).
When a customer has more than one date, I end up with attribute errors, likely because the function returns a tuple of several Series objects instead of a single one.
Is there an easier way to go about this?
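For reference, here is a minimal sketch of one possibly simpler approach: group on the date as well as the customer, so each group already covers a single day and no branching is needed. It assumes the column names from the example above, and that the Timestamp strings are in ISO format so sorting them as text gives chronological order. The names `df` and `result` are just placeholders for this illustration.

```python
import pandas as pd

# Example data from the question (scalar values are broadcast to every row).
df = pd.DataFrame({
    'AgentID': 0,
    'CustomerID': 1,
    'Date': ['2018-01-21', '2018-01-21', '2018-01-22', '2018-01-22'],
    'Timestamp': ['2018-01-21 16:28:54', '2018-01-21 16:48:54',
                  '2018-01-22 12:18:54', '2018-01-22 12:22:54'],
    'Transcript_Line': ['How can I help you?',
                        'I need help with this pandas problem...',
                        'Did you get that problem resolved?',
                        'Nope I still suck at pandas'],
})

# Sort so lines are joined in chronological order within each group
# (assumption: ISO-formatted timestamp strings sort correctly as text).
ordered = df.sort_values(['CustomerID', 'Timestamp'])

# One group per (agent, customer, day); join that group's lines with newlines.
result = (
    ordered
    .groupby(['AgentID', 'CustomerID', 'Date'], sort=False)['Transcript_Line']
    .agg('\n'.join)
    .reset_index()
)

print(result)
```

Because the date is part of the group key, every group yields exactly one string, which sidesteps returning several Series from a single `apply` call.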