1

This is how my pandas data frame looks like.My requirement is to combine utterances column based on User_type and sorted by Chat_sequence_number and group them by case_id and Interaction_id

       Case_ID    Interaction_ID  Chat_Sequence_Number User_Type        Utterances
          1          123                   3           Person1            are
          1          123                   4           Person1              you
          1          123                   1           Person1              Hello,
          1          123                   2           Person1              how
          1          123                   5           Person1              feeling?
          1          123                   6           Person2              I'm
          1          123                   6           Person2              fine.

Is there a way that i can create a new data frame based on the requirements above. My final Output should look like this

Case_ID Interaction_ID User_Type utterance 1 123 Person1 Hello,how are you feeling? 1 123 Person1 I'm fine.

baskarmac
  • 35
  • 4
  • Does this answer your question? [Concatenate strings from several rows using Pandas groupby](https://stackoverflow.com/questions/27298178/concatenate-strings-from-several-rows-using-pandas-groupby) – G. Anderson Mar 04 '20 at 23:29
  • Please try to improve the format of the input and output, and explain what you've tried or found. – AMC Mar 05 '20 at 00:10

1 Answers1

0

You can do this in a few steps:

  1. Sort by Chat_Sequence_Number
  2. groupby Case_ID, Interaction_ID, and User_Type
  3. Use .apply() to concatenate the strings

This work is done in one line below

import pandas as pd

# Create the dataframe
df = pd.DataFrame(columns=['Case_ID','Interaction_ID','Chat_Sequence_Number','User_Type','Utterances'])
df['Utterances'] = 'are','you','Hello','how','feeling?',"I'm",'fine.'
df['User_Type'] = ['Person1']*5+['Person2']*2
df['Chat_Sequence_Number'] = 3,4,1,2,5,6,7
df['Case_ID'] = 1
df['Interaction_ID'] = 123

# Do the grouping
output = df.sort_values(['Chat_Sequence_Number']).groupby(['Case_ID','Interaction_ID','User_Type'])['Utterances'].apply(' '.join).reset_index()
print(output)

Output:

   Case_ID  Interaction_ID User_Type                  Utterances
0        1             123   Person1  Hello how are you feeling?
1        1             123   Person2                    I'm fine.
maurera
  • 1,519
  • 1
  • 15
  • 28