Summarize rows in pandas dataframe by column value and append specific column values as columns

Question

I have a dataframe as follows with multiple rows per id (maximum 3).

dat = pd.DataFrame({'id':[1,1,1,2,2,3,4,4], 'code': ["A","B","D","B","D","A","A","D"], 'amount':[11,2,5,22,5,32,11,5]})

    id      code    amount
0   1       A       11
1   1       B       2
2   1       D       5
3   2       B       22
4   2       D       5
5   3       A       32
6   4       A       11
7   4       D       5

I want to consolidate the df and have only one row per id so that it looks as follows:

    id  code1   amount1    code2    amount2    code3    amount3    
0   1       A       11     B        2          D        5
1   2       B       22     D        5          NaN      NaN
2   3       A       32     NaN      NaN        NaN      NaN
3   4       A       11     D        5          NaN      NaN

How can I acheive this in pandas?

score 3 · Accepted Answer · answered Jan 03 '23 at 09:48

Use GroupBy.cumcount for counter with reshape by DataFrame.unstack and DataFrame.sort_index, last flatten MultiIndex and convert id to column by DataFrame.reset_index:

df = (dat.set_index(['id',dat.groupby('id').cumcount().add(1)])
         .unstack()
         .sort_index(axis=1, level=1, sort_remaining=False))
df.columns = df.columns.map(lambda x: f'{x[0]}{x[1]}')
df = df.reset_index()
print (df)
   id code1  amount1 code2  amount2 code3  amount3
0   1     A     11.0     B      2.0     D      5.0
1   2     B     22.0     D      5.0   NaN      NaN
2   3     A     32.0   NaN      NaN   NaN      NaN
3   4     A     11.0     D      5.0   NaN      NaN

Summarize rows in pandas dataframe by column value and append specific column values as columns

1 Answers1