How to aggregate categorical data where the unique id has multiple rows with different values for another column?

Question

I need to aggregate data with the same id's into one row, but save the different Code's into different columns.

I look into this post: How to pivot a dataframe, but this post does not answer my question.

For example: the input dataframe can be:

d = {'id': ['a', 'a', 'a', 'b', 'b', 'b'], 'Code': ['A1', 'A2', 'A3', 'B1', 'B2', 'B3']}
df1 = pd.DataFrame(data=d)
print(df1)

   id  Code
0   a    A1
1   a    A2
2   a    A3
3   b    B1
4   b    B2
5   b    B3

But the output needs to be like this:

d = {'id': ['a', 'b'], 'Code_1': ['A1', 'B1'], 'Code_2':['A2', 'B2'], 'Code_3':['A3', 'B3']}
df2 = pd.DataFrame(data=d)
print(df2)

   id  Code_1  Code_2  Code_3
0   a    A1      A2      A3
1   b    B1      B2      B3

I tried multiple ways. Closest was code i found on a different question that did at least the aggregation part

df1.groupby('id')['Code'].apply(' '.join).reset_index()

But that gives an output where the different codes are saved into one column as one long string, not saved into multiple columns.

  id      Code
0  a  A1 A2 A3
1  b  B1 B2 B3

Any help would be great!

Shubham Sharma · Accepted Answer · 2020-05-13T13:53:02.250

You can use Series.str.split to split the Code column of the grouped dataframe on the seperator and pass a optional parameter expand=True which expands the splitted strings into multiple columns this creates a new dataframe df_s. Then using add_prefix you can add the prefix Code_ to the columns of dataframe df_s. Finally using pd.concat you can concat the df_g["id"] and df_s together.

Use:

df_g = df1.groupby('id')['Code'].apply(' '.join).reset_index()

df_s = df_g["Code"].str.split(' ', expand=True)
df_s.columns = df_s.columns + 1
df_s =  df_s.add_prefix("Code_")

result = pd.concat([df_g["id"], df_s], axis=1) 
print(result)

This prints:

 id Code_1 Code_2 Code_3
0  a     A1     A2     A3
1  b     B1     B2     B3

How to aggregate categorical data where the unique id has multiple rows with different values for another column?

1 Answers1