
I have a pandas DataFrame in which several rows can share the same ID. Each row also has a value in the "label" column. I would like to combine all the labels that share the same ID.

For example, say this is what I have:

id | label 
-----------
 1    a
 1    b
 2    a
 2    c
 2    d
 3    e

What I would like is something like this:

id | label_list
----------------
1      [a,b]
2      [a,c,d]
3      [e]

In other words, all labels that share an ID are combined into a single list. What is the most efficient way to do this?
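To make the example reproducible, the input table can be rebuilt as a DataFrame. This is a minimal sketch: the column names `id` and `label` come from the question, and the variable name `df` is the one the answers below assume.

```python
import pandas as pd

# One row per (id, label) pair, matching the input table above
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3],
    "label": ["a", "b", "a", "c", "d", "e"],
})
print(df)
```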

Imu
    Possible duplicate of [grouping rows in list in pandas groupby](https://stackoverflow.com/questions/22219004/grouping-rows-in-list-in-pandas-groupby) – cmaher Aug 30 '17 at 18:25



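Group by `id`, collect each group's labels into a list with `apply(list)`, and call `reset_index()` to turn `id` back into a regular column: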

df.groupby('id').label.apply(list).reset_index()

id       label 
1       [a, b]
2    [a, c, d]
3          [e]
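A self-contained sketch of the same approach, assuming the example data from the question. It also names the list column `label_list`, as the question's desired output does, by passing `name=` to `Series.reset_index`:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3],
    "label": ["a", "b", "a", "c", "d", "e"],
})

# Collect each group's labels into a Python list; groupby sorts the keys by default
grouped = df.groupby("id")["label"].apply(list)

# Move the id index back into a column and name the list column as in the question
result = grouped.reset_index(name="label_list")
print(result)
```

Within each group, the labels keep the order in which they appear in the original DataFrame.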
Vaishali

This solution is very similar to @Vaishali's, but it uses the .agg() method instead of .apply():

df.groupby('id', as_index=False)['label'].agg(lambda x: x.tolist())

   id      label
0   1     [a, b]
1   2  [a, c, d]
2   3        [e]
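A variant of the `.agg()` approach, sketched under the assumption of pandas 0.25 or later (which introduced named aggregation). Passing the built-in `list` constructor does the same job as the `lambda x: x.tolist()` above, and the keyword `label_list=` names the output column in the same step:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3],
    "label": ["a", "b", "a", "c", "d", "e"],
})

# Named aggregation: build the column label_list by applying list() to each group's labels
result = df.groupby("id", as_index=False).agg(label_list=("label", list))
print(result)
```

Because `as_index=False` is set, `id` stays a regular column and no `reset_index()` call is needed.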
MaxU - stand with Ukraine