8

I'm trying to combine multiple rows of a dataframe into one row, with the columns with different values being combined in a list. There are multiple columns with different values.

The df.groupby('a')['b'].apply(list) works well if only 1 column ('b' in this instance) has to be made to a list, but I can't figure out how to do it for multiple columns.

Dataframe:

   a  b  c       d
0  1  b  1   first
1  1  b  2  second
2  2  c  1   third
3  2  c  2  fourth
4  2  c  3   fifth

Prefered dataframe post operation:

   a  b          c                       d
0  1  b     [1, 2]         [first, second]
1  2  c  [1, 2, 3]  [third, fourth, fifth]

Is there an easy way to do this?

Georgy
  • 12,464
  • 7
  • 65
  • 73
MvR
  • 111
  • 1
  • 1
  • 6

1 Answers1

11
df = df.groupby(['a','b']).apply(lambda x: [list(x['c']), list(x['d'])]).apply(pd.Series)
df.columns =['a','b','c','d']

Output

   a  b          c                       d
0  1  b     [1, 2]         [first, second]
1  2  c  [1, 2, 3]  [third, fourth, fifth]
iamklaus
  • 3,720
  • 2
  • 12
  • 21