Collapse multiple rows of a dataframe into one row - based on a unique key

Question

I have a dataframe as:

What I need is:

1 | A1, A11
2 | A2, A22, A23
3 | A3, A33
4 | A4, A44, A444
5 | A5

i.e., each key can have a different number of rows present.

Is there any way I can collapse them elegantly, without reading the rows into a dict and then concatenating to the lists as applicable? In the traditional sense I would need to perform multiple joins for this. Is there any way around that?

Note that there should be only 2 final columns.

JAB · Accepted Answer · 2015-01-26T18:58:15.893

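import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3, 3, 3],
                   'B': ['aaa', 'bbb', 'cc', 'gg', 'aaa', 'bbb', 'cc', 'gg']})

def f(x):
    # x is the sub-frame for one key in 'A'; wrap its 'B' values in a list
    return [x['B'].values]

# One entry per unique key in 'A'
df.groupby('A').apply(f)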

Group by the column you want to reduce over, then apply a function that returns each group's values as a list. Note that this returns a Series indexed by the group key.
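As a possible variation (not part of the code above), the same grouping can be sketched without a helper function by selecting column B first and aggregating it. This assumes the values are strings and a reasonably recent pandas; the names as_lists and as_strings are only illustrative.

```python
import pandas as pd

# Same sample frame as in the answer
df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3, 3, 3],
                   'B': ['aaa', 'bbb', 'cc', 'gg', 'aaa', 'bbb', 'cc', 'gg']})

# Select the value column, then collect each group's values into a plain list
as_lists = df.groupby('A')['B'].agg(list)

# Or join each group's strings into one comma-separated string
as_strings = df.groupby('A')['B'].agg(', '.join)

print(as_lists)
print(as_strings)
```

The joined-string form is closer to the "A1, A11" layout in the question; the list form is easier to work with afterwards.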

Update: to change the Series to a two-column DataFrame, name it and reset the index.

series = df.groupby('A').apply(f)
series.name = 'metric'
series.reset_index()
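For reference, a self-contained sketch of the same Series-to-DataFrame step. It swaps the helper f for agg(list), which is an assumption on my part rather than the code above, and passes the column name straight to reset_index; the variable name result is just illustrative.

```python
import pandas as pd

# Same sample frame as in the answer
df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3, 3, 3],
                   'B': ['aaa', 'bbb', 'cc', 'gg', 'aaa', 'bbb', 'cc', 'gg']})

# Collapse 'B' into one list per key, then move the key 'A' out of the
# index into a regular column; name= sets the label of the value column
result = df.groupby('A')['B'].agg(list).reset_index(name='metric')

print(result.columns.tolist())  # ['A', 'metric']
print(result)
```

This gives exactly two columns, the key and the collapsed values, with one row per unique key.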

edited Jan 26 '15 at 18:58

answered Jan 24 '15 at 09:28

JAB


This is delightful and elegant. I was initially treating these as dicts and bumping into memory issues while collecting. Some of the best tricks are the simplest, and thank you for teaching me that. – ekta Jan 24 '15 at 09:43
When I call pd.DataFrame on the series, I still get a one-column df. How do I get 2 columns back? – ekta Jan 26 '15 at 02:47
Added how you can change the series to a frame above. Naming the series first avoids having to rename the data frame column – JAB Jan 26 '15 at 19:00
