
I have a dataframe like this:

1 A1
1 A11
2 A2
2 A22
2 A23
3 A3
3 A33
4 A4
4 A44
4 A444
5 A5

What I need is:

1 | A1, A11
2 | A2, A22, A23
3 | A3, A33
4 | A4, A44, A444
5 | A5

i.e. each value in the first column can have a different number of rows.

Is there a way to collapse them elegantly, without reading the rows into a dict and then concatenating to the lists as applicable? In the traditional sense I would need to perform multiple joins on this. Is there any way around that?

Note that there should be only 2 final columns.

ekta

1 Answer

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3, 3, 3],
                   'B': ['aaa', 'bbb', 'cc', 'gg', 'aaa', 'bbb', 'cc', 'gg']})

def f(x):
    # Collect this group's 'B' values into a plain Python list
    return list(x['B'])

df.groupby('A').apply(f)

Group by the column you want to reduce over, then apply a function that returns each group's values as a list. Note that this returns a series, indexed by the group key.

Update: to turn the series into a two-column dataframe, name the series and reset its index.

series = df.groupby('A').apply(f)
series.name = 'metric'
series.reset_index()
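
For reference, the same idea applied to the data from the question might look like the sketch below. The column names 'key' and 'value' are assumptions (the question does not name its columns), and agg is used here in place of apply with a custom function. The second variant joins each group into one comma-separated string, which matches the "1 | A1, A11" layout asked for.

```python
import pandas as pd

# Data from the question; the column names 'key' and 'value' are assumed.
df = pd.DataFrame({
    'key': [1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5],
    'value': ['A1', 'A11', 'A2', 'A22', 'A23', 'A3', 'A33',
              'A4', 'A44', 'A444', 'A5'],
})

# Variant 1: collect each group's values into a Python list.
# reset_index() turns the group key back into a regular column,
# giving a dataframe with exactly two columns.
as_lists = df.groupby('key')['value'].agg(list).reset_index()

# Variant 2: join each group's values into a single string.
as_strings = df.groupby('key')['value'].agg(', '.join).reset_index()

print(as_strings)
```

Both results have one row per key. Which variant fits better depends on whether the grouped values need to stay as separate items (lists) or are only needed for display (strings).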
JAB
  • This is delightful and elegant - I was initially treating these as dict -and bumping into memory issues during collecting. Some of the best tricks are the simplest - and thank you for teaching me that. – ekta Jan 24 '15 at 09:43
  • On doing pd.DataFrame over the series, I still get a one column df - how do I get 2 columns back ? – ekta Jan 26 '15 at 02:47
  • Added how you can change the series to a frame above. Naming the series first avoids having to rename the data frame column – JAB Jan 26 '15 at 19:00