
I have a DataFrame (df) with 5 columns. I want to group by the first 3 columns and collect the associated values of column 4 into a list, and likewise for column 5. My code works for column 4:

df_new=df.groupby(['1','2', '3'])['4'].apply(list)

But I do not know how to do the same for column 5 as well. This attempt:

df_new=df.groupby(['1','2', '3'])['4', '5'].apply(list)

doesn't work.

MaxU - stand with Ukraine
Rem Carbone
  • please provide a small sample data set in text or CSV format and your desired data set. Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your post correspondingly. – MaxU - stand with Ukraine Dec 10 '17 at 12:39
    Not sure, but is `df.groupby(['1', '2', '3']).agg({'4': list, '5': list})` the result you're after? – Jon Clements Dec 10 '17 at 12:44
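The `agg` call suggested in the comment above can be tried on a small frame. This is a minimal sketch, not a confirmed answer to the question: the sample values are made up for illustration, and only the column names '1' to '5' come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    '1': [0, 0, 1, 1],
    '2': [2, 2, 0, 0],
    '3': [1, 1, 2, 2],
    '4': [2, 0, 1, 3],
    '5': [1, 1, 0, 2],
})

# Group by the first three columns and collect columns '4' and '5'
# into one list per group, each column aggregated independently.
df_new = df.groupby(['1', '2', '3']).agg({'4': list, '5': list})
print(df_new)
```

Here each group ends up with one list for column '4' and one for column '5', which appears to match what the question asks for.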

2 Answers


Demo:

Source DF:

In [173]: import numpy as np; import pandas as pd

In [174]: df = pd.DataFrame(np.random.randint(3, size=(20,5)), columns=list('12345'))

In [175]: df
Out[175]:
    1  2  3  4  5
0   2  1  2  0  0
1   2  0  2  2  0
2   0  2  2  2  2
3   0  2  2  1  2
4   0  2  1  2  1
5   1  1  2  1  2
6   0  2  1  0  1
7   2  2  0  1  1
8   0  0  2  2  1
9   1  0  2  0  0
10  2  0  1  0  1
11  0  1  2  1  2
12  2  0  1  0  1
13  2  0  0  2  0
14  1  1  1  1  0
15  2  2  2  0  0
16  0  1  1  2  2
17  2  1  1  0  0
18  1  0  0  0  1
19  2  2  2  1  2

Solution:

In [176]: (df.groupby(['1','2', '3'])[['4','5']]
             .apply(lambda x: pd.Series(x.values.T.tolist(), index=['4','5'])))
Out[176]:
            4       5
1 2 3
0 0 2     [2]     [1]
  1 1     [2]     [2]
    2     [1]     [2]
  2 1  [2, 0]  [1, 1]
    2  [2, 1]  [2, 2]
1 0 0     [0]     [1]
    2     [0]     [0]
  1 1     [1]     [0]
    2     [1]     [2]
2 0 0     [2]     [0]
    1  [0, 0]  [1, 1]
    2     [2]     [0]
  1 1     [0]     [0]
    2     [0]     [0]
  2 0     [1]     [1]
    2  [0, 1]  [0, 2]
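To see what the lambda does for a single group, the steps can be run by hand. This is a small sketch in which `group` is a hypothetical stand-in for the sub-frame that `apply` passes in; its values are chosen for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for one group: its rows of columns '4' and '5'.
group = pd.DataFrame({'4': [2, 0], '5': [1, 1]})

rows = group.values        # 2-D array, one row per original record
cols = rows.T.tolist()     # transpose so each inner list holds one column

# One list per column, labelled with the column names.
result = pd.Series(cols, index=['4', '5'])
print(result)
```

Transposing before `tolist()` is what turns the row-wise array into one list per column; `apply` then stacks the per-group Series into the two-column result.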
MaxU - stand with Ukraine

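Alternatively, you could first create an extra column that pairs the values of the two columns row by row, and then group on that column. Note that this gives one list of [D, E] pairs per group rather than a separate list per column.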

import pandas as pd

df = pd.DataFrame(dict(A=[1, 2, 2], B=[1, 2, 2], C=[3, 2, 2], D=list("ABC"), E=list("DEF")))

df['list'] = df[['D','E']].values.tolist()
df = df.groupby(['A','B','C'])['list'].apply(list)

print(df.to_frame())

Returns:

                   list
A B C                  
1 1 3          [[A, D]]
2 2 2  [[B, E], [C, F]]
Anton vBR