How to get unique values from multiple columns in a pandas groupby

Question

Starting from this dataframe df:

import pandas as pd

df = pd.DataFrame({'c':[1,1,1,2,2,2],'l1':['a','a','b','c','c','b'],'l2':['b','d','d','f','e','f']})

   c l1 l2
0  1  a  b
1  1  a  d
2  1  b  d
3  2  c  f
4  2  c  e
5  2  b  f

I would like to perform a groupby over the c column to get the unique values of the l1 and l2 columns combined. For a single column I can do:

g = df.groupby('c')['l1'].unique()

that correctly returns:

c
1    [a, b]
2    [c, b]
Name: l1, dtype: object

but using:

g = df.groupby('c')[['l1','l2']].unique()

returns:

AttributeError: 'DataFrameGroupBy' object has no attribute 'unique'

I know I can get the unique values for the two columns with (among others):

In [12]: np.unique(df[['l1','l2']])
Out[12]: array(['a', 'b', 'c', 'd', 'e', 'f'], dtype=object)
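Note that np.unique flattens the 2-D selection and returns the values sorted. If first-appearance order matters, pd.unique keeps that order but expects 1-D input, so this sketch flattens the values row by row with ravel() first:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

# np.unique flattens the 2-D block and sorts the result
sorted_vals = np.unique(df[['l1', 'l2']])

# pd.unique keeps first-appearance order; flatten row by row (ravel) first
ordered_vals = pd.unique(df[['l1', 'l2']].to_numpy().ravel())

print(list(sorted_vals))   # ['a', 'b', 'c', 'd', 'e', 'f']
print(list(ordered_vals))  # ['a', 'b', 'd', 'c', 'f', 'e']
```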

Is there a way to apply this method to the groupby in order to get something like:

c
1    [a, b, d]
2    [c, b, e, f]
Name: l1, dtype: object
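One way to reach roughly this shape without numpy is to reshape first: melt stacks l1 and l2 into a single value column (all of l1, then all of l2) next to c, and a single-column groupby does support unique(). A sketch, relying on melt's default output column name value:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

# Long format: columns c, variable ('l1' or 'l2'), value
long = df.melt(id_vars='c', value_vars=['l1', 'l2'])

# SeriesGroupBy has .unique(); values keep their first-appearance order
g = long.groupby('c')['value'].unique()
print(g)
```

Here group 1 gives [a, b, d] and group 2 gives [c, b, f, e], i.e. ordered by first appearance in the melted frame rather than sorted.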

Is there a way to have the output as distinct columns instead of one cell holding a list? – aayush_malik, Oct 09 '20 at 04:45
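Regarding distinct columns: one possible sketch is to melt, drop duplicate (c, value) pairs, number the remaining values within each group with cumcount, and pivot those numbers into columns. The helper column name pos is arbitrary, and groups with fewer unique values are padded with NaN:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

wide = (
    df.melt(id_vars='c', value_vars=['l1', 'l2'])     # one value per row
      .drop_duplicates(subset=['c', 'value'])          # unique values per group
      .assign(pos=lambda d: d.groupby('c').cumcount()) # 0, 1, 2, ... within each group
      .pivot(index='c', columns='pos', values='value') # one column per position
)
print(wide)
```

Row 1 comes out as a, b, d, NaN and row 2 as c, b, f, e.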

ayhan · Accepted Answer · score 59

You can do it with apply:

import numpy as np
g = df.groupby('c')[['l1','l2']].apply(lambda x: list(np.unique(x)))
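A self-contained run of this answer, assuming a recent pandas where several columns must be selected from a groupby with a list (double brackets). Each group reaches the lambda as a two-column DataFrame; np.unique flattens it and sorts, so group 2 comes out as [b, c, e, f] rather than in order of appearance:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

# One sorted list of unique l1/l2 values per group
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))
print(g)
```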

edited Sep 15 '19 at 21:56

answered Mar 19 '16 at 20:07

ayhan

score 58 · Answer 2 · answered Jan 23 '20 at 22:30

Alternatively, you can use agg:

g = df.groupby('c')[['l1','l2']].agg(['unique'])
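Unlike the apply version, this keeps l1 and l2 separate: one array per column and per group, under MultiIndex columns such as ('l1', 'unique'). If flat names are easier to work with, a sketch that joins the two levels (the names l1_unique and l2_unique are just this example's choice):

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

res = df.groupby('c')[['l1', 'l2']].agg(['unique'])

# ('l1', 'unique') -> 'l1_unique', ('l2', 'unique') -> 'l2_unique'
res.columns = ['_'.join(col) for col in res.columns]
print(res)
```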

answered Jan 23 '20 at 22:30

Yaakov Bressler

How would you combine 'unique' and, say, '.join' in the same agg? – CodeMaster Feb 19 '21 at 18:43

You can write a custom function and apply it the same way. For example: `f = lambda arr: ','.join(np.unique(arr))`, then `.agg([f])` or, if you want to label it, `.agg([('MyName', f)])` – Yaakov Bressler Feb 19 '21 at 19:24
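If you want both results side by side in one call, named aggregation (available since pandas 0.25) lets you label each output column. A sketch, where join_unique is a hypothetical helper built from the lambda in the comment above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

def join_unique(arr):
    # Hypothetical helper: sorted unique values joined with commas
    return ','.join(np.unique(arr))

res = df.groupby('c').agg(
    l1_unique=('l1', 'unique'),     # array of unique values, appearance order
    l1_joined=('l1', join_unique),  # e.g. 'a,b' for group 1
)
print(res)
```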

score 14 · Answer 3 · answered Feb 27 '21 at 16:25

One more alternative is to use GroupBy.agg with set, which gives the unique values of each column per group:

df.groupby('c').agg(set)

       l1      l2
c                
1  {a, b}  {d, b}
2  {c, b}  {e, f}
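Sets have no defined order, so the display can vary between runs. If a stable, sorted list is preferred, a small variation (still requiring hashable values, like set itself):

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2],
                   'l1': ['a', 'a', 'b', 'c', 'c', 'b'],
                   'l2': ['b', 'd', 'd', 'f', 'e', 'f']})

# Per column and per group: deduplicate with set, then sort into a list
res = df.groupby('c').agg(lambda s: sorted(set(s)))
print(res)
```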

answered Feb 27 '21 at 16:25

Ch3steR

You might get into trouble with this when the values in l1 and l2 aren't hashable (e.g. lists or dicts). Otherwise, solid solution. – Yaakov Bressler May 05 '21 at 14:54
