
I have looked through countless threads (1, 2, 3, ...) and still can't find a solution to my problem. I have a dataframe like this:

prop1 prop2 prop3    prop4 
L30   3     bob      11.2
L30   54    bob      10
L30   11    john     10
L30   10    bob      10
K20   12    travis   10 
K20   1     travis   4 
K20   66    leo      10

I would like to group by prop1 and, at the same time, aggregate all the other columns, keeping only unique values, like this:

prop1  prop2       prop3       prop4
L30    3,54,11,10  bob,john    11.2,10
K20    12,1,66     travis,leo  10,4

I tried different methods:

  1. df.groupby('prop1')['prop2','prop3','prop4'].apply(np.unique) raises

AttributeError: 'numpy.ndarray' object has no attribute 'index'

as well as TypeError: Series.name must be a hashable type.

  2. .apply(lambda x: pd.unique(x.values.ravel()).tolist()) gives a list as output, but I would like columns.

  3. df.groupby('prop1')['prop2','prop3','prop4'].unique() doesn't work on its own because there are multiple columns.

  4. .apply(f) with f being:

    def f(df):
        df['prop2'] = df['prop2'].drop_duplicates()
        df['prop3'] = df['prop3'].drop_duplicates()
        df['prop4'] = df['prop4'].drop_duplicates()
        return df

doesn't do anything.

  5. I also tried .agg() with different options, but without success.

Would any of you have an idea?

Thank you very much :)


3 Answers


Use groupby and agg, and aggregate only unique values by calling Series.unique:

df.astype(str).groupby('prop1').agg(lambda x: ','.join(x.unique()))

            prop2       prop3      prop4
prop1                                   
K20       12,1,66  travis,leo   10.0,4.0
L30    3,54,11,10    bob,john  11.2,10.0

If you want to preserve the original order of the groups, pass sort=False:

df.astype(str).groupby('prop1', sort=False).agg(lambda x: ','.join(x.unique()))

            prop2       prop3      prop4
prop1                                   
L30    3,54,11,10    bob,john  11.2,10.0
K20       12,1,66  travis,leo   10.0,4.0

If handling NaNs is important, call fillna in advance:

import re
df.fillna('').astype(str).groupby('prop1').agg(
    lambda x: re.sub(',+', ',', ','.join(x.unique()))
)

            prop2       prop3      prop4
prop1                                   
K20       12,1,66  travis,leo   10.0,4.0
L30    3,54,11,10    bob,john  11.2,10.0
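For completeness, here is the approach end to end as a self-contained sketch. The frame is rebuilt from the sample in the question; the dtypes (int prop2, float prop4 because of 11.2) are assumptions inferred from that sample:

```python
import pandas as pd

# Rebuild the sample frame from the question
# (prop4 becomes float because of 11.2, so 10 renders as '10.0' after astype(str))
df = pd.DataFrame({
    'prop1': ['L30', 'L30', 'L30', 'L30', 'K20', 'K20', 'K20'],
    'prop2': [3, 54, 11, 10, 12, 1, 66],
    'prop3': ['bob', 'bob', 'john', 'bob', 'travis', 'travis', 'leo'],
    'prop4': [11.2, 10, 10, 10, 10, 4, 10],
})

# Cast everything to str so the values can be joined, then keep the
# unique values per group in order of first appearance
out = df.astype(str).groupby('prop1', sort=False).agg(
    lambda x: ','.join(x.unique()))
print(out)
```

This is also why prop4 shows 10.0 rather than 10 in the outputs above: astype(str) renders the floats as "11.2" and "10.0".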
  • I had already found Series.unique, but I was still mentioning all the columns, which I assume was blocking it from working. – Nithrynx Jan 11 '19 at 09:22

melt + pivot_table

s = df.astype(str).melt(id_vars='prop1').drop_duplicates()

s.pivot_table(
  index='prop1',
  columns='variable',
  values='value',
  aggfunc=','.join)

variable       prop2       prop3      prop4
prop1
K20          12,1,66  travis,leo   10.0,4.0
L30       3,54,11,10    bob,john  11.2,10.0
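The same idea as a self-contained sketch (the frame is rebuilt from the sample in the question; note that pivot_table sorts the index, which is why K20 comes first here):

```python
import pandas as pd

df = pd.DataFrame({
    'prop1': ['L30', 'L30', 'L30', 'L30', 'K20', 'K20', 'K20'],
    'prop2': [3, 54, 11, 10, 12, 1, 66],
    'prop3': ['bob', 'bob', 'john', 'bob', 'travis', 'travis', 'leo'],
    'prop4': [11.2, 10, 10, 10, 10, 4, 10],
})

# Long format: one row per (prop1, column name, value), duplicates dropped
s = df.astype(str).melt(id_vars='prop1').drop_duplicates()

# Pivot back to wide form, joining the surviving values per cell
out = s.pivot_table(index='prop1', columns='variable',
                    values='value', aggfunc=','.join)
print(out)
```

melt preserves row order column by column, so the joined values still appear in order of first occurrence.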

Try this; it groups on prop1 only (after casting to str so the values can be joined) and worked for me:

df.astype(str).groupby('prop1', sort=False).agg(lambda x: ','.join(x.unique())).reset_index()

This will give the result as:

  prop1       prop2       prop3      prop4
0   L30  3,54,11,10    bob,john  11.2,10.0
1   K20     12,1,66  travis,leo   10.0,4.0