I have gone through countless threads (1 2 3...) and still can't find a solution to my problem. I have a dataframe like this:
prop1 prop2 prop3 prop4
L30 3 bob 11.2
L30 54 bob 10
L30 11 john 10
L30 10 bob 10
K20 12 travis 10
K20 1 travis 4
K20 66 leo 10
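For reproducibility, here is a minimal sketch that builds the frame shown above. The dtypes are an assumption on my part (integers for prop2, floats for prop4), since only the printed values are given.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: prop2 holds integers; prop4 becomes float64 because of 11.2.
df = pd.DataFrame({
    "prop1": ["L30", "L30", "L30", "L30", "K20", "K20", "K20"],
    "prop2": [3, 54, 11, 10, 12, 1, 66],
    "prop3": ["bob", "bob", "john", "bob", "travis", "travis", "leo"],
    "prop4": [11.2, 10, 10, 10, 10, 4, 10],
})

print(df)
```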
I would like to group by prop1 and, at the same time, aggregate all the other columns, keeping only the unique values of each. Like this:
prop1 prop2 prop3 prop4
L30 3,54,11,10 bob,john 11.2,10
K20 12,1,66 travis,leo 10,4
I tried several methods:
df.groupby('prop1')['prop2','prop3','prop4'].apply(np.unique)
raises
AttributeError: 'numpy.ndarray' object has no attribute 'index'
followed by
TypeError: Series.name must be a hashable type
Also:
.apply(lambda x: pd.unique(x.values.ravel()).tolist())
which gives a list as output, and I would like columns.
df.groupby('prop1')['prop2','prop3','prop4'].unique()
by itself doesn't work because there are multiple columns.
.apply(f)
with f being:
def f(df):
    df['prop2'] = df['prop2'].drop_duplicates()
    df['prop3'] = df['prop3'].drop_duplicates()
    df['prop4'] = df['prop4'].drop_duplicates()
    return df
doesn't do anything.
I also tried .agg() with various options, but without success.
Does anyone have an idea?
Thank you very much :)