My dataframe is as follows:
import pandas as pd
df = pd.DataFrame({'A': ['a', 'a', 'b', 'c'], 'B': ['x', 'x', 'x', 'x'], 'C': ['1', '2', '3', '4'], 'D': [0, 0, 0, 0]})
gb_a = df.groupby(['A'], as_index=False)
My desired output:
A  unique_b  unique_c
a         1         2
b         1         1
c         1         1
I have tried the following:
gb_a['B','C'].agg({'B':pd.Series.nunique, 'C':pd.Series.nunique})
gb_a['B','C'].agg({'unique_b':pd.Series.nunique, 'unique_c':pd.Series.nunique})
gb_a['B','C'].agg({'B': {'unique_b':pd.Series.nunique}, 'C': {'unique_c':pd.Series.nunique}})
The error I am getting:
KeyError: ('B', 'C')
Questions
Is it possible to make the approach above work?
I know I can aggregate each column individually and then merge the results, as follows:
out_df = gb_a['B'].agg({'unique_b': pd.Series.nunique})
out_df = pd.merge(out_df, gb_a['C'].agg({'unique_c': pd.Series.nunique}), on='A', how='inner')
However, I need to apply different aggregations to many selected columns, and I do not want to merge that many times on a large dataset.
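To make the goal concrete, here is a minimal sketch of the single-pass shape I am after. It rests on two assumptions that are not part of the attempts above: the columns are selected with a list (`[['B', 'C']]`) rather than the tuple-style `['B', 'C']`, and the renaming happens in a separate `rename` step after aggregating. The sketch groups with the default `as_index=True` and calls `reset_index()` at the end, and it may behave differently on older pandas releases.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a', 'a', 'b', 'c'],
    'B': ['x', 'x', 'x', 'x'],
    'C': ['1', '2', '3', '4'],
    'D': [0, 0, 0, 0],
})

# One aggregation function per column; different functions could be
# mixed here (e.g. 'sum' for a numeric column) without any merge.
agg_spec = {'B': 'nunique', 'C': 'nunique'}

out_df = (
    df.groupby('A')[['B', 'C']]   # list selection, not gb['B', 'C']
      .agg(agg_spec)              # single pass over the groups
      .rename(columns={'B': 'unique_b', 'C': 'unique_c'})
      .reset_index()              # bring 'A' back as a regular column
)

print(out_df)
```

The dictionary keys are the existing column names, so adding more columns or other aggregation functions only means extending `agg_spec` and the rename mapping.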
I am using Python 2.7.
Thanks.
P.S. I have read various answers on similar topics, e.g.