My dataframe is as follows:

import pandas as pd

df=pd.DataFrame({'A':['a','a','b','c'], 'B':['x','x','x','x'],'C':['1','2','3','4'], 'D':[0,0,0,0]})
gb_a=df.groupby(['A'], as_index=False)

My desired output:

A    unique_b    unique_c
a       1          2
b       1          1
c       1          1

I have tried the following:

gb_a['B','C'].agg({'B':pd.Series.nunique, 'C':pd.Series.nunique})
gb_a['B','C'].agg({'unique_b':pd.Series.nunique, 'unique_c':pd.Series.nunique})
gb_a['B','C'].agg({'B': {'unique_b':pd.Series.nunique}, 'C': {'unique_c':pd.Series.nunique}})

Error I am getting:

KeyError: ('B', 'C')

Questions

Is it possible to get this output with a single agg call, along the lines of the attempts above?

I know I can aggregate each column individually and then merge the results, as follows:

out_df=gb_a['B'].agg({'unique_b':pd.Series.nunique})
out_df= pd.merge(out_df,gb_a['C'].agg({'unique_c':pd.Series.nunique}), on='A', how='inner')

But I need to apply different aggregations to many selected columns, and I do not want to merge that many times on a large dataset.

I am using Python 2.7

Thanks.

P.S. I have read several answers on similar topics, e.g.:

Naming returned columns in Pandas aggregate function?

Amrit

1 Answer

Per the docs, SeriesGroupBy objects have a nunique method. Therefore, you can aggregate gb_a using

gb_a.agg({'B': 'nunique', 'C': 'nunique'})

import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'c'], 'B': ['x', 'x', 'x', 'x'],
                   'C': ['1', '2', '3', '4'], 'D': [0, 0, 0, 0]})
gb_a = df.groupby(['A'], as_index=False)
result = gb_a.agg({'B': 'nunique', 'C': 'nunique'})
result = result.rename(columns={'B':'unique_b', 'C':'unique_c'})
print(result)

prints

   A  unique_c  unique_b
0  a         2         1
1  b         1         1
2  c         1         1
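
On newer pandas (0.25 or later, which requires Python 3 and so does not apply to the Python 2.7 setup in the question), the aggregation and the renaming can likely be combined in one step with named aggregation. The sketch below is only an illustration under that version assumption; the `d_sum` column is a hypothetical extra, added to show that each output column can use a different aggregation.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'b', 'c'], 'B': ['x', 'x', 'x', 'x'],
                   'C': ['1', '2', '3', '4'], 'D': [0, 0, 0, 0]})

# Named aggregation: output_name=(input_column, aggregation).
# Assumes pandas >= 0.25; 'd_sum' is a hypothetical extra column
# showing a different aggregation in the same call.
result = (
    df.groupby('A')
      .agg(unique_b=('B', 'nunique'),
           unique_c=('C', 'nunique'),
           d_sum=('D', 'sum'))
      .reset_index()  # turn the group key 'A' back into a regular column
)
print(result)
```

The output columns follow the order of the keyword arguments, so no separate rename or merge is needed; here unique_c is 2 for group a and 1 for groups b and c.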
unutbu