I want to apply two different aggregates on the same column in a pandas DataFrameGroupBy and have the new columns be named.

I've tried following the named aggregation example from the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#named-aggregation

In [82]: animals.groupby("kind").agg(
   ....:     min_height=('height', 'min'),
   ....:     max_height=('height', 'max'),
   ....:     average_weight=('weight', np.mean),
   ....: )
   ....: 
Out[82]: 
      min_height  max_height  average_weight
kind                                        
cat          9.1         9.5            8.90
dog          6.0        34.0          102.75

Something like what I'm trying to do is:

df = pd.DataFrame({"year": [2001, 2001, 2001, 2005, 2005],
                   "value": [1, 2, 5, 3, 1]})

df = df.groupby("year").agg(sum=('value', 'sum'),
                            count=('value', 'size'))

However, this gives the following:

TypeError: aggregate() missing 1 required positional argument: 'arg'
Levi Baguley
  • I wrote a _very_ detailed answer on named aggregation [here.](https://stackoverflow.com/a/54300159/4909087) You're essentially asking how to aggregate with multiple functions on the same column. – cs95 Jul 20 '19 at 18:57
  • Yeah, I read your post before I posted this question and I just recently installed pandas so I thought I had the most updated version. I reinstalled to 0.25.0 and of course, it works now. – Levi Baguley Jul 20 '19 at 19:02

1 Answer

Since you need two aggregation functions on one column, and your pandas is older than 0.25.0, you can select the column and pass a list of function names to agg:

df = df.groupby("year").value.agg(['sum','count'])
df
      sum  count
year            
2001    8      3
2005    4      2 
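
If you want your own column names on an older pandas, one option is to rename the columns after the list-based aggregation. This is only a minimal sketch: the names `total` and `n_rows` are made up for illustration and are not from the question.

```python
import pandas as pd

df = pd.DataFrame({"year": [2001, 2001, 2001, 2005, 2005],
                   "value": [1, 2, 5, 3, 1]})

# Select the column, aggregate with a list of function names,
# then rename the resulting columns.
# "total" and "n_rows" are hypothetical names chosen for this example.
result = (
    df.groupby("year")["value"]
      .agg(["sum", "count"])
      .rename(columns={"sum": "total", "count": "n_rows"})
)
print(result)
```

The result is kept in a separate variable so that `df` still holds the original data for later steps.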

In pandas 0.25.0, named aggregation works as shown in the documentation:

pd.__version__
'0.25.0'
df = df.groupby("year").agg(sum=('value', 'sum'),
                            count=('value', 'count'))
df
      sum  count
year            
2001    8      3
2005    4      2
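
If the code has to run on machines where the pandas version is not known in advance, a rough sketch like the following picks between the two approaches. The version parsing here is a simple assumption (it only reads the first two numeric parts of `pd.__version__`), and the output names `total` and `n_rows` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"year": [2001, 2001, 2001, 2005, 2005],
                   "value": [1, 2, 5, 3, 1]})

# Read "major.minor" from the version string, e.g. "0.25.0" -> (0, 25).
major, minor = (int(part) for part in pd.__version__.split(".")[:2])

if (major, minor) >= (0, 25):
    # Named aggregation: new_column=(source_column, function).
    result = df.groupby("year").agg(
        total=("value", "sum"),
        n_rows=("value", "count"),
    )
else:
    # Fallback for older pandas: list of functions, then rename.
    result = (
        df.groupby("year")["value"]
          .agg(["sum", "count"])
          .rename(columns={"sum": "total", "count": "n_rows"})
    )

print(result)
```

Both branches should give the same two columns, so the rest of the code does not need to care which one ran.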
BENY
  • I just recently installed pandas so I thought I had the most updated version. I reinstalled to 0.25.0 and of course, it works now. – Levi Baguley Jul 20 '19 at 19:10