How do I specify custom aggregating functions so that they behave correctly when used in list arguments of pandas.DataFrame.aggregate?
Given a two-column dataframe in pandas ...
import pandas as pd
import numpy as np
df = pd.DataFrame(index=range(10))
df['a'] = [ 3 * x for x in range(10) ]
df['b'] = [ 1 - 2 * x for x in range(10) ]
... aggregating over a list of aggregation function specs is not a problem:
def ok_mean(x):
    return x.mean()
df.aggregate(['mean', np.max, ok_mean])
            a    b
mean     13.5 -8.0
amax     27.0  1.0
ok_mean  13.5 -8.0
but when the custom function (lambda or named) calls np.mean(x) instead of x.mean(), the list form fails to aggregate and returns one row per input row:
def nok_mean(x):
    return np.mean(x)
df.aggregate([lambda x: np.mean(x), nok_mean])
         a                 b
  <lambda> nok_mean <lambda> nok_mean
0      0.0      0.0      1.0      1.0
1      3.0      3.0     -1.0     -1.0
2      6.0      6.0     -3.0     -3.0
3      9.0      9.0     -5.0     -5.0
4     12.0     12.0     -7.0     -7.0
...
Mixing aggregating and non-aggregating specs leads to an error:
df.aggregate(['mean', nok_mean])
~/anaconda3/envs/tsa37_jup/lib/python3.7/site-packages/pandas/core/base.py in _aggregate_multiple_funcs(self, arg, _level, _axis)
607 # if we are empty
608 if not len(results):
--> 609 raise ValueError("no results")
610
Using the same function directly (not in a list), on the other hand, gives the expected result:
df.aggregate(nok_mean)
a 13.5
b -8.0
dtype: float64
Is this a bug, or am I missing something in the way I define aggregation functions? In my real project, I'm using more complex aggregation functions (such as this percentile one). So my question is:
How do I specify a custom aggregating function in order to work around this bug?
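To make "more complex aggregation functions" concrete, here is a minimal sketch of a percentile-style aggregator. The factory name `percentile`, its parameter `n`, and the inner `percentile_` are hypothetical stand-ins for illustration, not the actual function from my project. The sketch only uses the direct (non-list) call, which is the form that gives the expected result above.

```python
import numpy as np
import pandas as pd


def percentile(n):
    """Hypothetical factory: build a function reducing a Series to its n-th percentile."""
    def percentile_(x):
        # np.percentile accepts array-likes, so a whole column reduces to one number
        return np.percentile(x, n)
    # Give the function a readable name for use as a result label
    percentile_.__name__ = "percentile_%s" % n
    return percentile_


# Same data as in the question
df = pd.DataFrame({
    "a": [3 * x for x in range(10)],
    "b": [1 - 2 * x for x in range(10)],
})

# Direct call (not wrapped in a list): one value per column
p50 = df.aggregate(percentile(50))
print(p50)
```

Like nok_mean, this function relies on a NumPy reduction rather than a Series method, so it is the kind of function I would like to be able to pass inside a list.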
Note that using the custom aggregating function over a rolling, expanding or group-by window gives the expected result:
df.expanding().aggregate(['mean', nok_mean])
## returns cumulative aggregation results as expected
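A self-contained way to check the expanding case is sketched below. It assumes a pandas version in which window aggregation accepts a list mixing string names and callables, and the variable names `result`, `same_a` and `same_b` are only illustrative. It compares the built-in 'mean' against nok_mean column by column instead of relying on printed output.

```python
import numpy as np
import pandas as pd


def nok_mean(x):
    # Same definition as in the question: NumPy reduction instead of x.mean()
    return np.mean(x)


df = pd.DataFrame({
    "a": [3 * x for x in range(10)],
    "b": [1 - 2 * x for x in range(10)],
})

# Expanding window with a list of aggregation specs
result = df.expanding().aggregate(["mean", nok_mean])

# If nok_mean really aggregates each window, it must match the built-in mean
same_a = np.allclose(result[("a", "mean")], result[("a", "nok_mean")])
same_b = np.allclose(result[("b", "mean")], result[("b", "nok_mean")])
print(same_a, same_b)
```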
Pandas version: 0.23.4