3

I can't find a way to properly name custom aggregate functions applied to rolling windows. This answer explains it well for groupby aggregates. I've tried using pd.NamedAgg, like so:

(
    df
    .rolling(f"{num_days_window + 1}D", min_periods=day_length)
    .aggregate(time_mean=pd.NamedAgg(column="time", aggfunc=lambda w: window_daily_stats(w, np.mean)),
               time_std=pd.NamedAgg(column="time", aggfunc=lambda w: window_daily_stats(w, np.std)))
)

Nested dictionaries for naming are deprecated, so that's not an option. Passing in tuples also doesn't work.

.rolling(f"{num_days_window + 1}D", min_periods=day_length)
.aggregate(time_mean=("time", lambda w: window_daily_stats(w, np.mean)),
           time_std=("time", lambda w: window_daily_stats(w, np.std)))

In both cases the error is the same:

TypeError: aggregate() missing 1 required positional argument: 'func'

Currently I pass the aggregate function a dict that maps each column to a list of functions, but in that case the resulting columns are named:

('time', '<lambda>'),
('time', '<lambda>'), 

This unfortunately doesn't give me a column Index with unique values.

All in all my question is, how do I create named aggregates for custom functions for rolling windows?

SultanOrazbayev
Grinjero
  • Does this work? `df[['time_mean', 'time_std']] = df.time.rolling(...).agg(['mean', 'std'])` Named aggregation does not work for rolling agg. – Emma Oct 14 '21 at 15:55
  • Unfortunately, I need to apply a specific custom function to the rolling window – Grinjero Oct 15 '21 at 07:25
  • You can do `df[['time_mean', 'time_std']] = df.time.rolling(...).agg([lambda w: window_daily_stats(w, np.mean), lambda w: window_daily_stats(w, np.std)])`. You can pass functions or function names (strings) in a list. – Emma Oct 15 '21 at 14:20

3 Answers

2

IIUC, you can do this by setting the dunder attribute `__name__` on the lambda functions:

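import numpy as np
import pandas as pd

# example data, chosen to match the output shown below
df = pd.DataFrame({'a': range(5)})

def window_daily_stats(w, function):
    return function(w)

cust_mean = lambda s: window_daily_stats(s, np.mean)
cust_std = lambda s: window_daily_stats(s, np.std)
# pandas uses __name__ as the column label for each function
cust_mean.__name__ = 'custom mean'
cust_std.__name__ = 'custom std'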

then:

df.rolling(1).agg({'a':[cust_mean, cust_std]})

Output:

            a           
  custom mean custom std
0         0.0        0.0
1         1.0        0.0
2         2.0        0.0
3         3.0        0.0
4         4.0        0.0
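
The same naming trick can be wrapped in a small helper so that each lambda is labelled where it is defined. The sketch below is illustrative only: `named` is a hypothetical helper (not a pandas function) and the sample data is made up. It selects the column before rolling, so the result should have flat column names rather than a two-level header.

```python
import numpy as np
import pandas as pd

def named(func, name):
    """Set func.__name__ so pandas uses it as the column label (hypothetical helper)."""
    func.__name__ = name
    return func

# Made-up sample data
df = pd.DataFrame({"time": np.arange(5.0)})

# Selecting the column first yields a Series rolling object; aggregating it
# with a list of functions gives one flat column per function name.
out = df["time"].rolling(2).agg([
    named(lambda w: w.mean(), "time_mean"),
    named(lambda w: w.max() - w.min(), "time_range"),
])
print(out.columns.tolist())
```

Because the names are attached to the functions themselves, the same list can be reused on other columns, as long as each name in the list is distinct.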
Scott Boston
1

At the time of writing, pandas==1.5.3 does not support NamedAgg syntax for .rolling aggregation. The closest one can get is to pass a list of functions and then apply a custom rename.

Note that the <lambda> column names arise because the functions are anonymous lambdas, so the easy fix is to use regular named functions:

from pandas import DataFrame
df = DataFrame(zip(range(5), range(5)), columns=['a', 'b'])

# these will be anonymous
mean = lambda x: sum(x)/len(x)
summ = lambda x: sum(x)

def mmax(x):
    return max(x)

def mmin(x):
    return min(x)

agg = df.rolling(1).agg({'a': [mean, summ], 'b': [mmax, mmin]})
print(agg)
#          a             b     
#   <lambda> <lambda> mmax mmin
# 0      0.0      0.0  0.0  0.0
# 1      1.0      1.0  1.0  1.0
# 2      2.0      2.0  2.0  2.0
# 3      3.0      3.0  3.0  3.0
# 4      4.0      4.0  4.0  4.0

Finally, to have a custom renaming logic, we can pipe the dataframe through a function that does the renames:

def _rename(df):
    df = df.copy() # avoid mutating the original
    df.columns = ["".join(c) for c in df.columns] # can apply custom renaming logic
    return df

print(agg.pipe(_rename))
#    a<lambda>  a<lambda>  bmmax  bmmin
# 0        0.0        0.0    0.0    0.0
# 1        1.0        1.0    1.0    1.0
# 2        2.0        2.0    2.0    2.0
# 3        3.0        3.0    3.0    3.0
# 4        4.0        4.0    4.0    4.0

In principle, _rename can be constructed programmatically from an existing dictionary that was prepared for named aggregations. Preparing it is pure Python and specific to the circumstances, so it's left as an exercise for the reader.
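
One possible take on that exercise is sketched below. It swaps the rename step for a different technique: instead of aggregating everything and renaming afterwards, it selects each source column on the rolling object and applies one function at a time, so every result is stored directly under its target name. `rolling_named_agg` is a hypothetical helper (not part of pandas), and the sample data and output names are made up for illustration.

```python
import numpy as np
import pandas as pd

def rolling_named_agg(roller, **named_aggs):
    """Emulate named aggregation on a rolling object (hypothetical helper).

    Each keyword is an output column name mapped to a (column, func) pair,
    mirroring the groupby named-aggregation syntax.
    """
    return pd.DataFrame(
        {
            out_name: roller[col].agg(func)  # one rolling pass per output column
            for out_name, (col, func) in named_aggs.items()
        }
    )

# Made-up sample data
df = pd.DataFrame({"a": np.arange(5.0), "b": np.arange(5.0)})

result = rolling_named_agg(
    df.rolling(2),
    a_mean=("a", lambda w: w.mean()),
    a_range=("a", lambda w: w.max() - w.min()),
    b_sum=("b", "sum"),
)
print(result)
```

The trade-off is that each output column triggers its own rolling computation, which is simple and keeps the names unique but may be slower than a single aggregate call on wide frames.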

SultanOrazbayev
0

That's not possible with pandas.

Named aggregation was only intended for DataFrames and Series.

In addition, rolling windows do not have column labels to use with pd.NamedAgg.

pandas currently has no mechanism for passing each rolling window, as it progresses, to several aggregation functions and placing the results in new columns specified with pd.NamedAgg.

So we have to find another way to achieve the expected result.

An alternative with assign

import pandas as pd

df = pd.DataFrame({'col1':[1, 1, 2, 3, 3, 5, 8],
                   'col2':[1, 1, 2, 3, 3, 5, 8]})

df = (df.assign(special_name=df.rolling(3).aggregate({'col2': 'sum'}))
        .drop('col2', axis=1)
      )

#    col1  special_name
# 0     1           NaN
# 1     1           NaN
# 2     2           4.0
# 3     3           6.0
# 4     3           8.0
# 5     5          11.0
# 6     8          16.0

We first assign a new column name; that column then receives the result of the rolling aggregation.

Note that we add .drop('col2', axis=1) to remove the source column col2.

It is also possible to apply several functions to a rolling window with assign, as in the following script:

import pandas as pd

df = pd.DataFrame({'col1':[1, 1, 2, 3, 3, 5, 8],
                   'col2':[1, 1, 2, 3, 3, 5, 8]})

def sum_square(x):
    return sum([e**2 for e in x])

roller = df.rolling(3)

df.assign(
    special_name = roller.aggregate({'col2': 'sum'}),
    special_name2 = roller.aggregate({'col2': 'mean'}),
    special_name3 = roller.aggregate({'col2': lambda s: sum_square(s)})
)
#    col1  col2  special_name  special_name2  special_name3
# 0     1     1           NaN            NaN            NaN
# 1     1     1           NaN            NaN            NaN
# 2     2     2           4.0       1.333333            6.0
# 3     3     3           6.0       2.000000           14.0
# 4     3     3           8.0       2.666667           22.0
# 5     5     5          11.0       3.666667           43.0
# 6     8     8          16.0       5.333333           98.0

Note that if several source columns are used, they can be dropped once the calculations are done, for example with the following instruction:

df.drop(['col1', 'col2', ..., 'coln'], axis=1, inplace=True)

Laurent B.