I was working through a simple example with financial data, trying to make a classic candlestick plot. To do that, I had to calculate the open, max, min and close prices for each unit of time. I decided to use resample together with a groupby (one group per symbol). To avoid juggling a MultiIndex on the columns, I decided to use pd.NamedAgg:
candles = (data.set_index('trade_datetime')
               .groupby('instrument_symbol')
               .resample('1T')
               .agg(open=pd.NamedAgg("trade_price", "first"),
                    max=pd.NamedAgg("trade_price", "max"),
                    median=pd.NamedAgg("trade_price", "median"),
                    min=pd.NamedAgg("trade_price", "min"),
                    close=pd.NamedAgg("trade_price", "last"),
                    std=pd.NamedAgg("trade_price", np.std),
                    volume=pd.NamedAgg("trade_quantity", "sum"))
               .reset_index())
Unfortunately, I got this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<timed exec> in <module>
TypeError: aggregate() missing 1 required positional argument: 'func'
I don't get the error when I remove the resample:
candles = (data.set_index('trade_datetime')
               .groupby('instrument_symbol')
               #.resample('1D')
               .agg(open=pd.NamedAgg("trade_price", "first"),
                    max=pd.NamedAgg("trade_price", "max"),
                    median=pd.NamedAgg("trade_price", "median"),
                    min=pd.NamedAgg("trade_price", "min"),
                    close=pd.NamedAgg("trade_price", "last"),
                    std=pd.NamedAgg("trade_price", np.std),
                    volume=pd.NamedAgg("trade_quantity", "sum"))
               .reset_index())
  instrument_symbol  open   max  median    min  close       std  volume
0             PETR4  31.0  31.0    30.2  28.39   30.0  0.714111   12400
I am using pandas 1.0.1. Below is a sample of the data to reproduce the error:
trade_datetime instrument_symbol trade_price trade_quantity
1166911 2019-11-04 10:32:09.737 PETR4 31.00 200
1174414 2019-11-04 11:30:14.359 PETR4 30.71 300
1208601 2019-11-04 15:23:06.619 PETR4 30.23 100
1355062 2019-11-05 17:06:03.523 PETR4 29.72 200
1260316 2019-11-06 11:11:48.144 PETR4 28.39 1100
1295823 2019-11-06 11:49:00.767 PETR4 29.50 100
1343467 2019-11-06 15:52:42.506 PETR4 29.42 100
1261615 2019-11-07 13:12:30.599 PETR4 30.05 200
1297542 2019-11-07 15:28:37.714 PETR4 30.85 600
1305454 2019-11-07 15:42:27.041 PETR4 30.90 100
1323388 2019-11-07 16:48:32.382 PETR4 30.87 100
1381162 2019-11-08 10:24:50.643 PETR4 30.20 100
1385193 2019-11-08 11:05:14.777 PETR4 30.66 9000
1423408 2019-11-08 16:46:33.172 PETR4 30.11 100
1447363 2019-11-08 17:52:57.999 PETR4 30.00 100
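For convenience, the sample above can be rebuilt as a DataFrame roughly like this. It is only a minimal sketch: the original integer index is dropped (I assume it does not matter for the error), and the `rows` and `columns` names are just illustrative.

```python
import pandas as pd

# Rows copied from the sample above; the original integer index is omitted.
rows = [
    ("2019-11-04 10:32:09.737", "PETR4", 31.00, 200),
    ("2019-11-04 11:30:14.359", "PETR4", 30.71, 300),
    ("2019-11-04 15:23:06.619", "PETR4", 30.23, 100),
    ("2019-11-05 17:06:03.523", "PETR4", 29.72, 200),
    ("2019-11-06 11:11:48.144", "PETR4", 28.39, 1100),
    ("2019-11-06 11:49:00.767", "PETR4", 29.50, 100),
    ("2019-11-06 15:52:42.506", "PETR4", 29.42, 100),
    ("2019-11-07 13:12:30.599", "PETR4", 30.05, 200),
    ("2019-11-07 15:28:37.714", "PETR4", 30.85, 600),
    ("2019-11-07 15:42:27.041", "PETR4", 30.90, 100),
    ("2019-11-07 16:48:32.382", "PETR4", 30.87, 100),
    ("2019-11-08 10:24:50.643", "PETR4", 30.20, 100),
    ("2019-11-08 11:05:14.777", "PETR4", 30.66, 9000),
    ("2019-11-08 16:46:33.172", "PETR4", 30.11, 100),
    ("2019-11-08 17:52:57.999", "PETR4", 30.00, 100),
]
columns = ["trade_datetime", "instrument_symbol", "trade_price", "trade_quantity"]

data = pd.DataFrame(rows, columns=columns)
# Parse the timestamps so that resample() can use them as a DatetimeIndex.
data["trade_datetime"] = pd.to_datetime(data["trade_datetime"])
```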
The old syntax, a dict with the columns as keys and the aggregation functions as values, works. I know there are other ways to get this result, but I would really like to use the new NamedAgg syntax.
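For reference, the dict-based version I mean looks roughly like the sketch below. To keep the output small it uses only the first four sample rows and a '1D' frequency instead of one minute. The `agg_spec` name and the manual renaming of the columns are just illustrative choices, and the renaming step is exactly the MultiIndex juggling I was hoping to avoid.

```python
import pandas as pd

# First four rows of the sample data, enough to cover two trading days.
data = pd.DataFrame({
    "trade_datetime": pd.to_datetime([
        "2019-11-04 10:32:09.737",
        "2019-11-04 11:30:14.359",
        "2019-11-04 15:23:06.619",
        "2019-11-05 17:06:03.523",
    ]),
    "instrument_symbol": ["PETR4"] * 4,
    "trade_price": [31.00, 30.71, 30.23, 29.72],
    "trade_quantity": [200, 300, 100, 200],
})

# Dict syntax: column name -> list of aggregation functions.
agg_spec = {
    "trade_price": ["first", "max", "min", "last"],
    "trade_quantity": ["sum"],
}

candles = (data.set_index("trade_datetime")
               .groupby("instrument_symbol")
               .resample("1D")
               .agg(agg_spec))

# The result has two-level columns such as ("trade_price", "first");
# replace them with flat names, in the same order as agg_spec.
candles.columns = ["open", "max", "min", "close", "volume"]
candles = candles.reset_index()
```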
Am I doing something wrong? Is this a bug? I am a bit reluctant to open an issue since everything seems fine.