I was working through a simple financial-data example, trying to make a classic candlestick plot. To do that I had to calculate the open, max, min and close for each unit of time. I decided to use resample together with a groupby (one group per symbol). To avoid multi-index juggling, I used pd.NamedAgg to make everything easier:

candles = (data.set_index('trade_datetime')
              .groupby('instrument_symbol')
              .resample('1T')
              .agg(open=pd.NamedAgg("trade_price", "first"), 
                   max=pd.NamedAgg("trade_price", "max"),
                   median=pd.NamedAgg("trade_price", "median"),
                   min=pd.NamedAgg("trade_price", "min"),
                   close=pd.NamedAgg("trade_price", "last"),
                   std=pd.NamedAgg("trade_price", np.std),
                   volume=pd.NamedAgg("trade_quantity", "sum")).reset_index())

Unfortunately, I got this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<timed exec> in <module>

TypeError: aggregate() missing 1 required positional argument: 'func'

I don't get the error when I remove the resample:

candles = (data.set_index('trade_datetime')
              .groupby('instrument_symbol')
              #.resample('1D')
              .agg(open=pd.NamedAgg("trade_price", "first"), 
                   max=pd.NamedAgg("trade_price", "max"),
                   median=pd.NamedAgg("trade_price", "median"),
                   min=pd.NamedAgg("trade_price", "min"),
                   close=pd.NamedAgg("trade_price", "last"),
                   std=pd.NamedAgg("trade_price", np.std),
                   volume=pd.NamedAgg("trade_quantity", "sum")).reset_index())
  instrument_symbol  open   max  median    min  close       std  volume
0             PETR4  31.0  31.0    30.2  28.39   30.0  0.714111   12400

I am using pandas 1.0.1. Below is a sample of the data to reproduce the error.

                 trade_datetime instrument_symbol  trade_price  trade_quantity
1166911 2019-11-04 10:32:09.737             PETR4        31.00             200
1174414 2019-11-04 11:30:14.359             PETR4        30.71             300
1208601 2019-11-04 15:23:06.619             PETR4        30.23             100
1355062 2019-11-05 17:06:03.523             PETR4        29.72             200
1260316 2019-11-06 11:11:48.144             PETR4        28.39            1100
1295823 2019-11-06 11:49:00.767             PETR4        29.50             100
1343467 2019-11-06 15:52:42.506             PETR4        29.42             100
1261615 2019-11-07 13:12:30.599             PETR4        30.05             200
1297542 2019-11-07 15:28:37.714             PETR4        30.85             600
1305454 2019-11-07 15:42:27.041             PETR4        30.90             100
1323388 2019-11-07 16:48:32.382             PETR4        30.87             100
1381162 2019-11-08 10:24:50.643             PETR4        30.20             100
1385193 2019-11-08 11:05:14.777             PETR4        30.66            9000
1423408 2019-11-08 16:46:33.172             PETR4        30.11             100
1447363 2019-11-08 17:52:57.999             PETR4        30.00             100
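For convenience, the sample rows above can be rebuilt as a DataFrame roughly like this. It is only a sketch: the original integer index values are dropped, and the variable name `rows` is just for illustration.

```python
import pandas as pd

# (trade_datetime, trade_price, trade_quantity) copied from the sample above
rows = [
    ("2019-11-04 10:32:09.737", 31.00, 200),
    ("2019-11-04 11:30:14.359", 30.71, 300),
    ("2019-11-04 15:23:06.619", 30.23, 100),
    ("2019-11-05 17:06:03.523", 29.72, 200),
    ("2019-11-06 11:11:48.144", 28.39, 1100),
    ("2019-11-06 11:49:00.767", 29.50, 100),
    ("2019-11-06 15:52:42.506", 29.42, 100),
    ("2019-11-07 13:12:30.599", 30.05, 200),
    ("2019-11-07 15:28:37.714", 30.85, 600),
    ("2019-11-07 15:42:27.041", 30.90, 100),
    ("2019-11-07 16:48:32.382", 30.87, 100),
    ("2019-11-08 10:24:50.643", 30.20, 100),
    ("2019-11-08 11:05:14.777", 30.66, 9000),
    ("2019-11-08 16:46:33.172", 30.11, 100),
    ("2019-11-08 17:52:57.999", 30.00, 100),
]

data = pd.DataFrame(rows, columns=["trade_datetime", "trade_price", "trade_quantity"])
data["trade_datetime"] = pd.to_datetime(data["trade_datetime"])
# Every sample row belongs to the same symbol
data.insert(1, "instrument_symbol", "PETR4")

print(pd.__version__)
print(data.head())
```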

The old syntax, a dict with the columns as keys and the agg functions as values, works. I know there are other ways to get this result, but I would really like to use the new NamedAgg function.

Am I doing something wrong? Is this a bug? I am a bit reluctant to open an issue since everything seems fine.

1 Answer

It's a bug, fixed in pandas >= 1.4.0. For versions < 1.4.0, try passing func=None explicitly as a workaround:

.agg(func = None, open=pd.NamedAgg("trade_price", "first") ...
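Spelled out in full, the workaround might look like the sketch below. It makes a few assumptions that are not in the question: it uses only the first six sample rows, resamples at "1D" instead of one minute to keep the output short, and leaves out the median and std columns. On pandas >= 1.4.0 the func=None argument should be redundant, since that is already the default.

```python
import pandas as pd

# A subset of the question's sample rows (three trading days of PETR4)
data = pd.DataFrame({
    "trade_datetime": pd.to_datetime([
        "2019-11-04 10:32:09.737", "2019-11-04 11:30:14.359",
        "2019-11-04 15:23:06.619", "2019-11-05 17:06:03.523",
        "2019-11-06 11:11:48.144", "2019-11-06 11:49:00.767",
    ]),
    "instrument_symbol": "PETR4",
    "trade_price": [31.00, 30.71, 30.23, 29.72, 28.39, 29.50],
    "trade_quantity": [200, 300, 100, 200, 1100, 100],
})

candles = (
    data.set_index("trade_datetime")
        .groupby("instrument_symbol")
        .resample("1D")  # assumption: daily bars, only to keep the output small
        .agg(
            func=None,  # explicit func=None is the workaround for pandas < 1.4.0
            open=pd.NamedAgg("trade_price", "first"),
            max=pd.NamedAgg("trade_price", "max"),
            min=pd.NamedAgg("trade_price", "min"),
            close=pd.NamedAgg("trade_price", "last"),
            volume=pd.NamedAgg("trade_quantity", "sum"),
        )
        .reset_index()
)

print(candles)
```

The result has one row per symbol and day, with flat column names, so no multi-index flattening is needed afterwards.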
VovaM