0

I am using data from yfinance which returns a pandas Data-Frame.

                            Volume
Datetime                          
2021-09-13 09:30:00-04:00   951104
2021-09-13 09:35:00-04:00   408357
2021-09-13 09:40:00-04:00   498055
2021-09-13 09:45:00-04:00   466363
2021-09-13 09:50:00-04:00   315385
2021-12-06 15:35:00-05:00   200748
2021-12-06 15:40:00-05:00   336136
2021-12-06 15:45:00-05:00   473106
2021-12-06 15:50:00-05:00   705082
2021-12-06 15:55:00-05:00  1249763

There are 5 minute intra-day intervals in the data-frame. I want to resample to daily data and get the idxmax of the maximum volume for that day.

df.resample("B")["Volume"].idxmax()

Returns an error:

ValueError: attempt to get argmax of an empty sequence

I used B(business-days) as the resampling period, so there shouldn't be any empty sequences.

I should say .max() works fine.

Also using .agg as was suggested in another question returns an error:

df["Volume"].resample("B").agg(lambda x : np.nan if x.count() == 0 else x.idxmax()) 

error:

IndexError: index 77 is out of bounds for axis 0 with size 0
Borut Flis
  • 15,715
  • 30
  • 92
  • 119

2 Answers2

1

For me working test if all NaNs per group in if-else:

df = df.resample("B")["Volume"].agg(lambda x: np.nan if x.isna().all() else x.idxmax())
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

You can use groupby as an alternative of resample:

>>> df.groupby(df.index.normalize())['Volume'].agg(Datetime='idxmax', Volume='max')

                      Datetime   Volume
Datetime                               
2021-09-13 2021-09-13 09:30:00   951104
2021-12-06 2021-12-06 15:55:00  1249763
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • Thanks. I have never seen this type of parameters in the agg function. I guess this is for when you select one column after the groupby. – Borut Flis Dec 08 '21 at 08:19
  • 1
    Yes, you're right. You probably know this form for DataFrame: `df.groupby(df.index.normalize()).agg(Datetime=('Volume', 'idxmax'), Volume=('Volume', 'max'))`? – Corralien Dec 08 '21 at 08:28
  • If you have not valid python identifiers like (`Datetime Peak`), you can use this form: `df.groupby(df.index.normalize())['Volume'].agg(**{'Datetime Peak': 'idxmax', 'Volume Max': 'max'})` – Corralien Dec 08 '21 at 08:53