8

I have a pandas timeseries of 10-min freqency data and need to find the maximum value in each 24-hour period. However, this 24-hour period needs to start each day at 5AM - not the default midnight which pandas assumes.

I've been checking out DateOffset but so far am drawing blanks. I might have expected something akin to pandas.tseries.offsets.Week(weekday=n), e.g. pandas.tseries.offsets.Week(hour=5), but this is not supported as far as I can tell.

I can do a nasty work around by shifting the data first, but it's unintuitive and even coming back to the same code after just a week I have problems wrapping my head around the shift direction!

Any more elegant ideas would be much appreciated.

ajt
  • 542
  • 2
  • 6
  • 12

2 Answers2

18

The base keyword can do the trick (see docs):

s.resample('24h', base=5)

Eg:

In [35]: idx = pd.date_range('2012-01-01 00:00:00', freq='5min', periods=24*12*3)

In [36]: s = pd.Series(np.arange(len(idx)), index=idx)

In [38]: s.resample('24h', base=5)
Out[38]: 
2011-12-31 05:00:00     29.5
2012-01-01 05:00:00    203.5
2012-01-02 05:00:00    491.5
2012-01-03 05:00:00    749.5
Freq: 24H, dtype: float64
Jeril
  • 7,858
  • 3
  • 52
  • 69
joris
  • 133,120
  • 36
  • 247
  • 202
2

I've just spotted an answered question which didn't come up on Google or Stack Overflow previously:

Resample hourly TimeSeries with certain starting hour

This uses the base parameter, which looks like an addition subsequent to Wes McKinney's Python for Data Analysis. I've given the parameter a go and it seems to do the trick.

Community
  • 1
  • 1
ajt
  • 542
  • 2
  • 6
  • 12