I want to resample a time series at a daily (exactly 24-hour) frequency, starting at a certain hour.

For example:

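from datetime import datetime
from pandas import Series, date_range

# 60 hourly observations, starting at 17:00 on 2012-01-01
index = date_range(datetime(2012, 1, 1, 17), freq='H', periods=60)
ts = Series(data=[1] * 60, index=index)
ts.resample(rule='D', how='sum', closed='left', label='left')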

Result I get:

2012-01-01  7
2012-01-02 24
2012-01-03 24
2012-01-04  5
Freq: D

Result I want:

2012-01-01 17:00:00 24
2012-01-02 17:00:00 24
2012-01-03 17:00:00 12
Freq: D

A few weeks ago you could pass '24H' as the frequency and it worked fine. Now '24H' is converted to '1D'.

Was I relying on a bug with '24H' that has since been fixed? And how can I get the desired result back in an efficient and pythonic (or pandas-idiomatic) way?

versions:

  • python 2.7.3
  • pandas 0.9.0rc1 (it doesn't work in 0.8.1 either)
  • numpy 1.6.1
bmu
MaM

3 Answers


resample has a base argument that covers this case:

ts.resample(rule='24H', closed='left', label='left', base=17).sum()

Output:

2012-01-01 17:00:00    24
2012-01-02 17:00:00    24
2012-01-03 17:00:00    12
Freq: 24H
JohnE
Andy Hayden

2021 update: base has been deprecated since pandas 1.1.0. The new arguments to use instead are ‘offset’ or ‘origin’.

df.resample('24H',
    origin=datetime(2012, 1, 1, 17)  # <-- add this
).sum()

From the documentation (new in version 1.1.0):

origin : {‘epoch’, ‘start’, ‘start_day’}, Timestamp or str, default ‘start_day’

The timestamp on which to adjust the grouping. The timezone of origin must match the timezone of the index. If a timestamp is not used, these values are also supported:

  • ‘epoch’: origin is 1970-01-01
  • ‘start’: origin is the first value of the timeseries
  • ‘start_day’: origin is the first day at midnight of the timeseries
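As a rough sketch of how the origin options behave on the data from the question (assuming pandas 1.1 or later; the variable names are only for illustration), the default ‘start_day’ reproduces the midnight-anchored bins the asker saw, while an explicit timestamp or ‘start’ anchors the 24-hour bins at 17:00:

```python
from datetime import datetime

import pandas as pd

# Same data as in the question: 60 hourly observations starting at 17:00.
index = pd.date_range(datetime(2012, 1, 1, 17), periods=60, freq="h")
ts = pd.Series(1, index=index)

# Default origin='start_day': bins are anchored at midnight of the first day.
by_start_day = ts.resample("24h").sum()

# Explicit origin: bins are anchored at the given timestamp.
by_timestamp = ts.resample("24h", origin=datetime(2012, 1, 1, 17)).sum()

# origin='start': bins are anchored at the first timestamp in the index,
# which here is also 2012-01-01 17:00.
by_start = ts.resample("24h", origin="start").sum()

print(by_start_day)
print(by_timestamp)
```

Here the explicit timestamp and ‘start’ give the same bins only because the series happens to begin at 17:00. If the data could start at another hour, the explicit timestamp is probably the safer choice.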
Cornelius Roemer

2020 update, for DataFrames:

Use the base keyword, as described in the documentation.

Code example:

df.resample(pd.Timedelta('24 hours'), # or '24H'
 base=17 # <--  ADD THIS
).sum() 
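On pandas versions where base is deprecated or no longer available, the offset argument appears to be the closest drop-in replacement for a DataFrame: base=17 with a 24-hour rule corresponds to offset='17h'. A minimal sketch, assuming pandas 1.1 or later and a hypothetical single-column frame named df:

```python
import pandas as pd

# Hypothetical frame: 60 hourly rows starting at 17:00, one column of ones.
index = pd.date_range("2012-01-01 17:00", periods=60, freq="h")
df = pd.DataFrame({"value": [1] * 60}, index=index)

# offset shifts the bin edges relative to the default origin (midnight of the
# first day), so the 24-hour bins start at 17:00 instead of 00:00.
daily = df.resample("24h", offset="17h").sum()

print(daily)
```

Unlike origin with a fixed timestamp, offset is expressed relative to the origin, so the same call can be reused on frames covering different dates.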
Tomas G.