groupby date using other start time than midnight

Question

I am aggregating some data by date.

for dt,group in df.groupby(df.timestamp.dt.date):
      # do stuff

Now, I would like to do the same, but without using midnight as time offset. Still, I would like to use groupby, but e.g. in 6AM-6AM bins. Is there any better solution than a dummy column? unfortunately, resample as discussed in

Resample daily pandas timeseries with start at time other than midnight Resample hourly TimeSeries with certain starting hour

does not work, as I do need to apply any resampling/aggregation function

score 4 · Accepted Answer · answered Oct 01 '19 at 18:12

4

You can, for example, subtract the offset before grouping:

for dt, group in df.groupby(df.timestamp.sub(pd.to_timedelta('6H')).dt.date):
    # do stuff

answered Oct 01 '19 at 18:12

Quang Hoang

146,074
10
56
74

ALollz · Answer 2 · 2019-10-02T02:08:44.057

There's a base argument for resample or pd.Grouper that is meant to handle this situation. There are many ways to accomplish this, pick whichever you feel is more clear.

'1D' frequency with base=0.25
'24h' frequency with base=6
'1440min' frequency with base=360

Code

df = pd.DataFrame({'timestamp': pd.date_range('2010-01-01', freq='10min', periods=200)})

df.resample(on='timestamp', rule='1D', base=0.25).timestamp.agg(['min', 'max'])
#df.resample(on='timestamp', rule='24h', base=6).timestamp.agg(['min', 'max'])
#df.resample(on='timestamp', rule=f'{60*24}min', base=60*6).timestmap.agg(['min', 'max'])

                                    min                 max
timestamp                                                  
2009-12-31 06:00:00 2010-01-01 00:00:00 2010-01-01 05:50:00  #[Dec31 6AM - Jan1 6AM)
2010-01-01 06:00:00 2010-01-01 06:00:00 2010-01-02 05:50:00  #[Jan1 6AM - Jan2 6AM)
2010-01-02 06:00:00 2010-01-02 06:00:00 2010-01-02 09:10:00  #[Jan2 6AM - Jan3 6AM)

For completeness, resample is a convenience method and is in all ways the same as groupby. If for some reason you absolutely cannot use resample you could do:

for dt, gp in df.groupby(pd.Grouper(key='timestamp', freq='24h', base=6)):
    ...

which is equivalent to

for dt, gp in df.resample(on='timestamp', rule='24h', base=6):
    ...

@00__00__00 if you can use `groupby` you can use `resample`... `for dt, gp in df.resample(on='timestamp', rule=f'{60*24}min', base=60*6):` It's just a convenience method really and to my knowledge is identical to a groupby. — ALollz, Oct 01 '19 at 19:49
That is, it's identical to the following uglier groupby call: `df.groupby(pd.Grouper(key='timestamp', freq=f'{60*24}min', base=60*6))` — ALollz, Oct 01 '19 at 19:51

groupby date using other start time than midnight

2 Answers2

Code