3

I am aggregating some data by date.

for dt,group in df.groupby(df.timestamp.dt.date):
      # do stuff

Now, I would like to do the same, but without using midnight as time offset. Still, I would like to use groupby, but e.g. in 6AM-6AM bins. Is there any better solution than a dummy column? unfortunately, resample as discussed in

Resample daily pandas timeseries with start at time other than midnight Resample hourly TimeSeries with certain starting hour

does not work, as I do need to apply any resampling/aggregation function

00__00__00
  • 4,834
  • 9
  • 41
  • 89

2 Answers2

4

You can, for example, subtract the offset before grouping:

for dt, group in df.groupby(df.timestamp.sub(pd.to_timedelta('6H')).dt.date):
    # do stuff
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
3

There's a base argument for resample or pd.Grouper that is meant to handle this situation. There are many ways to accomplish this, pick whichever you feel is more clear.

  • '1D' frequency with base=0.25
  • '24h' frequency with base=6
  • '1440min' frequency with base=360

Code

df = pd.DataFrame({'timestamp': pd.date_range('2010-01-01', freq='10min', periods=200)})

df.resample(on='timestamp', rule='1D', base=0.25).timestamp.agg(['min', 'max'])
#df.resample(on='timestamp', rule='24h', base=6).timestamp.agg(['min', 'max'])
#df.resample(on='timestamp', rule=f'{60*24}min', base=60*6).timestmap.agg(['min', 'max'])

                                    min                 max
timestamp                                                  
2009-12-31 06:00:00 2010-01-01 00:00:00 2010-01-01 05:50:00  #[Dec31 6AM - Jan1 6AM)
2010-01-01 06:00:00 2010-01-01 06:00:00 2010-01-02 05:50:00  #[Jan1 6AM - Jan2 6AM)
2010-01-02 06:00:00 2010-01-02 06:00:00 2010-01-02 09:10:00  #[Jan2 6AM - Jan3 6AM)

For completeness, resample is a convenience method and is in all ways the same as groupby. If for some reason you absolutely cannot use resample you could do:

for dt, gp in df.groupby(pd.Grouper(key='timestamp', freq='24h', base=6)):
    ...

which is equivalent to

for dt, gp in df.resample(on='timestamp', rule='24h', base=6):
    ...
ALollz
  • 57,915
  • 7
  • 66
  • 89
  • i can not use resample, as mentioned in the question – 00__00__00 Oct 01 '19 at 19:46
  • @00__00__00 if you can use `groupby` you can use `resample`... `for dt, gp in df.resample(on='timestamp', rule=f'{60*24}min', base=60*6):` It's just a convenience method really and to my knowledge is identical to a groupby. – ALollz Oct 01 '19 at 19:49
  • That is, it's identical to the following uglier groupby call: `df.groupby(pd.Grouper(key='timestamp', freq=f'{60*24}min', base=60*6))` – ALollz Oct 01 '19 at 19:51