Using the pandas groupby method to group data by hour of day is straightforward:
import pandas as pd
import numpy as np
# Create a sample dataset: one random value per hour for 48 hours
size = 48
df = pd.DataFrame(np.random.rand(size),
                  index=pd.date_range('2021-01-01', periods=size, freq='h'))
# Group the data by hour of day and find the mean
df.groupby(df.index.hour).mean()
Sometimes the hours need to be grouped into bins, which can be done with the pandas.cut function as shown here. This bins the hours into 00:00-05:59, 06:00-11:59, 12:00-17:59, and 18:00-23:59.
# Group by bins
bins = [0, 6, 12, 18, 24]
df['time_bin'] = pd.cut(df.index.hour, bins, right=False)
df.groupby('time_bin').mean()
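As a quick check of how right=False treats the bin edges, the small sketch below bins a handful of boundary hours. The hours list is made up for illustration; each interval is closed on the left and open on the right, so hour 6 lands in the second bin rather than the first.

```python
import pandas as pd

# Hypothetical boundary hours chosen to probe each bin edge
hours = pd.Index([0, 5, 6, 11, 12, 17, 18, 23])

# right=False makes every interval left-closed: [0, 6), [6, 12), ...
binned = pd.cut(hours, bins=[0, 6, 12, 18, 24], right=False)

for hour, interval in zip(hours, binned):
    print(hour, interval)
```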
However, it is often desirable to bin the hours so that 00:00 is in the center of the first bin, giving 21:00-02:59, 03:00-08:59, 09:00-14:59, and 15:00-20:59, but passing those edges to pandas.cut fails because the first bin wraps around midnight:
# Use 00:00 as center of first bin
bins = [21, 3, 9, 15, 21]
df['time_bin'] = pd.cut(df.index.hour, bins, right=False)
# ValueError: bins must increase monotonically.
How can you group by hour bins so that 00:00 is in the center of the first bin?
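One possible direction, offered as a sketch rather than a definitive answer: shift the hours by 3 modulo 24 so that 21:00 maps to 0, then reuse the monotonic edges [0, 6, 12, 18, 24]. The column name value, the label strings, and the deterministic sample values are assumptions made for illustration, so the result can be checked by hand.

```python
import numpy as np
import pandas as pd

# Deterministic sample data (assumed for illustration): value 0..47, one per hour
size = 48
df = pd.DataFrame(
    {"value": np.arange(size, dtype=float)},
    index=pd.date_range("2021-01-01", periods=size, freq="h"),
)

# Shift the hour of day so that 21:00 becomes 0, 22:00 becomes 1, ..., 20:00 becomes 23
shifted = (df.index.hour + 3) % 24

# The shifted hours can be cut with ordinary increasing edges;
# the labels describe the original (unshifted) clock times
labels = ["21:00-02:59", "03:00-08:59", "09:00-14:59", "15:00-20:59"]
df["time_bin"] = pd.cut(shifted, bins=[0, 6, 12, 18, 24], right=False, labels=labels)

# observed=False keeps all four categories in the result
result = df.groupby("time_bin", observed=False)["value"].mean()
print(result)
```

Note that this groups by hour of day only, so the first bin pools the hours 21:00-23:59 and 00:00-02:59 from every date in the index rather than forming contiguous overnight windows.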