How to make a histogram of pandas datetimes per specific time interval?

Question

I want to plot some datetimes and would like to specify a time interval in order to bundle them together and make a histogram. So for example, if there happen to be n datetimes in the span of one hour, group them together or parse them as year, month, day, hour. And omit minutes and seconds. Let's say I have a data frame with some datetime values:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')

np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'test': days, 'col2': data})
df = df.set_index('test')
print(df)

2018-06-19 17:10:32.076646    29
2018-06-20 17:10:32.076646    56
2018-06-21 17:10:32.076646    82
2018-06-22 17:10:32.076646    13
2018-06-23 17:10:32.076646    35
2018-06-24 17:10:32.076646    53
2018-06-25 17:10:32.076646    25
2018-06-26 17:10:32.076646    23

Ideally, I would like to specify a more flexible time interval, such as "6 hours" in order to make some sort of modulo operation on the datetimes. Is this possible?

You can use `pd.cut` and then specify your bins with `pd.date_range` which will allow you to specify sensible regular intervals like `freq='6H'`. Or perhaps just [`pd.Grouper`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Grouper.html). If you need more help, you should edit your question to include a [MCVE](https://stackoverflow.com/help/mcve) — ALollz, Jun 19 '18 at 14:57
Sorry for not providing a MCVE. I hope this helps. The approaches you mentioned sound very promising. How would I do this with the help of pd.cut or pd.Grouper? Thank you in advance. — Ipsider, Jun 19 '18 at 15:14

ALollz · Accepted Answer · 2018-06-19T18:26:21.597

`pd.Grouper`

Allows you to specify regular frequency intervals with which you will group your data. Use groupby to then aggregate your df based on these groups. For instance, if col2 was counts and you wanted to bin together all of the counts over 2 day intervals, you could do:

import pandas as pd
df.groupby(pd.Grouper(level=0, freq='2D')).col2.sum()

Outputs:

test
2018-06-19 13:49:11.560185    85
2018-06-21 13:49:11.560185    95
2018-06-23 13:49:11.560185    88
2018-06-25 13:49:11.560185    48
Name: col2, dtype: int32

You group by level=0, that is your index labeled 'test' and sum col2 over 2 day bins. The behavior of pd.Grouper can be a little annoying since in this example the bins start and end at 13:49:11..., which likely isn't what you want.

`pd.cut` + `pd.date_range`

You have a bit more control over defining your bins if you define them with pd.date_range and then use pd.cut. Here for instance, you can define bins every 2 days beginning on the 19th.

df.groupby(pd.cut(df.index, 
                  pd.date_range('2018-06-19', '2018-06-27', freq='2D'))).col2.sum()

Outputs:

(2018-06-19, 2018-06-21]    85
(2018-06-21, 2018-06-23]    95
(2018-06-23, 2018-06-25]    88
(2018-06-25, 2018-06-27]    48
Name: col2, dtype: int32

This is nice, because if you instead wanted the bins to begin on even days you can just change the start and end dates in pd.date_range

df.groupby(pd.cut(df.index, 
                  pd.date_range('2018-06-18', '2018-06-28', freq='2D'))).col2.sum()

Outputs:

(2018-06-18, 2018-06-20]     29
(2018-06-20, 2018-06-22]    138
(2018-06-22, 2018-06-24]     48
(2018-06-24, 2018-06-26]     78
(2018-06-26, 2018-06-28]     23
Name: col2, dtype: int32

If you really wanted to, you could specify 2.6 hour bins beginning on June 19th 2018 at 5 AM:

df.groupby(pd.cut(df.index, 
                  pd.date_range('2018-06-19 5:00:00', '2018-06-28 5:00:00', freq='2.6H'))).col2.sum()
#(2018-06-19 05:00:00, 2018-06-19 07:36:00]     0
#(2018-06-19 07:36:00, 2018-06-19 10:12:00]     0
#(2018-06-19 10:12:00, 2018-06-19 12:48:00]     0
#(2018-06-19 12:48:00, 2018-06-19 15:24:00]    29
#....

Histogram.

Just use .plot(kind='bar') after you have aggregated the data.

(df.groupby(pd.cut(df.index, 
                   pd.date_range('2018-06-19', '2018-06-28', freq='2D')))
   .col2.sum().plot(kind='bar', color='firebrick', rot=30))

That is just perfect. Thank you so much. The pd.Grouper solution is perfect for my use case. — Ipsider, Jun 21 '18 at 11:45

How to make a histogram of pandas datetimes per specific time interval?

1 Answers1

`pd.Grouper`

`pd.cut` + `pd.date_range`

Histogram.

Linked

How to make a histogram of pandas datetimes per specific time interval?

1 Answers1

pd.Grouper

pd.cut + pd.date_range

Histogram.

Linked

`pd.Grouper`

`pd.cut` + `pd.date_range`