1

I want to plot some datetimes and would like to specify a time interval in order to bundle them together and make a histogram. So for example, if there happen to be n datetimes in the span of one hour, group them together or parse them as year, month, day, hour. And omit minutes and seconds. Let's say I have a data frame with some datetime values:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')

np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'test': days, 'col2': data})
df = df.set_index('test')
print(df)

2018-06-19 17:10:32.076646    29
2018-06-20 17:10:32.076646    56
2018-06-21 17:10:32.076646    82
2018-06-22 17:10:32.076646    13
2018-06-23 17:10:32.076646    35
2018-06-24 17:10:32.076646    53
2018-06-25 17:10:32.076646    25
2018-06-26 17:10:32.076646    23

Ideally, I would like to specify a more flexible time interval, such as "6 hours" in order to make some sort of modulo operation on the datetimes. Is this possible?

Ipsider
  • 553
  • 1
  • 7
  • 20
  • You can use `pd.cut` and then specify your bins with `pd.date_range` which will allow you to specify sensible regular intervals like `freq='6H'`. Or perhaps just [`pd.Grouper`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Grouper.html). If you need more help, you should edit your question to include a [MCVE](https://stackoverflow.com/help/mcve) – ALollz Jun 19 '18 at 14:57
  • Sorry for not providing a MCVE. I hope this helps. The approaches you mentioned sound very promising. How would I do this with the help of pd.cut or pd.Grouper? Thank you in advance. – Ipsider Jun 19 '18 at 15:14

1 Answers1

5

pd.Grouper

Allows you to specify regular frequency intervals with which you will group your data. Use groupby to then aggregate your df based on these groups. For instance, if col2 was counts and you wanted to bin together all of the counts over 2 day intervals, you could do:

import pandas as pd
df.groupby(pd.Grouper(level=0, freq='2D')).col2.sum()

Outputs:

test
2018-06-19 13:49:11.560185    85
2018-06-21 13:49:11.560185    95
2018-06-23 13:49:11.560185    88
2018-06-25 13:49:11.560185    48
Name: col2, dtype: int32

You group by level=0, that is your index labeled 'test' and sum col2 over 2 day bins. The behavior of pd.Grouper can be a little annoying since in this example the bins start and end at 13:49:11..., which likely isn't what you want.

pd.cut + pd.date_range

You have a bit more control over defining your bins if you define them with pd.date_range and then use pd.cut. Here for instance, you can define bins every 2 days beginning on the 19th.

df.groupby(pd.cut(df.index, 
                  pd.date_range('2018-06-19', '2018-06-27', freq='2D'))).col2.sum()

Outputs:

(2018-06-19, 2018-06-21]    85
(2018-06-21, 2018-06-23]    95
(2018-06-23, 2018-06-25]    88
(2018-06-25, 2018-06-27]    48
Name: col2, dtype: int32

This is nice, because if you instead wanted the bins to begin on even days you can just change the start and end dates in pd.date_range

df.groupby(pd.cut(df.index, 
                  pd.date_range('2018-06-18', '2018-06-28', freq='2D'))).col2.sum()

Outputs:

(2018-06-18, 2018-06-20]     29
(2018-06-20, 2018-06-22]    138
(2018-06-22, 2018-06-24]     48
(2018-06-24, 2018-06-26]     78
(2018-06-26, 2018-06-28]     23
Name: col2, dtype: int32

If you really wanted to, you could specify 2.6 hour bins beginning on June 19th 2018 at 5 AM:

df.groupby(pd.cut(df.index, 
                  pd.date_range('2018-06-19 5:00:00', '2018-06-28 5:00:00', freq='2.6H'))).col2.sum()
#(2018-06-19 05:00:00, 2018-06-19 07:36:00]     0
#(2018-06-19 07:36:00, 2018-06-19 10:12:00]     0
#(2018-06-19 10:12:00, 2018-06-19 12:48:00]     0
#(2018-06-19 12:48:00, 2018-06-19 15:24:00]    29
#....

Histogram.

Just use .plot(kind='bar') after you have aggregated the data.

(df.groupby(pd.cut(df.index, 
                   pd.date_range('2018-06-19', '2018-06-28', freq='2D')))
   .col2.sum().plot(kind='bar', color='firebrick', rot=30))

enter image description here

ALollz
  • 57,915
  • 7
  • 66
  • 89