4

Assume that I have an array of timestamp's (given as datetime or numpy.datetime64 for instance). Furthermore, each entry represents an occurrence. I want to bucket the events on a per-hour basis; i.e. count how many events occurred per hour. Is there a straightforward way to do it? Some Numpy/Pandas magic? Or is the only way is to convert the timestamp's to seconds and bucket the seconds, as suggested here

Community
  • 1
  • 1
Dror
  • 12,174
  • 21
  • 90
  • 160
  • "... as suggested here" Or here: http://stackoverflow.com/questions/28430323/how-to-resample-downsample-an-irregular-timestamp-list Note that if you convert the timestamps to seconds in an array `t`, the calculation is simply `numpy.bincount((t/3600).astype(int))`. – Warren Weckesser Feb 17 '15 at 15:06

1 Answers1

4

If you are using Pandas, and you have a time series that has a DateTime index, you can use the resample method, with how='count'. For example, in the following, rng is a range of timestamps, and ts is a series of values that uses rng as the index. (For your calculation, the values in the time series don't matter.) In this example, there are 360 "events" per hour, so the expected result is 360 for each hour except the last.

Create some sample data:

In [71]: import pandas as pd

In [72]: rng = pd.date_range('1/1/2011', periods=10000, freq='10S')

In [73]: ts = pd.Series(np.random.randint(0, 5, len(rng)), index=rng)

In [74]: ts
Out[74]: 
2011-01-01 00:00:00    2
2011-01-01 00:00:10    0
2011-01-01 00:00:20    1
2011-01-01 00:00:30    4
2011-01-01 00:00:40    1
2011-01-01 00:00:50    1
2011-01-01 00:01:00    2
2011-01-01 00:01:10    0
2011-01-01 00:01:20    3
2011-01-01 00:01:30    4
2011-01-01 00:01:40    2
2011-01-01 00:01:50    4
2011-01-01 00:02:00    1
2011-01-01 00:02:10    0
2011-01-01 00:02:20    4
...
2011-01-02 03:44:10    2
2011-01-02 03:44:20    0
2011-01-02 03:44:30    3
2011-01-02 03:44:40    0
2011-01-02 03:44:50    0
2011-01-02 03:45:00    4
2011-01-02 03:45:10    3
2011-01-02 03:45:20    2
2011-01-02 03:45:30    0
2011-01-02 03:45:40    1
2011-01-02 03:45:50    0
2011-01-02 03:46:00    2
2011-01-02 03:46:10    0
2011-01-02 03:46:20    2
2011-01-02 03:46:30    2
Freq: 10S, Length: 10000

Use the resample method to count the number of events in each hour. The first argument, 'H', means we are resampling to an hourly rate.

In [75]: ts.resample('H', how='count')
Out[75]: 
2011-01-01 00:00:00    360
2011-01-01 01:00:00    360
2011-01-01 02:00:00    360
2011-01-01 03:00:00    360
2011-01-01 04:00:00    360
2011-01-01 05:00:00    360
2011-01-01 06:00:00    360
2011-01-01 07:00:00    360
2011-01-01 08:00:00    360
2011-01-01 09:00:00    360
2011-01-01 10:00:00    360
2011-01-01 11:00:00    360
2011-01-01 12:00:00    360
2011-01-01 13:00:00    360
2011-01-01 14:00:00    360
2011-01-01 15:00:00    360
2011-01-01 16:00:00    360
2011-01-01 17:00:00    360
2011-01-01 18:00:00    360
2011-01-01 19:00:00    360
2011-01-01 20:00:00    360
2011-01-01 21:00:00    360
2011-01-01 22:00:00    360
2011-01-01 23:00:00    360
2011-01-02 00:00:00    360
2011-01-02 01:00:00    360
2011-01-02 02:00:00    360
2011-01-02 03:00:00    280
Freq: H, dtype: int64
Warren Weckesser
  • 110,654
  • 19
  • 194
  • 214
  • The signature for this method has changed since pandas==1.4.2. Now we need to use the function agg() for our time series: ts.resample('H').agg(['count']) – Ederson Badeca Jun 16 '22 at 15:20