Assume that I have an array of timestamp
's (given as datetime
or numpy.datetime64
for instance). Furthermore, each entry represents an occurrence. I want to bucket the events on a per-hour basis; i.e. count how many events occurred per hour. Is there a straightforward way to do it? Some Numpy/Pandas magic? Or is the only way is to convert the timestamp
's to seconds and bucket the seconds, as suggested here
Asked
Active
Viewed 3,112 times
4
-
"... as suggested here" Or here: http://stackoverflow.com/questions/28430323/how-to-resample-downsample-an-irregular-timestamp-list Note that if you convert the timestamps to seconds in an array `t`, the calculation is simply `numpy.bincount((t/3600).astype(int))`. – Warren Weckesser Feb 17 '15 at 15:06
1 Answers
4
If you are using Pandas, and you have a time series that has a DateTime index, you can use the resample
method, with how='count'
. For example, in the following, rng
is a range of timestamps, and ts
is a series of values that uses rng
as the index. (For your calculation, the values in the time series don't matter.) In this example, there are 360 "events" per hour, so the expected result is 360 for each hour except the last.
Create some sample data:
In [71]: import pandas as pd
In [72]: rng = pd.date_range('1/1/2011', periods=10000, freq='10S')
In [73]: ts = pd.Series(np.random.randint(0, 5, len(rng)), index=rng)
In [74]: ts
Out[74]:
2011-01-01 00:00:00 2
2011-01-01 00:00:10 0
2011-01-01 00:00:20 1
2011-01-01 00:00:30 4
2011-01-01 00:00:40 1
2011-01-01 00:00:50 1
2011-01-01 00:01:00 2
2011-01-01 00:01:10 0
2011-01-01 00:01:20 3
2011-01-01 00:01:30 4
2011-01-01 00:01:40 2
2011-01-01 00:01:50 4
2011-01-01 00:02:00 1
2011-01-01 00:02:10 0
2011-01-01 00:02:20 4
...
2011-01-02 03:44:10 2
2011-01-02 03:44:20 0
2011-01-02 03:44:30 3
2011-01-02 03:44:40 0
2011-01-02 03:44:50 0
2011-01-02 03:45:00 4
2011-01-02 03:45:10 3
2011-01-02 03:45:20 2
2011-01-02 03:45:30 0
2011-01-02 03:45:40 1
2011-01-02 03:45:50 0
2011-01-02 03:46:00 2
2011-01-02 03:46:10 0
2011-01-02 03:46:20 2
2011-01-02 03:46:30 2
Freq: 10S, Length: 10000
Use the resample
method to count the number of events in each hour. The first argument, 'H'
, means we are resampling to an hourly rate.
In [75]: ts.resample('H', how='count')
Out[75]:
2011-01-01 00:00:00 360
2011-01-01 01:00:00 360
2011-01-01 02:00:00 360
2011-01-01 03:00:00 360
2011-01-01 04:00:00 360
2011-01-01 05:00:00 360
2011-01-01 06:00:00 360
2011-01-01 07:00:00 360
2011-01-01 08:00:00 360
2011-01-01 09:00:00 360
2011-01-01 10:00:00 360
2011-01-01 11:00:00 360
2011-01-01 12:00:00 360
2011-01-01 13:00:00 360
2011-01-01 14:00:00 360
2011-01-01 15:00:00 360
2011-01-01 16:00:00 360
2011-01-01 17:00:00 360
2011-01-01 18:00:00 360
2011-01-01 19:00:00 360
2011-01-01 20:00:00 360
2011-01-01 21:00:00 360
2011-01-01 22:00:00 360
2011-01-01 23:00:00 360
2011-01-02 00:00:00 360
2011-01-02 01:00:00 360
2011-01-02 02:00:00 360
2011-01-02 03:00:00 280
Freq: H, dtype: int64

Warren Weckesser
- 110,654
- 19
- 194
- 214
-
The signature for this method has changed since pandas==1.4.2. Now we need to use the function agg() for our time series: ts.resample('H').agg(['count']) – Ederson Badeca Jun 16 '22 at 15:20