Aggregate and count number of continuous time intervals in pandas

Question

I want to aggregate the continuous time intervals and count them for each person in pandas.

As an example I have dataframe that looks like this:

     Name      Time
0     Bob  10:59:00
1     Bob  11:00:00
2     Bob  11:01:00
3     Bob  11:14:00
4     Bob  11:15:00
5     Bob  11:17:00
6     Bob  11:18:00
7     Bob  11:19:00
8     Bob  13:10:00
9     Bob  13:10:00
10    Bob  13:15:00
11    Bob  13:16:00
12    Bob  13:17:00
13    Bob  13:18:00
14    Bob  13:19:00
15  Alice  13:17:00
16  Alice  13:18:00
17  Alice  13:19:00

With the following output:

Note: I only need the count not the time intervals themselves.

Bob = 5
1. 10:59 - 11:01
2. 11:14 - 11:15
3. 11:17 - 11:19
4. 13:10 - 13:10
5. 13:15 - 13:19

Alice = 1
1. 13:17 - 13:19

please post your [mre] as [code/text, not image](https://meta.stackoverflow.com/a/285557/10197418). as for the question, have a look at [timedelta](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timedelta.html). — FObersteiner, Sep 29 '20 at 15:12
[jfyi](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Danila Ganchar, Sep 29 '20 at 15:39

Valdi_Bo · Accepted Answer · 2020-09-29T16:07:36.667

I assume that your Time column is of string type.

You can use the following code:

result = df.groupby('Name', sort=False).apply(
    lambda grp: grp.groupby((pd.to_datetime(grp.Time).diff().dt
    .total_seconds() / 60).fillna(2).ge(2).cumsum())
    .Time.apply(lambda tt: f'{tt.iloc[0]} - {tt.iloc[-1]}'))\
    .rename_axis(['Name', 'Period'])

The result is:

Name   Period
Bob    1         10:59:00 - 11:01:00
       2         11:14:00 - 11:15:00
       3         11:17:00 - 11:19:00
       4         13:10:00 - 13:10:00
       5         13:15:00 - 13:19:00
Alice  1         13:17:00 - 13:19:00
Name: Time, dtype: object

The above code contains 2-level grouping:

First level - by Name,
Second level - by the number of a "period" - a sequence of consecutive time values.

From each 2nd level group there is taken the first and last time (separated with a minus).

The last part is to rename columns in the index, to better reflect the meaning of the data displayed.

If you want a printout looking almost exactly as in our post, run:

for key, grp in result.groupby(level=0, sort=False):
    print(f'{key} = {grp.size}')
    print(f'{grp.reset_index(level=0, drop=True).rename_axis(None).to_string()}\n')

(try yourself to see the result).

This time there are no dots after the period number, but in my opinion it is not important.

Works perfectly. It can also work with datetime type. To add something the Time column must be sorted first. — Ibrahim Sherif, Sep 29 '20 at 20:06
You're right, sort is required, but looking at your data sample I assumed that the input DataFrame is already sorted by *Date*, at least within each *Name*. — Valdi_Bo, Sep 30 '20 at 05:05

Aggregate and count number of continuous time intervals in pandas

1 Answers1