0

I want to aggregate the continuous time intervals and count them for each person in pandas.

As an example I have dataframe that looks like this:

     Name      Time
0     Bob  10:59:00
1     Bob  11:00:00
2     Bob  11:01:00
3     Bob  11:14:00
4     Bob  11:15:00
5     Bob  11:17:00
6     Bob  11:18:00
7     Bob  11:19:00
8     Bob  13:10:00
9     Bob  13:10:00
10    Bob  13:15:00
11    Bob  13:16:00
12    Bob  13:17:00
13    Bob  13:18:00
14    Bob  13:19:00
15  Alice  13:17:00
16  Alice  13:18:00
17  Alice  13:19:00

With the following output:

Note: I only need the count not the time intervals themselves.

Bob = 5
1. 10:59 - 11:01
2. 11:14 - 11:15
3. 11:17 - 11:19
4. 13:10 - 13:10
5. 13:15 - 13:19

Alice = 1
1. 13:17 - 13:19
Ibrahim Sherif
  • 518
  • 1
  • 4
  • 15
  • 3
    please post your [mre] as [code/text, not image](https://meta.stackoverflow.com/a/285557/10197418). as for the question, have a look at [timedelta](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timedelta.html). – FObersteiner Sep 29 '20 at 15:12
  • [jfyi](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Danila Ganchar Sep 29 '20 at 15:39

1 Answers1

1

I assume that your Time column is of string type.

You can use the following code:

result = df.groupby('Name', sort=False).apply(
    lambda grp: grp.groupby((pd.to_datetime(grp.Time).diff().dt
    .total_seconds() / 60).fillna(2).ge(2).cumsum())
    .Time.apply(lambda tt: f'{tt.iloc[0]} - {tt.iloc[-1]}'))\
    .rename_axis(['Name', 'Period'])

The result is:

Name   Period
Bob    1         10:59:00 - 11:01:00
       2         11:14:00 - 11:15:00
       3         11:17:00 - 11:19:00
       4         13:10:00 - 13:10:00
       5         13:15:00 - 13:19:00
Alice  1         13:17:00 - 13:19:00
Name: Time, dtype: object

The above code contains 2-level grouping:

  • First level - by Name,
  • Second level - by the number of a "period" - a sequence of consecutive time values.

From each 2nd level group there is taken the first and last time (separated with a minus).

The last part is to rename columns in the index, to better reflect the meaning of the data displayed.

If you want a printout looking almost exactly as in our post, run:

for key, grp in result.groupby(level=0, sort=False):
    print(f'{key} = {grp.size}')
    print(f'{grp.reset_index(level=0, drop=True).rename_axis(None).to_string()}\n')

(try yourself to see the result).

This time there are no dots after the period number, but in my opinion it is not important.

Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
  • Works perfectly. It can also work with datetime type. To add something the Time column must be sorted first. – Ibrahim Sherif Sep 29 '20 at 20:06
  • 1
    You're right, sort is required, but looking at your data sample I assumed that the input DataFrame is already sorted by *Date*, at least within each *Name*. – Valdi_Bo Sep 30 '20 at 05:05