Count overlapping time frames in a pandas dataframe, grouped by person

Question

I'm using the top solution here to determine the number of rows that have start and end times overlapping with the given row. However, I need these overlaps to be determined by groups and not across the whole dataframe.

The data I'm working with has start and end times for conversations and the name of the person involved:

id  start_time              end_time             name
1   2021-02-10 10:37:35     2021-02-10 12:16:22  Bob
2   2021-02-10 11:09:39     2021-02-10 13:06:25  Bob
3   2021-02-10 12:10:33     2021-02-10 17:06:26  Bob
4   2021-02-10 15:05:08     2021-02-10 21:07:05  Sally 
5   2021-02-10 21:07:26     2021-02-10 21:26:37  Sally

This is the solution from the previous post:

ends = df['start_time'].values < df['end_time'].values[:, None]
starts = df['start_time'].values > df['start_time'].values[:, None]
d['overlap'] = (ends & starts).sum(0)
df

But this records overlap between conversations 3 and 4, whereas I'm only looking for overlap between 1 - 3 or between 4 - 5.

What I'm getting now:

id  start_time              end_time             name   overlap
1   2021-02-10 10:37:35     2021-02-10 12:16:22  Bob    2
2   2021-02-10 11:09:39     2021-02-10 13:06:25  Bob    1
3   2021-02-10 12:10:33     2021-02-10 17:06:26  Bob    1
4   2021-02-10 15:05:08     2021-02-10 21:07:05  Sally  1 
5   2021-02-10 21:07:26     2021-02-10 21:26:37  Sally  0

What I'd like to get:

id  start_time              end_time             name   overlap
1   2021-02-10 10:37:35     2021-02-10 12:16:22  Bob    2
2   2021-02-10 11:09:39     2021-02-10 13:06:25  Bob    1
3   2021-02-10 12:10:33     2021-02-10 17:06:26  Bob    0
4   2021-02-10 15:05:08     2021-02-10 21:07:05  Sally  1 
5   2021-02-10 21:07:26     2021-02-10 21:26:37  Sally  0

You should provide the data in a reproducible format so that people can copy-paste it while working on solutions. Also, have you looked into the `groupby()` function? — ddejohn, Feb 17 '21 at 19:55
I just edited my answer and it seems to work well, you have a mistake in your desired output because both Sallys should have overlap 0, id 5 does not fall with in the range of id 4 — Matt, Feb 17 '21 at 20:25
@Mark if my answer works for you can you please accept it, otherwise detail how it is does not answer your question — Matt, Feb 18 '21 at 11:39

Matt · Accepted Answer · 2021-02-17T20:19:20.050

1

I think this might give what you need.

Add in an extra & condition for matching on name too:

ends = df['start_time'].values < df['end_time'].values[:, None]
starts = df['start_time'].values > df['start_time'].values[:, None]
same_group = (df['name'].values == df['name'].values[:, None])

# sum across axis=1 !!!
df['overlap'] = (ends & starts & same_group).sum(1)

df

edited Feb 17 '21 at 20:19

answered Feb 17 '21 at 20:11

Matt

1,196
1
9
22

Count overlapping time frames in a pandas dataframe, grouped by person

1 Answers1

Linked