Finding similar or overlapping groups of times in Python

Question

I have a list of start and end times, my_list, which contains different groupings of start and end times.

8:45 - 11:00
9:00 - 11:00
9:15 - 11:00
9:30 - 11:00
12:30 - 15:00
12:30 - 15:00
12:45 - 15:00

These records are group therapy session start/end times for patients entered by one therapist. I need similar times grouped together so that I can check if there were more than 3 lists in the groupings. So from 9:00 - 11:00, there were 4 patients when the rules only allow 3.

This question's answers provide help for help finding all of the overlapping times in one given list and I've tried variations of the solutions posted there. But I need to find overlapping times based on groups of similar times.

my_list = [
  [dt.time(8,45), dt.time(11,0)],
  [dt.time(9,0), dt.time(11,0)],
  [dt.time(9,15), dt.time(11,0)],
  [dt.time(9,30), dt.time(11,0)],
  [dt.time(12,30), dt.time(15,0)],
  [dt.time(12,30), dt.time(15,0)],
  [dt.time(12,45), dt.time(15,0)],
  ]

I need to group similar/duplicated/overlapping times...

group_one = [
  [dt.time(8,45), dt.time(11,0)],
  [dt.time(9,0), dt.time(11,0)],
  [dt.time(9,15), dt.time(11,0)],
  [dt.time(9,30), dt.time(11,0)]
]

group_two = [
  [dt.time(12,30), dt.time(15,0)],
  [dt.time(12,30), dt.time(15,0)],
  [dt.time(12,45), dt.time(15,15)]
]

Eventually, checking len(group_one) > 3 returns True

# Pseudo-code
 for times in my_list:
  if start or end times are equal to or overlap each other:
    throw the times into separate, similar lists (or dicts where key = timeslot?)
  else:
    if start or end times....

Solution can be pure Python or Pandas as this data is coming from a dataframe and will be manipulated there. I can't help but feel there is a magical way of doing this in Pandas, just stuck!

Edit: changed data in question to match answer.

How is the pandas Series structured? So how was `my_list` created? — Josmoor98, Aug 07 '19 at 16:37
What do you do with a cascade of overlaps? For instance, with intervals of 9-12, 10-2, and 1-3, what will your groups be? The first and last don't overlap, but each overlaps the middle. — Prune, Aug 07 '19 at 16:45
@Erfan no, each start/end time could be different. For example, four different sets of times might only share the times of 9:45-10:00 — Martin Noah, Aug 07 '19 at 16:50
@Josmoor98 each row in df contains a StartTime and EndTime column. I throw them all into one list based on a groupby of another column value. @ Prune that wouldn't happen. The group schedule times are set. Variations in times just means the patient came in early/late, or left early/late — Martin Noah, Aug 07 '19 at 16:59

Josmoor98 · Accepted Answer · 2019-08-08T17:25:19.750

Assuming your dataframe looks something like this and there are no cascading overlaps as pointed out in the comments by Prune.

df = pd.DataFrame([[datetime(2019, 8, 7, 8, 45),datetime(2019, 8, 7, 11)],
                   [datetime(2019, 8, 7, 9),datetime(2019, 8, 7, 11)],
                   [datetime(2019, 8, 7, 9, 15),datetime(2019, 8, 7, 11)],
                   [datetime(2019, 8, 7, 9, 30), datetime(2019, 8, 7, 11)],
                   [datetime(2019, 8, 7, 12, 30),datetime(2019, 8, 7, 15)],
                   [datetime(2019, 8, 7, 12, 30),datetime(2019, 8, 7, 15)],
                   [datetime(2019, 8, 7, 12, 45),datetime(2019, 8, 7, 15)],
                   [datetime(2019, 8, 7, 15, 30),datetime(2019, 8, 7, 17)],
                   [datetime(2019, 8, 7, 15, 30),datetime(2019, 8, 7, 17)],
                   [datetime(2019, 8, 7, 15, 45),datetime(2019, 8, 7, 17)],
                   [datetime(2019, 8, 7, 18, 30),datetime(2019, 8, 7, 19)],
                   [datetime(2019, 8, 7, 18, 30),datetime(2019, 8, 7, 19)],
                   [datetime(2019, 8, 7, 18, 45),datetime(2019, 8, 7, 19)]], 
                   columns = ["start", "end"])

If the data structure in your question is consistent with all of your data, then the following should work. If there any cascading cases, as pointed out, this won't work.

Note, df['start'] needs to be sorted for this procedure to work.

idx = df.index[(df['end'].shift() <= df['start'])]
overlapping groups = np.split(df, idx)

Which yields a list of DataFrames.

[                start                 end
 0 2019-08-07 08:45:00 2019-08-07 11:00:00
 1 2019-08-07 09:00:00 2019-08-07 11:00:00
 2 2019-08-07 09:15:00 2019-08-07 11:00:00
 3 2019-08-07 09:30:00 2019-08-07 11:00:00,
                 start                 end
 4 2019-08-07 12:30:00 2019-08-07 15:00:00
 5 2019-08-07 12:30:00 2019-08-07 15:00:00
 6 2019-08-07 12:45:00 2019-08-07 15:00:00,
                 start                 end
 7 2019-08-07 15:30:00 2019-08-07 17:00:00
 8 2019-08-07 15:30:00 2019-08-07 17:00:00
 9 2019-08-07 15:45:00 2019-08-07 17:00:00,
                 start                 end
10 2019-08-07 18:30:00 2019-08-07 19:00:00
11 2019-08-07 18:30:00 2019-08-07 19:00:00
12 2019-08-07 18:45:00 2019-08-07 19:00:00]

Changing the very first start time to ```datetime(2019, 8, 7, 8, 45)``` results in one less overlapping group. — Martin Noah, Aug 07 '19 at 18:12
Updated, it also didn't catch the last group. `np.split` resolves the issues raised. Let me know if there any more issues — Josmoor98, Aug 07 '19 at 18:32
modifying an end time that butts right up to another time throws it off. For example, ```datetime(2019,8,7,9,30), datetime(2019,8,7,12,30)``` adds this record to the second grouping when it should be in the first. — Martin Noah, Aug 08 '19 at 16:40
To solve that specific issue, you can use `idx = df.index[(df['end'].shift() <= df['start'])]` instead. So just replace `<` with `<=`. That seems to work on my example code, but may not catch all cases. If that works, I will update the solution — Josmoor98, Aug 08 '19 at 16:48

Finding similar or overlapping groups of times in Python

1 Answers1