Finding overlapped intervals in a set of intervals

Question

We have a login system that tracks how long people are connected. I would like to write a code to find people who were online at the same time. Look at this example, please:

P1: [1,7]
P2: [2,5]
P3: [3,4]
P4: [6,8]

Think of these as intervals of Person 1 to 4. I want the output of the algorithm to be something like this:

P1, P2 : [2, 3]
P1, P2, P3 : [3, 4]
P1, P2 : [4, 5]
P1, P4 : [6,7]

I tried to solve the problem with two for loops so that we get a list of people whose intervals overlap, but the problem is dealing with intervals for more than one person. for example, in the above example, [3,4] not have to come in [4, 5] in line three because it calculated as a three-person interval.

Could you post your code? It will help to make a better idea of your actual input... — Dani Mesejo, Jan 10 '23 at 15:04
This approach should work with minimal extra bookkeeping: https://stackoverflow.com/questions/15013800/find-the-maximally-intersecting-subset-of-ranges/15059229#15059229 (extra bookkeeping is that instead of keeping track of the count, keep track of the individuals) — Dave, Jan 10 '23 at 16:02

score 3 · Answer 1 · answered Jan 10 '23 at 15:11

Encode all interval bounds in a form like 1:+P1, 7:-P1. Then sort all bounds chronologically and scan by increasing time. Also keep updated a list of the persons present by "excuting" insert or delete operations.

{} 1:+P1 {P1} 2:+P2 {P1, P2} 3:+P3 {P1, P2, P3} 4:-P3 {P1, P2} 5:-P2 {P1} ...

Every configuration occurs between one time bound and the next.

Some configurations can repeat over time, but you did not specify how to handle them. With the above method, they will be listed separately.

score 1 · Answer 2 · answered Jan 10 '23 at 15:23

One approach, assuming the input is a dictionary of person -> interval:

from collections import defaultdict
from itertools import pairwise

data = {"P1": (1, 7),
        "P2": (2, 5),
        "P3": (3, 4),
        "P4": (6, 8)}

table = defaultdict(list)
for key, (start, end) in data.items():
    for i, j in pairwise(range(start, end + 1)):
        table[(i, j)].append(key)

result = {k: v for k, v in table.items() if len(v) > 1}

for key, value in result.items():
    print(value, key)

Output

['P1', 'P2'] (2, 3)
['P1', 'P2', 'P3'] (3, 4)
['P1', 'P2'] (4, 5)
['P1', 'P4'] (6, 7)

It is so small, But I use timestamps and this code is so time consuming for my code. However I thank you for your answer. I didn't know about itertools in python. It's so useful. — Mohammad.J, Jan 12 '23 at 15:57

Stef · Accepted Answer · 2023-01-10T19:13:26.077

Heavily inspired by Minimum number of platforms required for a railway station, but maintaining a set of persons present instead of a count of trains present:

Also an implementation of the algorithm hinted at in Yves Daoust's answer.

The idea is to first build a chronological list of all events, where an event is either the arrival or departure of a person.

Then we iterate on the list of events, maintaining a set of persons present, adding the persons who arrive and removing the persons who leave.

Below, the chronological list of events is implemented as a python dict mapping times to python lists of events, so that events happening at the same time, if any, are naturally grouped together.

from operator import itemgetter

data = {
    'P1': [1,7],
    'P2': [2,5],
    'P3': [3,4],
    'P4': [6,8]
}

def get_overlaps(data):
    events = {}
    for person, (start, end) in data.items():
        events.setdefault(start, []).append((person, set.add))
        events.setdefault(end, []).append((person, set.remove))
    # events = {1: [('P1', add)], 2: [('P2', add)], 3: [('P3', add)], 4: [('P3', remove)], 5: [('P2', remove)], 6: [('P4', add)], 7: [('P1', remove)], 8: [('P4', remove)]}
    present_persons = set()
    start_time = 0
    for new_time, grouped_events in sorted(events.items(), key=itemgetter(0)):
        if len(present_persons) >= 2:
            yield (frozenset(present_persons), (start_time, new_time))
        start_time = new_time
        for new_person, add_or_remove in grouped_events:
            add_or_remove(present_persons, new_person)

data = {
    'P1': [1,7],
    'P2': [2,5],
    'P3': [3,4],
    'P4': [6,8]
}
overlaps = list(get_overlaps(data))

print(overlaps)
# [(frozenset({'P2', 'P1'}),       (2, 3)),
#  (frozenset({'P2', 'P1', 'P3'}), (3, 4)),
#  (frozenset({'P2', 'P1'}),       (4, 5)),
#  (frozenset({'P4', 'P1'}),       (6, 7))]

data2 = { # three events at time 5: (P1,remove), (P2,remove), (P4,add) 
    'P1': [1,5],
    'P2': [2,5],
    'P3': [3,4],
    'P4': [5,8]
}

overlaps = list(get_overlaps(data2))

print(overlaps)
# [(frozenset({'P2', 'P1'}), (2, 3)),
#  (frozenset({'P2', 'P1', 'P3'}), (3, 4)),
#  (frozenset({'P2', 'P1'}), (4, 5))]

This algorithm has time complexity O(N), where N is the number of persons. For comparison, the algorithm in Dani Mesejo's answer has time complexity O(TN), where N is the number of persons and T the maximum time that any person is present, measured in number of instants.

Thanks for your great code and solution. I learned good things from that. I have some questions, why did you use frozen set? and why did you write it as a generator? Do these affect the execution time? I use datetime library for time and your implementation is so fast. — Mohammad.J, Jan 12 '23 at 15:53
You can use set instead of frozenset if you prefer, but what is important is to make sure no two sets are *the same object* as each other. Try the following code: `def f(): a = set(); for x in range(4): a.add(x); yield a` and then `print(list(f()))`. Instead of `[{0}, {0,1}, {0,1,2}, {0,1,2,3}]` as you might have expected, this will print instead `[{0,1,2,3}, {0,1,2,3}, {0,1,2,3}, {0,1,2,3}]`. Whereas if you do `def g(): a = set(): for x in range(4): a.add(x); yield frozenset(a)` then you'll get what you expect. — Stef, Jan 12 '23 at 16:03
I wrote it as a generator because it's typically easier to write functions that way in python, but you can easily write it as a function that returns a list instead, by adding `result = []` before the loop, and replacing `yield ...` with `result.append(...)` in the loop, and adding `return result` after the loop. — Stef, Jan 12 '23 at 16:06

Finding overlapped intervals in a set of intervals

3 Answers3