0

Imagine two passages with that are represented with seconds of speaking in tuples. Example, speaking between the 566th second and 579th.

Old example

passage1 = (566,579),(573,583.33)
passage2 = (574,579.21),(614,620)

=> intersection (574,579.21)

New example

passage1 = (566,579),(570,590)
passage2 = (572,575),(577,620)

=> intersection (572,575) (577,590)

What would be the most efficient way to find the intersection between these segments? How can I represent time in python ? I am open to every idea since I haven't found any way how to represent it and make the calculations.

  • It seems that you've asked `two` questions here. Which one comes first? – Daniel Hao Jul 01 '22 at 15:04
  • Ou sorry, you are right. Intersection would be great. Union I can find it later if I have a good beggining on the intersection. Thank you for poiting it out @DanielHao –  Jul 01 '22 at 15:05
  • 1
    NP. Did you try to look into `set` for your possible solutions? It should help. It's helpful to provide some expected outputs using your inputs. – Daniel Hao Jul 01 '22 at 15:06
  • set is one option, it has in build options. Are you suggesting that it is better to create sets of floats between, those values ? I don't see any other way using set, can you please clarify @DanielHao –  Jul 01 '22 at 15:08
  • Are you looking at intersections between two time periods e.g. between 566-579 and 573-583.33? E.g. a function `intersection(566, 579, 573, 583.33)` which would return `573, 579`? Or between multiple periods (there could be multiple intersections with some or all of them)? – Stuart Jul 01 '22 at 15:13
  • The last one multiple periods. passage1 or passsage2 can also not be overlapping (exemple: passage1 = (566,579),(580,583.33)) => there is a blanc of 1 seconds. It shouldn't appear in the global intersection. The interesection that I am searching is the between passage1 and passage2 @Stuart –  Jul 01 '22 at 15:23
  • 1
    So what would your output be for the given example? – Stuart Jul 01 '22 at 15:27
  • @Stuart I added one more new example. I hope it can ben more clear –  Jul 04 '22 at 07:30
  • Does this answer your question? [how to overlap intervals efficiently](https://stackoverflow.com/questions/5313374/how-to-overlap-intervals-efficiently) – Sneftel Jul 04 '22 at 09:15

2 Answers2

0

First, sort the intervals within each passage to make sure they are in order of start time, and merge any overlapping intervals within each passage.

passage1 = (566,579),(573,583.33)
passage2 = (574,579.21),(614,620)       
passage3 = (566,579),(570,590)
passage4 = (572,575),(577,620)

def merge_overlapping_intervals(intervals):
    intervals = iter(sorted(intervals))
    start, end = next(intervals)
    for next_start, next_end in intervals:
        if next_start <= end <= next_end:
            end = next_end
        else:
            yield start, end
            start = next_start
            end = next_end
    yield start, end

print(list(merge_overlapping_intervals(passage1)))   # 566, 583.33

Then, find all overlaps between intervals in passage 1 with intervals in passage 2:

def interval_overlaps(intervals1, intervals2):
    """ Yield all overlaps of intervals in sequence intervals1 with intervals in sequence intervals2 """
    for start1, end1 in merge_overlapping_intervals(intervals1):
        for start2, end2 in merge_overlapping_intervals(intervals2):
            if end1 > start2 and end2 > start1:
                yield max(start1, start2), min(end1, end2)

print(list(interval_overlaps(passage1, passage2)))   # [(574, 579.21)]
print(list(interval_overlaps(passage3, passage4)))   # [(572, 575), (577, 590)]

Equivalently, you could identify all overlaps first, and then sort them and merge any overlapping overlaps.

Because of the nested loop, the time-efficiency of this is O(nm) where n and m are the number of segments in each passage. A more time-efficient algorithm for identifying overlaps should be possible, given that the passages are sorted, but would require a bit more thought to write, and I imagine may not be needed.

Stuart
  • 9,597
  • 1
  • 21
  • 30
0

Determining if two segments intersect is quite easy. If the end of either is before the start of the other, they don't intersect.

Once you know they intersect, the intersection itself is trivial to determine - it's the later of the two start times and the earlier of the two end times. Note that this could result in a zero length segment.

Now you could simply go through each of the segments in passage1 and check against all of the segments in passage2. But that would be a slow O(n²) algorithm (assuming both sequences are of the same length n), and you're interested in efficiency.

Since you say in the comments that the segments in each sequence will be non-overlapping, we can use that information to find a more efficient algorithm. We can generate a sorted timeline of start and stop events for each passage, and the periods when you have a start and stop for both represent an intersection. Unfortunately the example data you gave in the question does not meet the constraint, so I'm not sure this will work.

Edit: I see you've changed the example to meet the constraint, so now it actually works. I held off posting this because of that problem.

def intersections(seq1, seq2):
    events = sorted([(s, False, 0) for s,e in seq1] +
            [(e, True, 0) for s,e in seq1] +
            [(s, False, 1) for s,e in seq2] +
            [(e, True, 1) for s,e in seq2])
    starts = [None, None]
    for time, end_flag, seq_num in events:
        if end_flag:
            if (starts[0] is not None) and (starts[1] is not None):
                yield max(starts), time
            starts[seq_num] = None
        else:
            starts[seq_num] = time

>>> for s,e in intersections(passage1, passage2):
    print(s, e)

572 575
577 579
Mark Ransom
  • 299,747
  • 42
  • 398
  • 622