3

Suppose I have a two vectors of "starts" and "stops", sorted in ascending order.

Vector 1 = [start1 stop1;
        start2 stop2;
        start3 stop3];

Vector 2 = [start4 stop4;
        start5 stop5;
        start6 stop6];

What is the most efficient way of determining the intersection/overlap of these two vectors?

user3642365
  • 549
  • 1
  • 6
  • 16

2 Answers2

2

I've had to do this on a couple of occasions. It's a simple task, but the logic can get quite messy.

One thing you must decide upon, up-front, is whether intervals are closed or open. That is, do the intervals [1,3] and [3,5] have an intersection at [3,3], or no intersection? I strongly advise "no intersection" (closed intervals tend to be much more painful to reason about than open or half-open intervals), but your use case may require otherwise.

I think the cleanest way to do this is to maintain a "current partial interval" from each list. By "partial" I mean that each interval may be "eaten away" from the bottom as intersections with intervals from the other list are recognized and output. This simplifies the logic by only forcing you to consider two intervals at a time, rather than processing all the V2 intervals which are relevant to some interval in V1.

To simplify the code further, you can allow intervals to be temporarily invalid, and start with both current intervals invalid. This makes the code more unnecessarily branchy, but it means you only have to handle updating them in one place and with one rule.

So the pseudocode goes like this (I'm destructively reading from V1 and V2, and writing to VI):

v1a,v1b = 0,0                    # Empty and hence invalid
v2a,v2b = 0,0                    # intervals to start with.

while True:
    if v1a >= v1b:               # Handle an invalid V1 interval
        if V1.empty():           # If there's no more V1s,
            return               # No more intersections are possible.
        else:
            v1a,v2a = V1.pop()   # Grab the next full interval from V1
    if v2a >= v2b:
        if V2.empty(): 
            return
        else:
            v2a,v2b = V2.pop()

    lower_bound = max(v1a, v2a)  # Determine the overlap, if any, between
    upper_bound = min(v1b, v2b)  # the current two intervals.

    if lower_bound < upper_bound:
        VI.push(lower_bound, upper_bound) # Output the overlapping interval.

    v1a = max(v1a, upper_bound)  # Snip away the region which has now been
    v2a = max(v2a, upper_bound)  # handled. This may make one or both invalid.

The last two lines are the tricky bit. If there was an intersection, then upper_bound is its upper end: There are no remaining intersecting ranges below it, so they can be removed from either or both current intervals. If, however, the two current intervals did not overlap, then it has the effect of setting the lower interval's a to its own b, making it invalid and causing it to be replaced on the next iteration.

Sneftel
  • 40,271
  • 12
  • 71
  • 104
1

I believe you can take advantage of the fact that the lists are sorted and then do the following (pseudo code)

Grab the first spans Determine if the spans overlap If the spans overlap, then max(starts) to min(ends) is an overlap Increment the span with the smallest end * Merge results - You may need to pass over your results and merge overlapping elements

Here's some python that implements this (note - a and b would be lists that contain (start,end) tuples):

try:
        while True:
            if b_span[1]>a_span[0] and b_span[0]<a_span[1]:
                overlaps.append((a_span[0] if a_span[0] > b_span[0] else b_span[0],
                                 a_span[1] if a_span[1] < b_span[1] else b_span[1]))
            if a_span[1] < b_span[1]:
                a_span = a.pop(0)
            else:
                b_span = b.pop(0)
    except IndexError:
        pass
Stanton
  • 497
  • 3
  • 14