1

I'm trying to merge overlapping intervals on two number lines, call them A and B.

For example, if ((0, 1), (5, 6)) denotes the interval (0, 1) on number line A and (5, 6) on number line B, I would merge that with ((1, 2), (6, 7)) to obtain the merged interval ((0, 2), (5, 7)).

However, I don't want to merge ((0, 1), (5, 6)) with ((1, 2), (8, 9)) because the (5, 6) can't be merged with (8, 9) on number line B.

Ideally I'm looking for solutions with better complexity and I've written a naive one in O(n^3).

I've tried a few approaches that I'll describe below:

A Slow Solution

I think a straightforward but slow solution is to try all pairwise merges and repeat until no more merges can be done:

(sorry, I haven't run this code, but hopefully it illustrates the idea)

while not merged_anything:
    merged_anything = False
    for i in range(len(intervals)):
        j = i + 1
        while j < len(intervals):
            if CanMerge(intervals[i], intervals[j]):
                intervals[i] = Merge(intervals[i], intervals[j])
                del intervals[j]
                merged_anything = True
            else:
                j += 1

After every pass through the list where we merge something, I think we need to go back to the beginning and repeat because it's possible that after merging two intervals, you are now able to merge a third one that was out of order.

For example:

[((0, 5), (0, 5)), ((0, 5), (10, 15)), ((0, 5), (5, 10))]

On the first pass, you'll be able to merge the first and third intervals. You'd need to go back to the beginning again to then merge in the second interval. The worst case would be like this:

[((0, 5), (0, 5)), ((0, 5), (1000, 1005)), ... , ((0, 5), (15, 20)), ((0, 5), (10, 15)), ((0, 5), (5, 10))]

This solution seems to be O(n^3) worst case.

Attempting to Modify the Sorting Approach for One Number Line

There is an O(nlogn) algorithm for merging overlapping intervals on one number line. One solution is here: http://www.geeksforgeeks.org/merging-intervals/ .

I first tried to extend it by sorting by the start of the range on number line A and then merging if the ranges on number line B also intersected. Unfortunately, it was naive and here is a case that would break it:

[((0, 3), (0, 3)), ((1, 2), (7, 9)), ((3, 4), (2, 3))]

We can merge ((0, 3), (0, 3)) with ((3, 4), (2, 3)), but because ((1, 2), (7, 9)) is between them and cannot be merged with either, the algorithm will never consider merging the first and last ranges.

Sorting by the start of the range on number line B or some iteration that switches between sorting on number line A and B won't work either. This example would break it

[((0, 3), (0, 3)), ((1, 2), (7, 9)), ((7, 9), (1, 2)), ((3, 4), (2, 3))]

The problem with this first method is that on number line A, there are multiple candidates that could be merged with (0, 3), but the first candidate to be processed in the sorted order might not be mergeable because of its range on number line B; however, subsequent candidates can still work.

Combining Sorting with Slow Solution

I also considered the possibility of sorting the intervals before inputting them to the slow solution. But a case like this would still induce worst case behavior if we tried to sort by the start of the interval on number line A:

[((0, 1005), (0, 5)), ((5, 1005), (1000, 1005)), ... , ((985, 1005), (990, 1000)), ((995, 1005), (10, 15)), ((1000, 1005), (5, 10))]

More Complicated Data Structures?

Maybe processing the intervals sorted by where they start on number line A will be helpful. By sorting the intervals based on number line A, we quickly get all the relevant candidates that have a interval on number line A that can be merged. However, of the candidate intervals, we can't quickly find which can be merged on number line B.

I have some understanding of the ideas behind segment trees and interval trees, but it seems like we'd have to rebuild/update them every time something gets merged? Any ideas? Thanks for the help.

user2570465
  • 2,437
  • 2
  • 18
  • 22
  • Possible duplicate of [Possible Interview Question: How to Find All Overlapping Intervals](https://stackoverflow.com/questions/4542892/possible-interview-question-how-to-find-all-overlapping-intervals) – samgak Jun 06 '17 at 21:11
  • 2
    If you think of e.g. ((0, 1), (5, 6)) as a rectangle which extends from 0 to 1 in the x direction and from 5 to 6 in the y direction then it can be merged with ((1, 2), (6, 7)) exactly because the two rectangles overlap at the point (1,6). So perhaps you should be looking for geometrical data structures, such as k-d trees or cover trees, which can be used to check for overlapping rectangles. – mcdowella Jun 07 '17 at 04:22

0 Answers0