77

It's not an interview question per se, as I came across this in my project, but I figured it could be a decent intervew question.

You have N pairs of intervals, say integers. You're required to indentify all intervals that overlap with each other in O(N) time. For example, if you have

{1, 3} {12, 14} {2, 4} {13, 15} {5, 10}

the answer is {1, 3}, {12, 14}, {2, 4}, {13, 15}. Note that you don't need to group them, so the result can be in any order like in the example.

I just threw in O(N) time because the KMP algorithm takes O(N) for string search. :D

The best I came up with and what I'm using right now in the project is O(N^2). Yeah, brute force is pretty sad, but no one complains so I won't refactor it. :P Still, I was curious if a greater mind has a more elegant solution.

Svante
  • 50,694
  • 11
  • 78
  • 122
Tom Tucker
  • 11,676
  • 22
  • 89
  • 130
  • 8
    Two things are not clear: (1) you say "N pairs of intervals", though I'm fairly sure you actually mean "N intervals", since if there are only N *pairs* all overlaps can be trivially found in O(N) :-P Assuming N = number of intervals: (2) It is not possible to report all overlapping pairs in O(N) time because there could be O(N^2) of them! OTOH it is reasonable to ask for the O(N)-sized set of *all intervals that overlap at least one other interval*. Is that what you're asking for? – j_random_hacker Aug 01 '13 at 12:36
  • 3
    gbenison's answer (http://stackoverflow.com/a/9775727/47984) is the only one of the 9 currently here that actually answers the question in O(nlog n). Please consider marking that answer correct. – j_random_hacker Aug 01 '13 at 13:06
  • 2
    It's funny, because I had an interview with amazon and they asked me a similar question .... – AJ Meyghani Mar 04 '14 at 22:06
  • @j_random_hacker: Can you please explain why the answer from marcog from is not O(n lg n)? – stackoverflowuser2010 Aug 03 '18 at 00:04
  • 1
    @stackoverflowuser2010: The problem is mainly that the question is very badly formulated, as I wrote in my first comment. Interpreted literally, it has no solution, so answerers have (reasonably) looked for "similar" problems that do. If we interpret marcog's claim "We can find which intervals overlap with which ..." to mean listing all pairs of overlapping intervals, this contradicts his later claim that "This is an O(N logN) solution" -- there could be O(n^2) pairs, which *no* algorithm can list in O(n log n) time. – j_random_hacker Aug 03 '18 at 10:32
  • @stackoverflowuser2010: Basically, marcog's algorithm is a good way to answer *some* questions in O(n log n) time (like how many pairwise overlaps there are), but he doesn't try to clarify that it gets you something less than that (the obvious interpretation of) what the OP actually asked for. OTOH gbenison explicitly tried to tweak the OP's question into something sensible (i.e., actually solvable in O(n log n) time) -- namely, "List all intervals that overlap *at least one other* interval" -- and then solve that. – j_random_hacker Aug 03 '18 at 10:42
  • @j_random_hacker: I agree that the wording is not clear, but my initial interpretation of the question is: Find all intervals that overlap with another interval. Stated more specifically, find all intervals that overlap with at least one other interval. On the other hand, if the question clearly stated "find all pairs of intervals that overlap with one another," then that would be O(N^2) as you said. Note that OP's example does not show pairs but rather individual intervals. – stackoverflowuser2010 Aug 03 '18 at 17:21
  • @stackoverflowuser2010: That interpretation of the question is what gbenison first makes explicit and then solves. marcog's answer, OTOH, neglects to clarify things, then says that with some extra bookkeeping you can "find which intervals overlap with which", which to me clearly implies all pairs of intervals -- and then claims O(n log n) time. But finding these pairs requires O(n^2) time. – j_random_hacker Aug 03 '18 at 19:06

10 Answers10

104

Throw the endpoints of the intervals into an array, marking them as either start- or end-points. Sort them by breaking ties by placing end-points before start-points if the intervals are closed, or the other way around if they're half-open.

1S, 2S, 3E, 4E, 5S, 10E, 12S, 13S, 14E, 15E

Then iterate through the list, keeping track of how many intervals we're in (this equates to number of start-points processed minus number of end-points). Whenever we hit a start-point while we are already in an interval, this means we must have overlapping intervals.

1S, 2S, 3E, 4E, 5S, 10E, 12S, 13S, 14E, 15E
    ^                          ^
   overlap                    overlap

We can find which intervals overlap with which by storing data alongside the end-points, and keeping track of which intervals we're in.

This is an O(N logN) solution, with sorting being the main factor.

Jon
  • 5,275
  • 5
  • 39
  • 51
moinudin
  • 134,091
  • 45
  • 190
  • 216
  • 8
    "breaking ties by placing end-points before start-points" - depending how the intervals are defined. If they're half-open, then `{1,2} and {2,3}` don't overlap. If they're closed intervals, then that is an overlap. Question doesn't specify which. – Steve Jessop Dec 28 '10 at 00:50
  • 6
    @marcog: Not sure about this, but is the algorithm really O(nlogn)? If you need to return which intervals overlap with each other, it seems more like O(n^2). When all intervals overlap (like in {1,8}, {2,7}, {3,6}, {4,5}) O(n^2) intervals are in the solution. – Gruber Jan 07 '13 at 15:54
  • @Gruber: I think you're right. Still, if we just want the O(N_intervals)-size set of intervals that overlap some other interval, we can get this by repeating the algorithm a second time, but running backwards from the end, and taking the union of this and the result of the first run. We must also check top-level intervals as we go. Why? If an interval X overlaps some other interval Y, then at least one of the following is true: Y's starting point precedes X's (X caught on phase 1); Y's end follows X's (X caught on phase 2); Y is contained entirely within X *and X is at the top level*. – j_random_hacker Aug 01 '13 at 12:33
  • @Gruber: See gbenison's cute answer for a different approach. – j_random_hacker Aug 01 '13 at 13:07
  • @Gruber: marcog's approach allows you to _count_ the number of overlapping intervals in O(n log n); but you are right, identifying all the overlapping intervals is in O(n^2) in the worst case, as in your example. The problem comes from the last part of the algorithm: finding which interval overlaps with which. – antoine Jan 27 '16 at 06:49
  • @marcog This is some what analogous to counting on the open and close braces in an expression. So at any point of time if number of open braces are more than one that means there was a overlap. But the original idea of "converting the intervals to start and end points" is awesome. – Amol Dixit Jan 25 '17 at 02:25
  • @Antoine For getting which intervals overlap can't we keep a previous_start pointer which points to the last start node, when we encounter a start node such that we are more than 1 interval in, we can say that the previous_start n this start overlap. For example in the given question - when we reach 2S, previous_start = 1S, so we know 1S and 2S overlap. Now if I can extend this to cases like `{1,3},{2,4},{2,7},{6,8}` as well then this would work right? – faizan Apr 21 '17 at 05:46
  • 2
    @faizan The question asked by the original poster is not clear, which causes the confusion. As j_random_hacker said in his comment at the top: "It is not possible to report all overlapping pairs in O(N) time because there could be O(N^2) of them! OTOH it is reasonable to ask for the O(N)-sized set of all intervals that overlap at least one other interval". marcog and gcbenison both answer the second question in O(N log N) – antoine Apr 27 '17 at 18:36
  • Does this consider the situation where one interval contains another interval, e.g. {3,4} and {1, 5}? – zyxue Nov 17 '17 at 00:34
  • If the list is already sorted, would the time complexity be O(N) ? @marcog – alper Feb 04 '19 at 13:59
  • This comes up in timetables where multiple events can appear at the same time and you need to "slot" them, i.e. figure out how many columns you need and in which column each event needs to be placed such that there is no overlap. I used pretty much this solution. – WorldSEnder Mar 09 '19 at 21:10
34

Sort the intervals by start point. Then fold over this list, merging each interval with its neighbor (i.e. (1,4),(2,6) -> (1,6)) if they overlap. The resulting list contains those intervals with no overlapping partner. Filter these out of the original list.

This is linear in time after the initial sort operation which could be done with any (n log n) algorithm. Not sure how you'd get around that. Even the 'filter out duplicates' operation at the end is linear if you take advantage of the already-sorted order of the input and output of the first operation.

Here is an implementation in Haskell:

-- Given a list of intervals, select those which overlap with at least one other inteval in the set.
import Data.List

type Interval = (Integer, Integer)

overlap (a1,b1)(a2,b2) | b1 < a2 = False
                       | b2 < a1 = False
                       | otherwise = True
                                     
mergeIntervals (a1,b1)(a2,b2) = (min a1 a2, max b1 b2)
                                     
sortIntervals::[Interval]->[Interval]
sortIntervals = sortBy (\(a1,b1)(a2,b2)->(compare a1 a2))

sortedDifference::[Interval]->[Interval]->[Interval]
sortedDifference [] _ = []
sortedDifference x [] = x
sortedDifference (x:xs)(y:ys) | x == y = sortedDifference xs ys
                              | x < y  = x:(sortedDifference xs (y:ys))
                              | y < x  = sortedDifference (x:xs) ys

groupIntervals::[Interval]->[Interval]
groupIntervals = foldr couldCombine []
  where couldCombine next [] = [next]
        couldCombine next (x:xs) | overlap next x = (mergeIntervals x next):xs
                                 | otherwise = next:x:xs
                                               
findOverlapped::[Interval]->[Interval]
findOverlapped intervals = sortedDifference sorted (groupIntervals sorted)
  where sorted = sortIntervals intervals
                                     
sample = [(1,3),(12,14),(2,4),(13,15),(5,10)]
Yan Foto
  • 10,850
  • 6
  • 57
  • 88
gcbenison
  • 11,723
  • 4
  • 44
  • 82
  • This is actually the only answer I see here that will indeed find all intervals that overlap some other interval, and do so in O(nlog n) time. (marcog's algorithm is a start but is actually O(n^2).) I like the idea of "subtracting out" the combined intervals (which include all those that don't overlap anything else) to find the overlapping ones. – j_random_hacker Aug 01 '13 at 13:04
  • 1
    I must say that i'm generally a little slow, but i think i do not fully grasp this solution. Are you sure this solution finds all possible pairs of overlapping intervals? I also see, in the header of your solution, the comment saying: -- Given a list of intervals, select those which overlap with at least one other inteval in the set. Which is not the same as the question asked. Am I missing something here? – Pa_ Nov 06 '13 at 14:03
  • I think your `overlap` condition can be simplified to `overlap (_, b1) (a2,_) | b1 > a2 = True | otherwise = False`, or simply: `overlap (_, b1) (a2,_) = b1 > a2` assuming that the a1's are sorted. Otherwsie, just `overlap (_, b1) (a2,_) = (b1>a2) && (a1 – ssm Mar 24 '14 at 00:56
10

The standard approach for intervales-on-the-line problems is to sort them according to starting point and then just walk from first to last. O(n*logn) (O(n) if already sorted)

end = 0;
for (current in intervals) {
    if current.start < end {
        // there's an intersection!
        // 'current' intersects with some interval before it
        ...
    }
    end = max(end, current.end)
}
Nikita Rybak
  • 67,365
  • 22
  • 157
  • 181
  • You still need to check if the current interval intersects with the one to come before declaring it isolated. – jbx Dec 28 '10 at 01:12
  • @jbx I didn't say `current` interval is 'declared isolated' immediately, did I? I didn't even say this is a solution. There're many ways to adapt the approach to this particular problem, e.g. `isolated[current - 1] = false` or the one you mentioned. – Nikita Rybak Dec 28 '10 at 02:28
  • This is only a tiny part of the solution. You're forgetting that there can be independent sets of overlapping intervals. For example: {0, 5}, {1, 6}, {40, 45}, {41, 46} – Ilya Kogan Dec 28 '10 at 05:54
  • 1
    @Nikita so u just gave part of the solution and left the rest to the imagination? :) – jbx Dec 28 '10 at 09:15
  • @Ilya So? Both sets will be detected. – Nikita Rybak Dec 28 '10 at 13:01
  • @jbx I said it's a 'standard approach' and I *specifically* stated what happens in conditional block: "'current' intersects with some interval *before* it". If you have any brains, you'll adapt it easily to this particular problem. – Nikita Rybak Dec 28 '10 at 13:04
  • @Nikita If every time you check whether 'current' intersects with some interval before it, it makes the complexity O(N^2) because for each interval you're checking all the intervals that come before it. – Ilya Kogan Dec 28 '10 at 14:25
  • 1
    @Ilya In the pseudocode comment it doesn't say "let's check statement X", it says "statement X is true". – Nikita Rybak Dec 28 '10 at 15:24
  • @Nikita You're right! Now that I'm trying to run your solution in my head, it looks good. I must have missed something before, sorry :) – Ilya Kogan Dec 28 '10 at 15:31
5

Not sure about O(N) but what if we first sort them by the first number in each tuple, and then sequentially find those where the first number of the tuple is greater than that of the largest number seen in previous tuples, which also do not overlap with the next tuple.

So you would first get:

{1, 3}, {2,4}, {5, 10}, {12, 14}, {13, 15}

since 4 (largest) < 5 and 10 < 12, {5, 10} is isolated.

This would entail that we keep track of the largest number we encounter, and each time we find a tuple whose starting number is greater we check if it overlaps with the next.

This becomes dependent on the efficiency of the sorting algorithm then, because the latter process would be O(N)

jbx
  • 21,365
  • 18
  • 90
  • 144
2

Suppose the difference between start and endpoints is small, say < 32. eg. 1..32. Then each interval can be written as a bit pattern in a 32 bit word. e.g [1, 2] -> 001; [2, 3]-> 010; [1, 3] -> 011; [2, 3, 4] -> 110. Two intervals, or combinations of intervals, overlap if their bitwise AND is non-zero. eg. [1,2] overlaps [1,3] because 001&011 == 001, non-zero. O(n) alg is to keep a running bitwise OR of intervals seen so far and AND each new one:

bitsSoFar = 0
for (n=0; n < bitPatterns.length; n++)
    if (bitPatterns[n] & bitsSoFar != 0)
        // overlap of bitPatterns[n] with an earlier pattern
        bitsSoFar |= bitPatterns[n]

Left as an exercise:

  • modify algorithm to also identify overlap of a bit pattern with a later one

  • work out the bit pattern for an interval in O(1)

Shef
  • 44,808
  • 15
  • 79
  • 90
2

Here's an O(N lg N) implementation in Java that extends the answer provided by @Nikita Rybak.

My solution finds every interval that overlaps with at least one other interval and counts them both as overlapping intervals. For example, the two intervals (1, 3) and (2, 4) from OP's original question overlap each other, and so in this case there are 2 overlapping intervals. In other words, if interval A overlaps with interval B, then I add both A and B to the resulting set of intervals that overlap.

Now consider the intervals (1, 100), (10, 20) and (30, 50). My code will find that:

[ 10,  20] overlaps with [  1, 100]
[ 30,  50] overlaps with [  1, 100]

Resulting intervals that overlap with at least one other interval:
[  1, 100]
[ 30,  50]
[ 10,  20]

In order to prevent (1, 100) from being counted twice, I use a Java Set that keeps only unique Interval objects.

My solution follows this outline.

  1. Sort all intervals by starting point. This step is O(N lg N).
  2. Keep track of intervalWithLatestEnd, the interval with the latest end point.
  3. Iterate over all the intervals in the sorted list. If an interval overlaps with intervalWithLatestEnd, then add both to a Set. Update intervalWithLatestEnd when needed. This step is O(N).
  4. Return the Set (and convert to a List if needed).

The total running time is O(N lg N). It requires an output Set of size O(N).

Implementation

In order to add intervals to a set, I created a custom Interval class that override equals(), as expected.

class Interval {
    int start;
    int end;
    Interval(int s, int e) { 
        start = s; end = e; 
    }

    @Override
    public String toString() {
        return String.format("[%3d, %3d]", start, end);
    }
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + start;
        result = prime * result + end;
        return result;
    }
    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        final Interval other = (Interval) obj;
        if (start != other.start)
            return false;
        if (end != other.end)
            return false;
        return true;
    }
}

And here is the code that runs the algorithm:

private static List<Interval> findIntervalsThatOverlap(List<Interval> intervals) {

    // Keeps unique intervals.
    Set<Interval> set = new HashSet<Interval>();

    // Sort the intervals by starting time.
    Collections.sort(intervals, (x, y) -> Integer.compare(x.start, y.start));

    // Keep track of the interval that has the latest end time.
    Interval intervalWithLatestEnd = null;

    for (Interval interval : intervals) {

        if (intervalWithLatestEnd != null &&
            interval.start < intervalWithLatestEnd.end) {

            // Overlap occurred.
            // Add the current interval and the interval it overlapped with.
            set.add(interval); 
            set.add(intervalWithLatestEnd);

            System.out.println(interval + " overlaps with " +
                               intervalWithLatestEnd);
        }

        // Update the interval with latest end.
        if (intervalWithLatestEnd == null ||
            intervalWithLatestEnd.end < interval.end) {

            intervalWithLatestEnd = interval;
        }
    }
    // Convert the Set to a List.
    return new ArrayList<Interval>(set);
}

Test cases

Here is a test case that runs the OP's original intervals:

public static void testcase() {

    List<Interval> intervals = null;
    List<Interval> result = null;

    intervals = new ArrayList<Interval>();

    intervals.add(new Interval(1, 3));
    intervals.add(new Interval(12, 14));
    intervals.add(new Interval(2, 4));
    intervals.add(new Interval(13, 15));
    intervals.add(new Interval(5, 10));


    result = findIntervalsThatOverlap(intervals);
    System.out.println("Intervals that overlap with at least one other interval:");
    for (Interval interval : result) {
        System.out.println(interval);
    }
}

with the result:

[  2,   4] overlaps with [  1,   3]
[ 13,  15] overlaps with [ 12,  14]
Intervals that overlap with at least one other interval:
[  2,   4]
[  1,   3]
[ 13,  15]
[ 12,  14]

Finally, here is a more advanced test case:

public static void testcase() {

    List<Interval> intervals = null;
    List<Interval> result = null;

    intervals = new ArrayList<Interval>();

    intervals.add(new Interval(1, 4));
    intervals.add(new Interval(2, 3));
    intervals.add(new Interval(5, 7));
    intervals.add(new Interval(10, 20));
    intervals.add(new Interval(15, 22));
    intervals.add(new Interval(9, 11));
    intervals.add(new Interval(8, 25));
    intervals.add(new Interval(50, 100));
    intervals.add(new Interval(60, 70));
    intervals.add(new Interval(80, 90));


    result = findIntervalsThatOverlap(intervals);
    System.out.println("Intervals that overlap with at least one other interval:");
    for (Interval interval : result) {
        System.out.println(interval);
    }
}

with the result:

[  2,   3] overlaps with [  1,   4]
[  9,  11] overlaps with [  8,  25]
[ 10,  20] overlaps with [  8,  25]
[ 15,  22] overlaps with [  8,  25]
[ 60,  70] overlaps with [ 50, 100]
[ 80,  90] overlaps with [ 50, 100]
Intervals that overlap with at least one other interval:
[  2,   3]
[  8,  25]
[  9,  11]
[ 50, 100]
[  1,   4]
[ 15,  22]
[ 10,  20]
[ 60,  70]
[ 80,  90]
stackoverflowuser2010
  • 38,621
  • 48
  • 169
  • 217
2

If N pairs of intervals is integers, then we can get it in O(n).

Sort it by first number in the pair then the second number. If all are integers, we can use bucket sort or radix sort to get it by O(n).

{1, 3}, {2,4}, {5, 10}, {12, 14}, {13, 15}

Then combine one by one,

{1,3}

{1,4} with overlap {1,3} and {2,4}

{1,4}, {5,10}

{1,4}, {5,10}, {12,14}

{1,4}, {5,10}, {12,15} with overlap {12,14} and {13,15}

The combination would take O(N) time

will
  • 45
  • 4
0

This problem can be reduced to the element uniqueness problem.

Element uniqueness has Omega(n log n) lower bound (counting number of comparisons), so you can't do better than that.

p.s.w.g
  • 146,324
  • 30
  • 291
  • 331
Tea Tagger
  • 17
  • 1
  • 7
    When providing an answer, you are expected to be clear and specific. I'm not sure what's your deleted post look like, but in this post you can at least tell people how to reduce element uniqueness into interval overlapping. – miushock Mar 22 '14 at 18:21
0

It's been quite a while since I've used it, but the solution I used was an derivative of the red-black tree described in Introduction to Algorithms called an interval tree. It is a tree sorted by interval start, so you can quickly (binary search) first the first eligible node. IIRC, the nodes were ordered by a property that let's you stop "walking" the tree when the candidates nodes can't match your interval. So I think it was O(m) search, where m is the number of matching intervals.

I search found this implementation.

Brett

[edit] Rereading the question, this isn't what you asked. I think this is the best implementation when you have a list of (for instance) meetings already scheduled in conference rooms (which are added to the tree) and you want to find which rooms are still available for a meeting with a new start and duration (the search term). Hopefully this solution has some relevance, though.

Community
  • 1
  • 1
Brett Stottlemyer
  • 2,734
  • 4
  • 26
  • 38
-1

You can go over the list once and keep a hash table of all the intervals encountered so far. If an input interval is part of some interval from the hash table, merge it into the hash table interval. Mark the non-primitive intervals (merged intervals that consist of more than one interval).

Then you go over the list a second time, and for each interval check in the hash table whether it's contained in a merged interval or not.

I don't know if it's O(N), but it's much better than O(N^2).

Ilya Kogan
  • 21,995
  • 15
  • 85
  • 141
  • 4
    The only problem is, hashtable doesn't support 'interval intersection' operation :) – Nikita Rybak Dec 28 '10 at 00:45
  • Are you referring to some specific implementation of a hash table? Because I was talking about the concept. You can always implement the operation by yourself. – Ilya Kogan Dec 28 '10 at 05:52
  • @IlyaKogan I think the point Nikita is trying to make is that there is no obvious way to quickly find the intervals that intersect with a given query interval. The naive approach would take O(n) time per query, which would be an O(n^2) algorithm. You could use an interval tree, but that isn't really related to hash tables at all. – John Kurlak Dec 26 '14 at 18:52
  • @JohnKurlak Agree. Looks like I missed this part. – Ilya Kogan Dec 27 '14 at 16:21