9

Suppose, I have an unsorted array of overlapped ranges. Each range is just a pair of integers begin and end. Now I want to find if a given key belongs to at least one of the ranges. Probably, I have to know the ranges it belongs as well.

We can assume the ranges array takes ~1M and fits the memory. I am looking for an easy algorithm, which uses only standard JDK collections without any 3d-party libraries and special data structures, but works reasonably fast.

What would you suggest?

Michael
  • 41,026
  • 70
  • 193
  • 341
  • Are the ranges sorted, or entirely unconstrained? – Kerrek SB Nov 18 '11 at 15:28
  • I assume linear search won't cut it? There are probably very clever ways to do it, but they'll likely violate your other requirements. Any indication of how many ranges and keys we have? –  Nov 18 '11 at 15:28
  • I'm unclear on the question, but it sounds like you'll need a hashtable of {key, range} pairs. – ben Nov 18 '11 at 15:29
  • Your question is a bit confusing. For the key to belong in the range does it have to be between the begin and end? Do you want to find every range the key belongs to? "The ranges may take ~1M". Do you mean the collection of ranges would be ~1M different ranges? – Danny Nov 18 '11 at 15:32
  • 1
    @ben: It seems more like the question is how to find a `range` for each `key` in the first place. One can of course use a hash table to store it once found, but I don't see how such a hashtable can be constructed without solving OP's problem first. –  Nov 18 '11 at 15:33
  • @Kerrek the ranges are not sorted – Michael Nov 18 '11 at 16:01
  • @delnan we probably have ~10^5 ranges and many-many keys – Michael Nov 18 '11 at 16:03

5 Answers5

5

Sort the ranges numerically by a custom Comparator, then for each key k build a one-element range [k, k] and do a binary search for this range with a different Comparator.

The Comparator for searching's compare(x,y) should return

  • <0 if x.max < y.min
  • >0 if x.min > y.max
  • 0 otherwise (its two range arguments overlap).

As noted by @Per, you need a different, stricter Comparator for sorting, but the first two clauses still hold.

This should work even if the ranges overlap, though you may want to merge overlapping ranges after sorting to speed up the search. The merging can be done in O(N) time.

This is in effect a static interval tree, i.e. one without O(lg N) insertion or deletion, in the same way that a sorted array can be considered a static binary search tree.

Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • Sounds good! How would suggest sort the `ranges`? By `begin` or by `end`? – Michael Nov 18 '11 at 16:05
  • Exactly what does your `Comparator` do? I'm skeptical that this approach can be made work for overlapping intervals – the standard interval tree has two sorted lists for the intervals overlapping each split point, and the data structure described in CLRS needs to augment the tree (which is sorted by left endpoints) by the max right endpoint in each subtree. – Per Nov 18 '11 at 16:25
  • @Michael: expanded the answer. – Fred Foo Nov 18 '11 at 16:30
  • That choice of `Comparator` has non-transitive `equals`, which breaks the `Comparator` contract. Even if `sort` somehow manages to return an array consistent with the irreflexive transitive relation you've defined, binary search won't work. We could have a sorted array with intervals …, [2, 4], [0, 3], … and then a binary search that starts by comparing [1, 1] against [2, 4] will proceed to the wrong half. – Per Nov 18 '11 at 16:42
  • Oh, the `equals` method doesn't do what I thought it did. To clarify: the `compare` function is non-transitive in the sense that `compare(x,y) == 0 && compare(y,z) == 0` does not imply that `compare(x,z) == 0`. – Per Nov 18 '11 at 16:50
  • @Per: you're right that it won't work for sorting, but I don't get the remark about searching not working. – Fred Foo Nov 18 '11 at 16:51
  • @larsmans If the sort arranges overlapping intervals maliciously, then binary search can fail. In my example, it compares [1, 1] against [2, 4], finds it to be less, and then fruitlessly searches the left half of the array. [0, 3] can be to the right of [2, 4] because those intervals overlap. – Per Nov 18 '11 at 16:58
4

If you don't need to know which interval contains your point (EDIT: I guess you probably do, but I'll leave this answer for others with this question who don't), then

  1. Preprocess the intervals by computing two arrays B and E. B is the values of begin in sorted order. E is the values of end in sorted order.

  2. To query a point x, use binary search to find the least index i such that B[i] > x and the least index j such that E[j] ≥ x. The number of intervals [begin, end] containing x is i - j.


class Interval {
    double begin, end;
}

class BeginComparator implements java.util.Comparator<Interval> {
    public int compare(Interval o1, Interval o2) {
        return Double.compare(o1.begin, o2.begin);
    }
};

public class IntervalTree {
    IntervalTree(Interval[] intervals_) {
        intervals = intervals_.clone();
        java.util.Arrays.sort(intervals, new BeginComparator());
        maxEnd = new double[intervals.length];
        initializeMaxEnd(0, intervals.length);
    }

    double initializeMaxEnd(int a, int b) {
        if (a >= b) {
            return Double.NEGATIVE_INFINITY;
        }
        int m = (a + b) >>> 1;
        maxEnd[m] = initializeMaxEnd(a, m);
        return Math.max(Math.max(maxEnd[m], intervals[m].end), initializeMaxEnd(m + 1, b));
    }

    void findContainingIntervals(double x, int a, int b, java.util.Collection<Interval> result) {
        if (a >= b) {
            return;
        }
        int m = (a + b) >>> 1;
        Interval i = intervals[m];
        if (x < i.begin) {
            findContainingIntervals(x, a, m, result);
        } else {
            if (x <= i.end) {
                result.add(i);
            }
            if (maxEnd[m] >= x) {
                findContainingIntervals(x, a, m, result);
            }
            findContainingIntervals(x, m + 1, b, result);
        }
    }

    java.util.Collection<Interval> findContainingIntervals(double x) {
        java.util.Collection<Interval> result  = new java.util.ArrayList<Interval>();
        findContainingIntervals(x, 0, intervals.length, result);
        return result;
    }

    Interval[] intervals;
    double[] maxEnd;

    public static void main(String[] args) {
        java.util.Random r = new java.util.Random();
        Interval[] intervals = new Interval[10000];
        for (int j = 0; j < intervals.length; j++) {
            Interval i = new Interval();
            do {
                i.begin = r.nextDouble();
                i.end = r.nextDouble();
            } while (i.begin >= i.end);
            intervals[j] = i;
        }
        IntervalTree it = new IntervalTree(intervals);
        double x = r.nextDouble();
        java.util.Collection<Interval> result = it.findContainingIntervals(x);
        int count = 0;
        for (Interval i : intervals) {
            if (i.begin <= x && x <= i.end) {
                count++;
            }
        }
        System.out.println(result.size());
        System.out.println(count);
    }
}
Per
  • 2,594
  • 12
  • 18
  • Great! What if I want to know _which_ intervals contain the point? – Michael Nov 18 '11 at 16:54
  • 1
    @Michael Convert the algorithm in CLRS (as described in the Wikipedia page on interval trees) to use an array rather than a binary tree. I have to go now, but I'll post details in a while if no one else does first. – Per Nov 18 '11 at 17:07
  • @Michael Java code added. Consider it WTFPL licensed in case StackOverflow hasn't already claimed it for Aiur. `maxEnd[m]` contains the maximum value of end among `intervals[a], ..., intervals[m - 1]`. – Per Nov 18 '11 at 18:56
  • thanks for mentioning Wikipedia page on interval trees! – Aleksey Otrubennikov May 08 '20 at 11:05
3

I believe this is what you are looking for: http://en.wikipedia.org/wiki/Interval_tree

But check this simpler solution first to see if it fits your needs: Using java map for range searches

Community
  • 1
  • 1
Paul Jackson
  • 2,077
  • 2
  • 19
  • 29
1

simple solution with O(n) complexity:

for(Range range: ranges){
  if (key >= range.start && key <= range.end)
    return range;
} 

More clever algorithm can be applied if we know more information about ranges. Is they sorted? Is they overlapped? and so on

mishadoff
  • 10,719
  • 2
  • 33
  • 55
1

Given just your specification, I would be inclined to order the ranges by size, with the widest ranges first (use a custom Comparator to facilitate this). Then simply iterate through them and return true as soon as you find a range that contains the key. Because we know nothing else about the data, of course the widest ranges are the most likely to contain a given key; searching them first could be a (small) optimization.

You could preprocess the list in other ways. For instance, you could exclude any ranges that are completely enclosed by other ranges. You could order by begin and early-exit as soon as you encounter a begin value greater than your key.

Carl Manaster
  • 39,912
  • 17
  • 102
  • 155