7

I'm in search of a data structure which efficiently operates over closed intervals and has the following properties:

  • dynamically add or remove an interval

  • set, and at any time change, a number ("depth") for each interval; no two depths are ever the same

  • find all intervals that overlap with any given interval, sorted by "depth"

The closest structure I found is the interval tree, but it lists the found intervals in arbitrary order with respect to their depths. I could collect all the "unsorted" intervals as reported and sort them afterwards, but I was hoping it would be possible to avoid sorting the result for every query.

Does anyone know of such a data structure, or have any suggestion on how (if at all possible) to enhance the interval tree to support such sorting?

Example:

  1. add [1,2] to the empty structure and set its depth to 1
  2. add [10,100], depth = 2
  3. add [5,55], depth = 3
  4. query for [5,50] reports [10,100] and [5,55]
  5. set depth of [10,100] to 3, and of [5,55] to 2
  6. query for [5,50] reports [5,55] and [10,100]

Edit

I'm more interested in fast adds/removes and queries than in updating of depths. A depth update can take as much as O(n), if that helps speed up the other operations.
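
For concreteness, this is roughly the "collect and sort afterwards" approach I'm trying to improve on; the Iv record and all other names here are just for illustration:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustration only: Iv is a made-up record of what I store per interval.
class Iv
{
    final int lo, hi;
    int depth;                       // unique across all intervals, may change later

    Iv (int lo, int hi, int depth)
    {
        this.lo = lo;
        this.hi = hi;
        this.depth = depth;
    }
}

class Baseline
{
    // What I do today: take the hits reported by the interval tree (in arbitrary
    // order) and sort them by depth afterwards.
    static List<Iv> sortByDepth (List<Iv> unsortedHits)
    {
        List<Iv> result = new ArrayList<Iv>(unsortedHits);
        result.sort(Comparator.comparingInt(iv -> iv.depth));   // the per-query sort I hope to avoid
        return result;
    }
}

It is this result.sort(...) call, costing O(m log m) per query, that I was hoping to push into the structure itself.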

Ecir Hana
  • What about [a treap](http://en.wikipedia.org/wiki/Treap)? – Rerito Feb 13 '15 at 10:55
  • @Rerito I know treap, what about it? Its nodes have to have random priority, "depth" doesn't have to be random (and can be changed at any time), if that's what you meant...? – Ecir Hana Feb 13 '15 at 11:04
  • There cannot be two intervals having the same depth. At any time, right? In this case, the binary search tree approach (using an augmented tree) with a hashmap maintained on the side would suit you quite well – Rerito Feb 13 '15 at 11:11
  • @Rerito correct, all depths are distinct. But I don't understand that hashmap, how is it different from storing the depth along with the interval? I mean, even with a hashmap, I would have to sort the result afterwards, correct? – Ecir Hana Feb 13 '15 at 11:17
  • Correct but the updating of the depth would be much less painful – Rerito Feb 13 '15 at 11:21
  • Seems as though a remove/add pair could simulate a depth change. Should those operations really be named deactivate/activate (with true remove/add only being O(n))? – David Eisenstat Feb 18 '15 at 13:29
  • @DavidEisenstat I'm not sure I follow: yes, if you change a depth, it's the same as if you first removed the interval, changed the depth in O(1) outside, and inserted it back. What are you hinting at? – Ecir Hana Feb 18 '15 at 14:29
  • Letting depth updates be slow doesn't seem to help much then. – David Eisenstat Feb 18 '15 at 15:01

4 Answers

5

Let's assume that the data structure you want exists. Then let's create a set of a million intervals, each being [1, 1], with random depths, and insert them into such an interval tree. Then let's query the interval [1, 1]. It should return all M = 1,000,000 intervals in sorted order, with complexity O(M + log N); the log N term is negligible next to M, so we would be sorting a set of M elements with arbitrary depths in linear time.

In other words, sorting the elements by depth after you get them from the interval tree is, in terms of complexity, as good as is theoretically possible.
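
To make the reduction concrete, here is a sketch in Java; DepthSortedIntervalStore is a hypothetical interface with exactly the guarantees asked for in the question, and all names are made up:

import java.util.List;

// Hypothetical interface with the guarantees asked for in the question.
interface DepthSortedIntervalStore
{
    void add (int lo, int hi, int depth);       // depths are assumed distinct
    List<int[]> query (int lo, int hi);         // overlapping {lo, hi, depth}, already sorted by depth
}

class Reduction
{
    // If such a store answered queries in O(M + log N), this method would sort
    // M arbitrary distinct keys in linear time.
    static int[] sortKeys (DepthSortedIntervalStore store, int[] distinctKeys)
    {
        for (int key : distinctKeys)
            store.add(1, 1, key);               // one degenerate interval [1,1] per key, the key used as its depth

        List<int[]> sorted = store.query(1, 1); // every stored interval overlaps [1,1]

        int[] out = new int[distinctKeys.length];
        for (int i = 0; i < out.length; i++)
            out[i] = sorted.get(i)[2];          // the depth field, now in ascending order
        return out;
    }
}

Since sortKeys performs only store operations plus linear copying, a store with O(M + log N) queries would sort arbitrary distinct keys faster than the comparison-sort lower bound allows.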

Ishamael
  • That's why I'm in search of a data structure which would somehow keep the intervals sorted and still allow for fast queries, for example. – Ecir Hana Feb 18 '15 at 10:37
  • What I am saying is that if such a data structure existed, you would be able to sort an array in linear time using it. Hence, such a data structure cannot theoretically exist. – Ishamael Feb 18 '15 at 18:16
  • Hello Ishamael! Most likely the bounty will be automatically awarded to you since your answer received the most votes. I don't want to sound ungrateful but I think I owe you an explanation about why I won't accept it as the answer. I believe the argument you provided is incorrect - I don't try to stuff N random intervals somewhere and expect them to become sorted upon the first query. In fact, that's exactly the opposite of what I'm after. I want a structure which maintains its ordering automatically as the elements come and go. To give an example: – Ecir Hana Feb 25 '15 at 00:58
  • I don't try to build a vector of random numbers which becomes sorted once queried, certainly that's O(N log N). But I can easily add a number to an already sorted vector, and keep it sorted, in O(N). It is even possible to add a number in O(log N), and still be able to report the numbers in the correct order in O(N). What I try to say is that you seem to base the argument on the well known [lower bound](http://en.wikipedia.org/wiki/Comparison_sort#Number_of_comparisons_required_to_sort_a_list), whereas my question is about spreading the work among building, maintaining and querying the data structure. – Ecir Hana Feb 25 '15 at 00:59
  • Let's say that the length of the interval tree is `N`, and the number of elements returned by a query is `K`. Your current approach has `O(log N)` insert/delete complexity and `O(K log K + log N)` query complexity. Which one do you want to improve, and which one are you willing to sacrifice and how much? – Ishamael Feb 25 '15 at 02:52
  • I try to avoid the sorting, that would imply the `K log K` part. Note that `K` can be `O(n)`. I don't care about the speed of depth changing. Logarithmic insert and remove is certainly nice to have. That said, I just gave one example, there could be more. – Ecir Hana Feb 25 '15 at 10:24
  • If your implementation of the interval tree doesn't require intervals within a node to be sorted by high value, then for each node you can maintain an std::set of the intervals in that node, sorted by depth. It will require `O(log N * log N)` to insert, because you need to update `log N` sets (each in `O(log N)`), and `O(K log(log N) + log N)` to query an interval if you use a merge-sort-like process to merge the sets, because you will have at most `log N` matching nodes and the merge costs the log of the number of sets being merged per element output. – Ishamael Feb 25 '15 at 22:30
  • I apologize, I don't quite follow: "doesn't require" I can use any kind of tree? "sorted by high value" a tree has to have some ordering which decides which child goes to left or to right? "maintain an std::set" does the root contain a set of all the intervals? Is this somehow similar to a B-tree or R-tree? Perhaps if you could draw a picture? – Ecir Hana Feb 26 '15 at 11:16
  • Well, that's just an idea. There are different implementations of interval trees, some of them require that **within a node** intervals are sorted by their high value. If yours is not such, then **within a node** of the interval tree you can maintain an std::set of the intervals that you would store in that node in your implementation, sorted by depth. In particular, yes, the root node would have a set of all the intervals. – Ishamael Feb 26 '15 at 19:29
1

The depth, as you set it, is equivalent to the position of the interval in an imaginary list. So the usual list of pairs of numbers is enough; a list can easily add, remove or swap its items.

If you also need to find the depth for a given interval, make a function for it (you didn't mention that need, though).
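
A minimal sketch of that idea, with made-up names: one plain list kept ordered by depth, so a query is a linear scan whose output is already in depth order and needs no extra sort.

import java.util.ArrayList;
import java.util.List;

class DepthOrderedList
{
    static final class Item
    {
        final int lo, hi, depth;

        Item (int lo, int hi, int depth)
        {
            this.lo = lo;
            this.hi = hi;
            this.depth = depth;
        }
    }

    private final List<Item> items = new ArrayList<Item>();   // always kept sorted by depth

    public void add (Item item)                  // O(n): insert at the position its depth dictates
    {
        int i = 0;
        while (i < items.size() && items.get(i).depth < item.depth) i++;
        items.add(i, item);
    }

    public void remove (int depth)               // O(n)
    {
        items.removeIf(it -> it.depth == depth);
    }

    public List<Item> query (int lo, int hi)     // O(n), but the output needs no sorting
    {
        List<Item> out = new ArrayList<Item>();
        for (Item it : items)
            if (it.lo <= hi && lo <= it.hi)      // closed-interval overlap test
                out.add(it);
        return out;
    }
}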

Gangnus
  • While you are correct about the positions in a list, the problem is that the query is O(n). That's what I meant by "efficiently", as an interval tree clearly does better, but then again there is the problem of representing the depths... – Ecir Hana Feb 13 '15 at 12:46
  • @EcirHana You wanted "to avoid sorting the result for every query." I answered exactly THAT. O(n) is much better than O(n log n). If you see another problem, ask another question... It is impossible to optimize everything. You have to: 1. write down all POSSIBLE functions, 2. define the priorities for the speed of those functions. – Gangnus Feb 13 '15 at 15:30
  • You are misquoting me. The whole quote talks about collecting the result from the interval tree, which is O(log n + m), and only then did I want to avoid the sorting. But more importantly, I'm in search of an efficient data structure; there is nothing efficient about iterating a list in the most naive way. No need to ask another question or to use caps... – Ecir Hana Feb 13 '15 at 16:35
  • @EcirHana 1. You still haven't defined the priorities of the function optimization. Your question is thus undefined. You should say what you need beforehand, not say afterwards: "oh, no, it is not what I need". 2. Single words in caps mean merely logical stress. It is normal to use it. I merely tried to make the text more understandable. But as you wish... – Gangnus Feb 15 '15 at 21:34
1

Here's my solution in Java using a TreeMap which is basically a Binary-Tree

Test: http://ideone.com/f0OlHI

Complexity

     Insert : 2 * O(log n)
     Remove : 2 * O(log n)
     Search : 1 * O(log n)
ChangeDepth : 7 * O(log n)
findOverlap : O(n)

IntervalDataSet.java

import java.util.ArrayList;
import java.util.Map;
import java.util.TreeMap;

class IntervalDataSet
{
    private TreeMap<Integer,Interval> map;

    public IntervalDataSet ()
    {
        map = new TreeMap<Integer,Interval> ();
    }

    public void print ()
    {
        for(Map.Entry<Integer,Interval> entry : map.entrySet())
        {
          Integer key = entry.getKey();
          Interval value = entry.getValue();

          System.out.println(key+" => ["+value.min+","+value.max+"] ");
        }
    }

    public boolean changeDepth (int depth, int newDepth)
    {
        if (!map.containsKey(depth)) return false;

        if (map.containsKey(newDepth)) return false;

        Interval in = map.get(depth);

        in.depth = newDepth;

        remove(depth); insert(in);

        return true;
    }

    public boolean insert (Interval in)
    {
        if (in == null) return false;

        if (map.containsKey(in.depth)) return false;

        map.put(in.depth, in); return true;
    }

    public boolean remove (int depth)
    {
        if (!map.containsKey(depth)) return false;

        map.remove(depth); return true;
    }

    public Interval get (int depth)
    {
        return map.get(depth);
    }

    public void print (int depth)
    {
        if (!map.containsKey(depth))
          System.out.println(depth+" => X ");
        else
          map.get(depth).print();
    }

    public void printOverlappingIntervals (Interval in)
    {
        for (Interval interval : map.values())
            if (interval.intersect(in))
                interval.print();
    }

    public ArrayList<Interval> getOverlappingIntervals (Interval in)
    {
        ArrayList<Interval> list = new ArrayList<Interval>();

        for (Interval interval : map.values())
            if (interval.intersect(in))
                list.add(interval);

        return list;
    }

    public int size ()
    {
        return map.size();
    }

}

Interval.java

class Interval
{
    public int min;
    public int max;
    public int depth;

    public Interval (int min, int max, int depth)
    {
        this.min = min;
        this.max = max;
        this.depth = depth;
    }

    public boolean intersect (Interval b)
    {
        // closed intervals overlap iff each one starts no later than the other ends;
        // checking only this interval's endpoints would miss the case where b
        // lies entirely inside [min, max]
        return (b != null && this.min <= b.max && b.min <= this.max);
    }

    public void print ()
    {
        System.out.println(depth+" => ["+min+","+max+"] ");
    }
}

Test.java

class Test
{
  public static void main(String[] args) 
  {
    System.out.println("Test Start!");
    System.out.println("--------------");

    IntervalDataSet data = new IntervalDataSet ();

    data.insert(new Interval( 1,3, 0 ));
    data.insert(new Interval( 2,4, 1 ));
    data.insert(new Interval( 3,5, 3 ));
    data.insert(new Interval( 4,6, 4 ));
    System.out.println("initial values");
    data.print();

    System.out.println("--------------");
    System.out.println("Intervals overlapping [2,3]");

    data.printOverlappingIntervals(new Interval( 2,3, -1 ));
    System.out.println("--------------");

    System.out.println("change depth 0 to 2");
    data.changeDepth( 0, 2 );
    data.print();
    System.out.println("--------------");

    System.out.println("remove depth 4");
    data.remove( 4 );
    data.print();
    System.out.println("--------------");

    System.out.println("change depth 1 to 4");
    data.changeDepth( 1, 4 );
    data.print();
    System.out.println("--------------");

    System.out.println("Test End!");
  }
}

IntervalDataSet2

Complexity

initialization : O(n)

   findOverlap : 2 * O(log n) + T(merge)

import java.util.Collection;
import java.util.TreeMap;

class IntervalDataSet2
{
    private Integer [] key;
    private TreeMap<Integer,Interval> [] val;
    private int min, max, size;

    @SuppressWarnings("unchecked")
    public IntervalDataSet2 (Collection<Interval> init)
    {
        // group the intervals by endpoint; each endpoint maps to the intervals
        // touching it, ordered by depth
        TreeMap<Integer,TreeMap<Integer,Interval>> map
            = new TreeMap<Integer,TreeMap<Integer,Interval>> ();

        for (Interval in : init)
        {
            if (!map.containsKey(in.min))
                map.put(in.min,
                    new TreeMap<Integer,Interval> ());

            map.get(in.min).put(in.depth,in);

            if (!map.containsKey(in.max))
                map.put(in.max,
                    new TreeMap<Integer,Interval> ());

            map.get(in.max).put(in.depth,in);
        }

        key = new Integer [map.size()];
        val = (TreeMap<Integer,Interval>[]) new TreeMap [map.size()];   // generic array needs an unchecked cast

        int i = 0;
        for (Integer value : map.keySet())
        {
            key [i] = value;
            val [i] = map.get(value);
            i++ ;
        }

        this.size = map.size();
        this.min = key [0];
        this.max = key [size-1];

    }

    private int binarySearch (int value, int a, int b)
    {
        if (a >= b)
            return a;

        int mid = (a + b) / 2;

        if (key[mid] == value)
            return mid;

        if (key[mid] < value)
            return binarySearch(value, mid + 1, b);
        else
            return binarySearch(value, a, mid - 1);
    }

    public TreeMap<Integer,Interval> findOverlap (Interval in)
    {
        TreeMap<Integer,Interval> solution
            = new TreeMap<Integer,Interval> ();

        int alpha = in.min;
        int beta = in.max;

        if (alpha > this.max || beta < this.min)
            return solution;

        int i = binarySearch(alpha, 0, (size-1));
        int j = binarySearch(beta, 0, (size-1));

        // binarySearch may land one slot off on either side; nudge i to the
        // first key >= alpha and j to the last key <= beta
        while (i > 0 && key[i-1] >= alpha) i--;
        while (i < size && key[i] < alpha) i++;
        while (j < size-1 && key[j+1] <= beta) j++;
        while (j >= 0 && key[j] > beta) j--;

        // collect every interval that has an endpoint inside [alpha, beta],
        // keyed (and therefore ordered) by depth
        for (int k = i; k <= j; k++)
            solution.putAll ( val[k] );

        return solution;
    }

}
Khaled.K
  • How do I query the IntervalDataSet? – Ecir Hana Feb 24 '15 at 12:05
  • @EcirHana updated the code, added a test case, check the link to see the execution – Khaled.K Feb 25 '15 at 06:06
  • Thanks a lot! Maybe I'm missing something obvious but my question is, how do I query for "an interval"? That is, I don't care to report one interval of a given depth. The tricky part is to report all the intervals overlapped by the given range, and report them sorted by their depths. I cannot understand where your code does this? – Ecir Hana Feb 25 '15 at 10:14
  • @EcirHana added functions `getOverlappingIntervals` & `printOverlappingIntervals`, check Test link – Khaled.K Feb 25 '15 at 10:29
  • So you're testing the whole content of the tree to see which intervals overlap with the query. In what way would it be better than a regular interval tree followed by a sorting step? (As suggested by OP's thoughts) – Rerito Feb 25 '15 at 10:36
  • @Rerito yes, it's `O(n)`. but it's a Binary-Tree which is sorted by definition, no sort is needed. – Khaled.K Feb 25 '15 at 10:38
  • @EcirHana I can suggest a different data-structure that can optimize the process of getting the overlapping intervals, but in that case intervals won't be sorted by depth. your opinion? – Khaled.K Feb 25 '15 at 10:49
  • That's exactly the problem, I would like to have both. Why won't they be sorted by depth? What data structure are you referring to? PS: Thanks a lot for the answers but yes, it would be nice if the speed [depended on the output size](http://en.wikipedia.org/wiki/Output-sensitive_algorithm) and not just the input size. I.e. O(n) is too slow if there are only a few overlapped intervals. – Ecir Hana Feb 25 '15 at 10:55
  • Yes it's _O(n)_, but don't you think that's just bad? With a huge tree filled with sparse intervals it's just dreadful. The first approach (interval tree, then sort the output) is much better – Rerito Feb 25 '15 at 11:19
  • @EcirHana check it now, I've added `IntervalDataSet2`, which can find overlaps in `2 log n` using binary search, and the results are ordered by depth... the data structure is not dynamic and the code is not tested – Khaled.K Feb 28 '15 at 09:46
0

In the end it is quite difficult to think about this problem. What you have is actually a one-dimensional space, a line. By adding the depth you get a second coordinate, and by requiring that the depths are unique you can actually draw a picture.

Your query is to intersect every interval (line segment) of your picture with a rectangle that spans x1 to x2 horizontally and all of y vertically.

So the naive way to look at this problem is to check each line, in the order of y, for intersection. Since you want all results this would require O(n) to find the answer. And this answer must also be sorted by depth, in O(m log m).

You can try to go with a one-dimensional R-tree, which allows you to define regions on the go. Within such a structure each node spans a region from min to max, and you can split this region further. Since some intervals fit in both sections, because the split point actually cuts them, those are stored within the node itself (and not within the child nodes). Within those child nodes you again get such a list, and so on. One possible node layout is sketched below.
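
A minimal sketch of one way such a node could look, under the description above (split at the midpoint of the region, intervals crossing the split stored at the node in a map ordered by depth); all names are made up:

import java.util.List;
import java.util.TreeMap;

// One node of the structure described above: it covers [min, max], keeps every
// interval that crosses its split point in a map ordered by depth, and pushes
// the remaining intervals down to the child that fully contains them.
class RegionNode
{
    final int min, max;                       // region covered by this node
    final int split;                          // split point of the region
    RegionNode left, right;                   // children for [min, split] and [split+1, max]
    final TreeMap<Integer,int[]> byDepth      // intervals {lo, hi} stored here, keyed by depth
        = new TreeMap<Integer,int[]>();

    RegionNode (int min, int max)
    {
        this.min = min;
        this.max = max;
        this.split = min + (max - min) / 2;
    }

    void insert (int lo, int hi, int depth)
    {
        if (min == max || (lo <= split && hi > split))
        {
            byDepth.put(depth, new int[] { lo, hi });       // crosses the split: keep it here
        }
        else if (hi <= split)
        {
            if (left == null) left = new RegionNode(min, split);
            left.insert(lo, hi, depth);
        }
        else
        {
            if (right == null) right = new RegionNode(split + 1, max);
            right.insert(lo, hi, depth);
        }
    }

    // Collect the depth-sorted maps of every node whose region intersects [lo, hi];
    // filtering the individual intervals is left to the caller, as discussed below.
    void collect (int lo, int hi, List<TreeMap<Integer,int[]>> out)
    {
        if (hi < min || lo > max) return;                   // region misses the query entirely
        if (!byDepth.isEmpty()) out.add(byDepth);
        if (left != null) left.collect(lo, hi, out);
        if (right != null) right.collect(lo, hi, out);
    }
}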

For your search you inspect all nodes and child nodes whose regions intersect your search interval.

Within each node the list of intervals is sorted according to their depth values.

So the problem is reduced to a number of sorted lists (of all intervals that are likely to intersect your search interval). Those lists must now be filtered for the intervals that truly intersect your search interval. But you do not need to filter all of them: if a node's region is totally contained within your search interval, all of its intervals and all of its children's intervals truly intersect your search interval (since they are totally contained within it).

To combine all those lists you use a union-style merge in which you always pick the element with the smallest depth next. You sort the lists by the depth of their first element (the element with the smallest depth in each list). Now you take the first element of the first list and move it to the result. Then you compare the next element within that list with the first element of the next list; if its depth is still smaller you copy it too. If the depth becomes bigger you re-insert the list at its correct position, taking log k (where k is the number of non-empty lists in your collection), go on with the list that is now first, and repeat until all lists are empty or fully worked through, maintaining a cursor position for each list.

This way you only sort the lists themselves and otherwise either compare the current element with the head of the next list (if it is still smaller) or re-insert its list; a sketch of this merge follows below.
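
A sketch of that merge, assuming each candidate node contributes entries already ordered by depth (for example the byDepth maps gathered by the collect sketch above); all names are made up:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

class DepthMerge
{
    // k-way merge of per-node, depth-sorted maps into one depth-ordered list.
    public static <V> List<V> mergeByDepth (List<TreeMap<Integer,V>> lists)
    {
        // the queue holds one cursor per non-exhausted list, ranked by the
        // depth of the entry the cursor currently points at
        PriorityQueue<Cursor<V>> queue =
            new PriorityQueue<>(Comparator.comparingInt((Cursor<V> c) -> c.current.getKey()));

        for (TreeMap<Integer,V> list : lists)
        {
            Cursor<V> c = new Cursor<>(list.entrySet().iterator());
            if (c.current != null) queue.add(c);
        }

        List<V> result = new ArrayList<V>();
        while (!queue.isEmpty())
        {
            Cursor<V> c = queue.poll();          // cursor with the smallest current depth
            result.add(c.current.getValue());
            if (c.advance()) queue.add(c);       // re-insert while its list is not exhausted
        }
        return result;
    }

    // a cursor remembers the entry it currently points at, so it can be compared
    private static final class Cursor<V>
    {
        final Iterator<Map.Entry<Integer,V>> it;
        Map.Entry<Integer,V> current;

        Cursor (Iterator<Map.Entry<Integer,V>> it)
        {
            this.it = it;
            this.current = it.hasNext() ? it.next() : null;
        }

        boolean advance ()
        {
            current = it.hasNext() ? it.next() : null;
            return current != null;
        }
    }
}

Each element of the result costs one priority-queue operation, i.e. O(log k) for k non-empty lists, which matches the cost argued above.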

This is the best structure I can think of. First you easily rule out almost all intervals that cannot intersect the search interval. Then you compose the result from the lists of candidates, some of which you know belong to the result in full (one can argue that only a certain number of interval lists must be checked, since trees split very fast). By controlling the split strategy you control the cost of each part. If you, for example, only split nodes holding more than 10 intervals, you keep k small.

The worst case is bad, but I guess the expected actual performance will be good, and way better than O(n + m log m), since you use already sorted, distinct lists of potential intersections.

And remember: if a node has children, it itself contains a list of all intervals that cross the split point. Therefore, if your search interval also intersects the split point, every interval of this node is part of your result.

So feel free to experiment with this structure. It should be easy to implement within a single day.

Martin Kersten