What data structure to find number of elements in a given range in O(log n) time?

Question

I'm solving a problem and I realized I am need of a data structure with following properties but cant find one even after few hours of googling. I believe STL library is too rich to not have this one hence asking here.

Insert any element(should be able to contain repetetive ones) in O(log(n)) time
Remove an element in O(log(n)) time as well.
If i want to query for the number of elemenes in range [a,b], I should get that count in O(log(n)) time..

If I were to write it from scratch, for part 1 and 2, I would use a set or multiset and I would modify their find() method(which runs in O(log(N)) time) to return indices instead of iterators so that I can do abs(find(a)-find(b)) so I get the count of elements in log(N) time. But unfortunately for me, find() returns and iterator.

I have looked into multiset() and I could not accomplish requirement 3 in O(log(n)) time. It takes O(n).

Any hints to get it done easily?

I don't have my trusty guid book with, but if you can get hold of 'The Algorithm Design Manual' by 'Skiena, Steven S' do so. Is my to go source of algorithms. — ntroncos, Sep 22 '15 at 23:56
Im pretty sure hash tables are supposed to have log(n) time for everything but Im not sure — R Nar, Sep 22 '15 at 23:58
@RNar, Hashtables have O(1) for insert and delete , but finding number of elements in a given range is O(n). Besides having duplicate values is also a problem — Anoop, Sep 23 '15 at 00:00
@upr if its that these come into my mind: [Similar question](http://stackoverflow.com/questions/181693/what-are-the-complexity-guarantees-of-the-standard-containers), [Sumary](http://www.cs.northwestern.edu/~riesbeck/programming/c++/stl-summary.html), [STL](http://www.sgi.com/tech/stl/complexity.html) I have them bookmarked. — ntroncos, Sep 23 '15 at 00:07
`multi_set` has `count` member that takes `O(log(N) + K)` with `N` range size and `K` the number of matches. — TemplateRex, Sep 23 '15 at 07:07

behzad.nouri · Accepted Answer · 2015-09-26T13:03:49.247

Though not part of STL, you may use Policy-Based Data Structures which are part of gcc extensions; In particular you may initialize an order statistics tree as below. The code compiles with gcc without any external libraries:

#include<iostream>
#include<ext/pb_ds/assoc_container.hpp>
#include<ext/pb_ds/tree_policy.hpp>

using namespace __gnu_pbds;
using namespace std;

int main()
{
    tree<int,         /* key                */
         null_type,   /* mapped             */
         less<int>,   /* compare function   */
         rb_tree_tag, /* red-black tree tag */
         tree_order_statistics_node_update> tr;

    for(int i = 0; i < 20; ++i)
        tr.insert(i);

    /* number of elements in the range [3, 10) */
    cout << tr.order_of_key(10) - tr.order_of_key(3);
}

score 4 · Answer 2 · answered Sep 23 '15 at 00:07

Although the standard library is indeed well-featured, I don't think you'll find anything with these particular requirements in there. As you noted, the set-like structures return non-random-access iterators -- to provide random access (or some kind of distance function, as you require) would introduce significant complexity.

You may be able to achieve your goal by implementing an indexable skip list, which provides O(log(n)) insertion, deletion, and indexed lookup, as described here: https://en.wikipedia.org/wiki/Skip_list#Indexable_skiplist

A guide to implementation can be found here: http://cg.scs.carleton.ca/~morin/teaching/5408/refs/p90b.pdf

A skip list is the correct data structure for this task. The "count elements in range" task becomes two O(log(n)) lookups. — Borealid, Sep 23 '15 at 00:11

score 4 · Answer 3 · answered Sep 23 '15 at 00:25

The two obvious data structures for this task are the skip list (which Jack O'Reilly has already mentioned) and some variant of the order statistic tree (which Behzad mentions but doesn't really explain).

An order statistic tree stores one extra piece of information in each node. You can store any of a number of different things, but the one I find easiest to understand is if each node stores the number of elements in its left sub-tree.

When you insert, as you walk down the tree to store an element, you increment the count every time you descend to the left in the tree. Since you're only modifying the nodes you'd traverse anyway, this doesn't change the O(log N) insertion. When you re-balance, you have to adjust accordingly (but, again, you're only modifying counts in nodes you're already modifying when you do rotations, so (again) you don't affect the overall complexity.

When you need to find a distance from one element to another, you just find the two nodes, each with O(log N) complexity. You get the index of each element in the tree as you find it by initializing an index from the root, then updating it from there (subtract the count as you descend to the left, add as you descend to the right).

score 2 · Answer 4 · answered Sep 23 '15 at 00:09

2

You should be able to accomplish this with a standard or slightly modified B-Tree.

Typically most standard operations are O(log(n)) for B-Tree implementations.

answered Sep 23 '15 at 00:09

Erix

7,059
2
35
61

makar · Answer 5 · 2016-02-08T15:30:01.730

It certainly isn't the most space efficient, giving O(3n), but it satisfies the insert, remove and distance criteria listed above. The following uses a linked list, map and unordered_map.

The list is used to maintain order. Normally inserting into a list in order is linear time, but with help from a map of key to list_iterator, we can insert in constant time. The advantage of storing ordered data in a list is that it uses random access iterators meaning you can get a constant time std::distance call

The map is used to get a hint as to where in the list to insert a node. This is done by creating a new map entry for the given key, then decrementing the iterator by 1. The value of this new iterator gives us the appropriate insert position into the linked list.

The unordered_map gives us o(1) lookup for random access list iterators, allowing us to get a o(logn) remove and distance time.

class logn
{
public:
    using list_it = std::list<int>::iterator;

    void insert(int i)
    {
        const bool first = list.empty();
        auto original_it = map.insert({i,list.end()}).first; // o(logn)
        auto hint = original_it;
        if (!first)
            --hint; 
        umap[i] = list.insert(hint->second,i);  // o(1)
        original_it->second = umap[i]; 
    }

    void remove(int i)
    {
        auto it = umap.find(i); // o(1)
        list.erase(it->second); // o(1)
        umap.erase(it);         // o(1)
        map.erase(map.find(i)); // o(logn)
    }

    list_it get(int i) const
    {
        return umap.find(i)->second;
    }

    unsigned int distance(int lower, int upper) const
    {
        return std::distance(get(lower), get(upper));
    }

private:

    std::list<int> list;
    std::unordered_map<int, list_it> umap;
    std::map<int, list_it> map;

};

What data structure to find number of elements in a given range in O(log n) time?

5 Answers5

Linked