1

I need a sorted and indexed container. I want to build it based on std:set or std::vector. By using set, I need expensive std::distance to calculate the index of its element. By using vector, I know adding or deleting an element is expensive. Let's say my element is a pointer (small object). I know two operations' complexity is the same. But which one is faster? Thanks.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
user1899020
  • 13,167
  • 21
  • 79
  • 154
  • 5
    Have you tried to measure it? Also, `std::set` has other restrictions (such as unique values) and `std::vector` isn't sorted. What are you trying to do? – slugonamission Jan 02 '13 at 15:44
  • Measure it. A sorted vector *may* be better when you have a much larger number of lookups than of mutations, but it's impossible to predict this theoretically. – Kerrek SB Jan 02 '13 at 15:48
  • Don't guess, measure! Hint: `vector` has high locality – K-ballo Jan 02 '13 at 15:48
  • It really depend of the number of elements used... – benjarobin Jan 02 '13 at 15:48
  • I will learn to measure the CPU time and report my results. Any good tool/code to measure the CPU time? I remember it is like ***_PER_SEC. I don't like this way and every time need to check how to use it. Thanks. – user1899020 Jan 02 '13 at 15:50
  • On Unix systems `time ./testprogrambinary` – zch Jan 02 '13 at 15:52
  • Use a custom tree dataset keeping track of number of elements in child trees instead? Results in O(log(n)) for index calculation. – Bernhard Barker Jan 02 '13 at 15:54
  • 3
    To save you some time, [boost::flat_set](http://www.boost.org/doc/libs/1_52_0/doc/html/container/non_standard_containers.html#container.non_standard_containers.flat_xxx). There's also an article linked in there that may interest you. – Benjamin Lindley Jan 02 '13 at 15:56
  • @ Benjamin Lindley Thank you so much. I will look into it. It sees what I need. – user1899020 Jan 02 '13 at 15:59
  • Please don't edit the question title based on the answers; it doesn't really fit with the Q&A format. – templatetypedef Jan 03 '13 at 04:53

3 Answers3

3

If you need a data structure that supports sorted order and lookup by index, you should definitely look into the order statistic tree, which is a data structure specifically designed to support exactly these operations. It supports insertions and deletions in O(log n) time, lookup of an element's index in O(log n) time, and lookup by value or by index in O(log n) time as well, so it should be dramatically faster than either the vector or the set approaches.

Unfortunately, the STL doesn't have order statistic trees built it, so you would have to search for a third party library (this earlier question and answer provides an example of one). That said, the speedup you should expect from the order statistic tree should make it well worth the investment.

Hope this helps!

Community
  • 1
  • 1
templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
2

According to you guys' suggestions, I measured it as the bottom code (flat_set is added). The results for debug vesion is

for set: 4.720976s wall, 4.711230s user + 0.000000s system = 4.711230s CPU (99.8%) (also tested set::insert which takes little time)

for vector: 1.407571s wall, 1.404009s user + 0.000000s system = 1.404009s CPU (99.7%)

for flat_set: 0.327714s wall, 0.327602s user + 0.000000s system = 0.327602s CPU (100.0%)

And release version (I need to write out results to let the compiler optimization not to overkill my code) gives each about 10 times speedup.

And my conclusion is vectors are 2-3 times faster, and boost flat_set is the best. And for number of entries less than several hundreds (such as 200), flat_set insert is not slower than std::set (that w/o index calculation).

int N = 10000;
{
    boost::timer::auto_cpu_timer t;
    std::set<int> s;
    for (int i = 0; i < N; ++i)
    {
        auto it = s.insert(i);
        int index = std::distance(s.begin(), it.first);
    }
}

{
    boost::timer::auto_cpu_timer t;
    std::vector<int> v;
    for (int i = 0; i < N; ++i)
    {
        v.insert(v.begin(), i);
    }
}

{
    boost::timer::auto_cpu_timer t;
    boost::container::flat_set<int> s;
    for (int i = 0; i < N; ++i)
    {
        auto it = s.insert(-i); // negative sign is used to make insertion
                                // (at the beginning) expensive
        int index = std::distance(s.begin(), it.first);
    }
}
user1899020
  • 13,167
  • 21
  • 79
  • 154
0

Separate the storage and the index:

having a vector of integers {I} that index the storage vector of your object type, {T}.

{I} is sorted by comparison between the objects it points to in {T}. insert/delete to {I} is cheaper than those to {T}.

Whenever you add a new item to {T} vector, you insert its index to the sorted {I}.

when you delete an item, remove the index in {I}, but you may leave the object in {T} untouched, only push_back the index just removed into a reuse vector {I'}. next time when you add a new item, you can pop_back the index in {I'} and reuse the storage.

If you know the number of items ever used, you can call resize() on {T} at startup to avoid (re)allocation at run time.

This method is similar as using vector of pointers, the advantage is less dynamic allocation and more cache friendly, as object storage are in contiguous memory.