3

I was looking for stl/boost container which can provide following functionality:

  1. Auto insert element in sorted order. (log n)
  2. Return the index/depth of element from starting node. (log n)

If there isn't one, what would be the best way to achieve this? I am thinking of a solution using a doubly link list. Would that be good choice to solve this problem ?

Mike Kinghan
  • 55,740
  • 12
  • 153
  • 182
Vivek Goel
  • 22,942
  • 29
  • 114
  • 186
  • Sorted doubly linked list has insertion costs of O(#list/2). Anyway, why must it be sorted? – Deduplicator Apr 15 '14 at 19:45
  • A well created container class would be balanced so the depth wouldn't be relevant except for massive data sets. At which point you'd probably be farther ahead to use a hash – cppguy Apr 15 '14 at 19:46
  • I have no knowledge in the world CS or of 'O(n) etc' world, but to me it sounds like a tree container not double linked list. – UldisK Apr 15 '14 at 19:47
  • Usually inserting with O(log(n)) in real life is slower than insert with O(1) and sort in the end with O(nlog(n)).. I always just use vector and std quicksort – SwiftMango Apr 15 '14 at 22:54
  • The first requirement means that you need an associative container. But in 2. do you require the O(logN) lookup or random access? In other words, do you want to find an element or jump by some number of elements? – Adam Wulkiewicz Apr 15 '14 at 23:29
  • @AdamWulkiewicz Actually I want to solve this problem. http://stackoverflow.com/questions/23101165/counting-of-all-value-smaller-than-current-value I thought it can be solved if I have a structure which is auto sorted in new insertion and if I can get index of element in o(log n) – Vivek Goel Apr 16 '14 at 08:49

3 Answers3

6

UPDATE

You need an order statistic tree. The C++ Standard Library doesn't have any, nor does it offer an easy way to implement one, see

Nor does boost, see under Future work and at the question linked just above.

However, the good news is, that such a tree is available in libstdc++ as an extension!


(Original answer:)

  1. Auto insert element in sorted order. (log n)
  2. Return index/depth of element from starting node. (log n)

It seems to me that neither the C++ Standard Library nor boost offer a container that would provide these complexity guarantees out-of-the-box. You either have to implement this container yourself or relax your complexity requirements and allow O(n) for at least one of them.

if not: What will be the best way to achieve this ? I am thinking a solution using double link list. Will it be good choice to solve this problem ?

std::list is a doubly-linked list but you can only achieve linear time insertion. std::list is a big performance killer due to its poor use of the cache.

You might be better off with boost::container::flat_set which also offers only linear time insertion but still may surprise you with its speed due the excellent use of the cache (thanks to the hardware prefetcher). And as a bonus, you get random access iterators, so the index can be found in O(1) time if you already have the element.

If both complexity requirements are a must, then I don't see any easier way than implementing a self-balancing binary search tree and storing the subtree size as well on each parent node. Maintaining this extra information won't ruin the O(log n) complexity. It is significant and non-trivial work to implement it, even if you start with one of the red-black tree implementation of std::map (not guaranteed to be a red-black tree but in libstdc++ it is and it is open-source).


One more thing comes to mind: What is your usage pattern? Are your doing insertions and index look-ups completely at random, one after the other? If not, or at least mainly not, then you might switch datastructures in between and get away with one of the stl or boost containers.

Community
  • 1
  • 1
Ali
  • 56,466
  • 29
  • 168
  • 265
  • "Return index/depth of element from starting node." is a feature, not a complexity guarantee – sehe Apr 15 '14 at 23:14
  • It's guaranteed that the insert and lookup in std::set, std::map, etc. has O(logN) complexity and that the iterators iterates over the sorted range. – Adam Wulkiewicz Apr 15 '14 at 23:18
  • @sehe He wants that in `O(log n)` time and that is a complexity requirement. It is easy to do that with `std::set` and `std::map` in `O(n)` time so in my opinion `std::set` and `std::map` have this feature but not this complexity guarantee. Is it now clear how I meant it or shall I revise the answer? – Ali Apr 16 '14 at 07:17
  • @AdamWulkiewicz The problem is that you still need `O(n)` time to figure out the rank of a key. The sorted range would only help if you had random access iterators. See my related question [Is there any technical reason why std::lower_bound is not specialized for red-black tree iterators?](http://stackoverflow.com/q/20934717/341970) – Ali Apr 16 '14 at 07:19
  • @Ali Yes. I am doing insertion and lookup randomly. I want to insert a value and then find out what is the index of the value in sorted order after each step. – Vivek Goel Apr 16 '14 at 08:53
  • @VivekGoel Oh, that's not random :) OK, please give me some time, if that's what you need, perhaps I can come up with a simpler solution. I will get back to you. – Ali Apr 16 '14 at 12:20
  • @VivekGoel OK, I have a solution for you, please check my updated answer! :) – Ali Apr 16 '14 at 13:19
  • @Ali Though "an" operation is O(log n) _and_ `std::set` and `std::map` are implemented with a tree data structure internally, there is no way you can answer that question using `set` or `map`, so the complexity guarantee is moot. – sehe Apr 16 '14 at 13:20
  • @sehe Thanks. :) As for your previous comment: You can find out the rank of an element in a `std::set` by iterating over the elements (it is guaranteed to be in sorted order) in `O(n)` time. That's what I was referring to in the first version of the answer *"You either have to implement this container yourself or relax your complexity requirements and allow `O(n)` for at least one of them."* If you still find this claim confusing or wrong, please suggest a better sentence. Or I am missing your point... :( – Ali Apr 16 '14 at 13:28
  • @Ali From the question: _"depth of element from starting node"_ - I assumed he meant tree depth, not "number of iterator increments" (that is essentially unrelated, though always larger) – sehe Apr 16 '14 at 13:39
  • @sehe Ah, yes, that part of the question is somewhat fuzzy. It seems to me, based on his comments, that he actually wants the rank. (Makes sense, it is a much more useful information than the depth.) – Ali Apr 16 '14 at 13:43
  • @Ali I wouldn't call it useful information, really. I'd call the actual depth much more interesting (detecting balancedness?). Anyways, if the OP is after a flatset/flatmap, that's another question indeed. You already had my upvote for the intersting data structures that you unearthed :) – sehe Apr 16 '14 at 13:48
  • @sehe Interesting. Let me put it this way: The rank in my practical/technical applications is a quite useful information and I have never had a practical/technical application where the depth had any use. (I am a chemical engineer and I have been implementing numerical methods for a living.) On the other hand, I totally agree, if you are implementing the tree yourself and monitoring its performance (balancedness), then yes, the depth is a very important information. So let's just say, it depends on the circumstances which one (rank or depth) is useful. – Ali Apr 16 '14 at 13:59
  • @Ali I do feel that if rank is useful information, then a node-based container is not the right choice :) You could do with heaps, sorted vectors, intrusive maps and end up being more efficient. However, that's talking from my experience. I'll have to take a look at the rank tree/order statistics tree your answer mentions – sehe Apr 16 '14 at 14:02
  • @sehe Yes, that's why I was also proposing `flat_set` in my original answer too. But without seeing the OP's application and understanding the requirements, we can only *guess.* And at this point I leave it up to him to either accept an answer of refine his question/requirements. – Ali Apr 16 '14 at 14:06
  • @Ali Yeah, we got sidetracked. Thanks for explaining how you thought that std::set could deliver on the 'depth' question. Cheers – sehe Apr 16 '14 at 14:10
  • @sehe And I thank you for your feedback. Sometimes I get shortsighted and think that my application is the most important (rank vs. depth). Cheers – Ali Apr 16 '14 at 14:16
  • @Ali Of course you're right. Searching for the index in std::set would require O(N) operations. It was not clear to me what are the real requirements, and still isn't. This looks to me like a "XY problem". Btw, it should be possible to use Counted B-tree to meet the requirements. – Adam Wulkiewicz Apr 16 '14 at 16:46
  • @AdamWulkiewicz If you read my discussion with sehe, you will see that the requirements aren't clear to us either. :( We will have to wait for the OP to either pick an answer or clarify the requirements. – Ali Apr 16 '14 at 17:12
4

std::map and std::set acording to the standard guarantee O(log(N)) insertion and searching, they also satisfy the sorted order condition. Please see C++ standard at section 23.4.

Update after @StefanoSanfilippo constructive comment:

Have in mind though, that these containers allow only unique keys/elements. If you have multiple values you have to resort to std::multimap and std::multiset. These containers have almost the same properties with std::map and std::set but allow multiple keys/elements.

Now about with index/depth issue as far as it concerns STL containers, it's not guaranteed that std::map and std::set are implemented as binary-trees and as such there is no interface for accessing tree properties such as depth and index (please see How to find the depth of each node in std::map?). Making an educated guess, I think that the same goes for boost's tree like containers.

Update - Quoting from @Mooing Duck's comment:

boost trees also do not have ways to get the index.

Community
  • 1
  • 1
101010
  • 41,839
  • 11
  • 94
  • 168
0

A binary tree (B-tree) has logarithmic insertion and retrieval (equivalent to depth count) time in the average case and is naturally sorted when walked in-order.

Unfortunately, a B-tree insertion/retrieval time might degenerate to linear if the tree is unbalanced (worst case). If this is an issue, you should consider a Red-Black tree, which does not eliminate the problem, but mitigates it while keeping the insertion time logarithmic in the size of the tree.

Neither STL nor Boost implement a B- or RB- tree, although std::map is usually implemented as a RB.

Please keep in mind that a linked list has linear insertion time, even if double.

Stefano Sanfilippo
  • 32,265
  • 7
  • 79
  • 80