Data structure (in STL or Boost) for retrieving kth smallest/largest item in a set?

Question

I am looking for a data structure in C++ STL or boost with the following properties:

Retrieval of kth largest item in O(log n) time
Searching in O(log n) time
Deletion in O(log n) time

If such a data structure implementation doesn't exist, is there a way to adapt a different data structure with extra data (e.g., set) so that the above is possible?

Note: I've found is-there-any-data-structure-in-c-stl-for-performing-insertion-searching-and-r, but this is 5 years old and doesn't mention boost.

"*Retrieval of kth largest item in O(log n) time*" Is `k` in this case not proportional to `n`? That is, if the retrieval is O(k) relative to `k`, but O(1) relative to `n`, would that be OK? So searching for the `k`th element doesn't get longer with the number of `n`s, but it does get longer with the value of `k`. — Nicol Bolas, Nov 25 '19 at 16:39
An order statistic tree will do this: https://en.wikipedia.org/wiki/Order_statistic_tree See https://stackoverflow.com/a/23095152/12299000 for C++ STL/boost. — kaya3, Nov 25 '19 at 16:39
@kaya3 Thanks for the links. It looks like there is an implementation in libstdc++. However, this has to work with Visual Studio's compiler. Am I better off simply writing my own data structure at this point? — EricAtAIR, Nov 25 '19 at 17:14
If either `k` or `n - k` are < `log(n)`, then `std::set` is fine for this. — Caleth, Nov 25 '19 at 17:27
@EricAtAIR - not sure; I once wrote an order statistic tree in about 50-60 lines of Java, and it took an hour or two with the help of pen and paper, but of course it takes time to test it carefully too. The hard part is making it self-balancing. I suggest looking for an existing open-source implementation before writing your own. — kaya3, Nov 25 '19 at 17:36
@Caleth, the value of k is going to be essentially uniformly distributed between 0 and n, so set's performance won't work here. — EricAtAIR, Nov 25 '19 at 17:48

A. Mashreghi · Answer 1 · 2019-11-25T23:50:28.427

For the moment I assume that the elements are unique and that there are at least k elements. If not, you can use multiset similarly.

You can accomplish this using two sets in C++:

#include <set>

Set 1: Let's call this large. It keeps the k largest elements only.

Set 2: Let's call this rest. It keeps the rest of the elements.

Searching: Just search boths sets, takes O(log n) since both sets are red-black tree.

Deleting: If the element is in rest, just delete it. If not, delete it from large, and then remove the largest element from rest and put it into large. Deleting from red-black tree takes O(log n).

Inserting new elements (initializing): Each time a new element comes: (1) If large has less than k elements, add it to large. (2) Otherwise, if the element is greater than the minimum element in large, remove that minimum and move it to rest. Then, add the new element to large. If none of the above, simply add the new element to rest. Deleting and inserting for red-black trees takes O(log n).

This way, large always has the k largest elements, and the minimum of those is the k-th largest which you want.

I leave it to you to find how you can do search, insert, find min, find max, and delete in a set. It's not that hard. But all of these operations take O(log n) on a balanced binary search tree.

Thank you for your detailed answer. This would work nicely if k was always the same. Unfortunately, k is going to be different each time, and will (for all intents and purposes) be uniformly distributed between 1 and the current size of the set, with at least one deletion happening in between retrievals. — EricAtAIR, Nov 27 '19 at 14:32

Data structure (in STL or Boost) for retrieving kth smallest/largest item in a set?

1 Answers1