1

I need a data structure in c++ STL for performing insertion, searching and retrieval of kth element in log(n)

(Note: k is a variable and not a constant)

I have a class like

class myClass{
  int id;
  //other variables
};

and my comparator is just based on this id and no two elements will have the same id.

Is there a way to do this using STL or I've to write log(n) functions manually to maintain the array in sorted order at any point of time?

cegprakash
  • 2,937
  • 33
  • 60
  • 1
    The answers assume (probably correctly) that you want to retrieve many "kth" elements for a given container (i.e. `k` varies at runtime during the lifetime of the container). That's worth making explicit in the question, or correcting if your requirements are less demanding (e.g. a single `k` per container, specified at compile or construction time) - there are then faster solutions. – Tony Delroy May 28 '14 at 09:10
  • 2
    You need "a STL"? o.O Do you mean a standard container, from the standard library? – Lightness Races in Orbit May 28 '14 at 09:37
  • Try `#include "libavl.h"` in C http://adtinfo.org/ – Khaled.K May 28 '14 at 09:38
  • I edited the question. Remove down votes plz :) – cegprakash Jun 02 '14 at 08:16

2 Answers2

5

Afaik, there is no such datastructure. Of course, std::set is close to this, but not quite. It is a red black tree. If each node of this red black tree was annotated with the tree weight (the number of nodes in the subtree rooted at this node), then a retrieve(k) query would be possible. As there is no such weight annotation (as it takes valuable memory and makes insert/delete more complex as weights have to be updated), it is impossible to answer such a query efficently with any search tree.

If you want to build such a datastructure, use a conventional search tree implementation (red-black,AVL,B-Tree,...) and add a weight field to each node that counts the number of entries in its subtree. Then searching for the k-th entry is quite simple:

Sketch:

  1. Check the weight of the child nodes, and find the child c which has the largest weight (accumulated from left) that is not greater than k
  2. Subtract from k all weights of children that are left of c.
  3. Descend down to c and call this procedure recursively.

In case of a binary search tree, the algorithm is quite simple since each node only has two children. For a B-tree (which is likely more efficient), you have to account as many children as the node contains.

Of course, you must update the weight on insert/delete: Go up the tree from the insert/delete position and increment/decrement the weight of each node up to the root. Also, you must exchange the weights of nodes when you do rotations (or splits/merges in the B-tree case).

Another idea would be a skip-list where the skips are annotated with the number of elements they skip. But this implementation is not trivial, since you have to update the skip length of each skip above an element that is inserted or deleted, so adjusting a binary search tree is less hassle IMHO.

Edit: I found a C implementation of a 2-3-4 tree (B-tree), check out the links at the bottom of this page: http://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html

gexicide
  • 38,535
  • 21
  • 92
  • 152
  • 1
    A balanced binary search tree would do this easily :| But why isn't there any BBST implementation in C++ stl? – cegprakash May 28 '14 at 08:29
  • @cegprakash: What do you mean? A *perfectly* balanced binary search tree? Then the answer is: Because it is not updatable efficiently. – gexicide May 28 '14 at 08:31
  • Instead of implementing a data structure as you said, I could just use std::vector and simple binary search functions. – cegprakash May 28 '14 at 08:35
  • 1
    @cegprakash: Then you cannot have insertion in `O(log n)` :( However, `retrieve(k)` would then even be O(1), just use `[]` :) – gexicide May 28 '14 at 08:36
  • @cegprakash: Sure, you can find the insert position in `log n`, but then you have to do the insert, shifting all elements right of the insert position which is `O(n)` – gexicide May 28 '14 at 08:37
  • insertion and deletion of a known position is O(1) in std::vector. It's like a linked list ftw! – cegprakash May 28 '14 at 08:38
  • 4
    @cegprakash: `std::vector` is an array, not a linked-list, so your statement is not true. Just check the documentation of insert: http://www.cplusplus.com/reference/vector/vector/insert/ Quote: Linear on the number of elements inserted (copy/move construction) **plus the number of elements after position (moving).** – gexicide May 28 '14 at 08:41
  • @cegprakash: Sure, there is `std::list`, however, it is even worse than vector here! With `std::list`, you have constant time insert and delete if you have an `iterator` to the insert position, however, you cannot use binary search to get to that position and `retrieve(k)` is `O(k)` (start from the beginning and walk `k` links). – gexicide May 28 '14 at 08:46
  • I better go with insertion in O(n-k), retrieval in O(1) and searching in log(n) with std::vector then :) – cegprakash May 28 '14 at 08:53
  • @cegprakash: Yes, if your vector has less than around 10000 elements, you won't notice a big difference anyway, or vector might be even faster if you have a lot of retrievals and only few inserts/deletes. Things only start to matter once you hit higher numbers of elements. – gexicide May 28 '14 at 08:56
  • 3
    The structure this answer describes is commonly called an [order statistic tree](https://en.wikipedia.org/wiki/Order_statistic_tree). – Fred Foo May 28 '14 at 09:23
  • @cegprakash: I found an implementation, check the link at the bottom of my answer – gexicide May 28 '14 at 10:01
0

You can not achieve what you want with simple array or any other of the built-in containers. You can use a more advanced data structure for instance a skip list or a modified red-black tree(the backing datastructure of std::set).

You can get the k-th element of an arbitrary array in linear time and if the array is sorted you can do that in constant time, but still the insert will require shifting all the subsequent elements which is linear in the worst case.

As for std::set you will need additional data to be stored at each node to be able to get the k-th element efficiently and unfortunately you can not modify the node structure.

Ivaylo Strandjev
  • 69,226
  • 18
  • 123
  • 176