
I'm faced with a problem that requires a queue data structure supporting fast k-th largest element queries.

The requirements of this data structure are as follows:

  1. The elements in the queue are not necessarily integers, but they must be comparable to each other, i.e., given any two elements we can tell which one is greater (they may also be equal).

  2. The data structure must support enqueue (add an element at the tail) and dequeue (remove the element at the head).

  3. It can quickly find the k-th largest element in the queue; please note that k is not a constant.

  4. You can assume that the operations enqueue, dequeue, and k-th largest element finding all occur with the same frequency.


My idea is to use a modified balanced binary search tree. The tree is an ordinary balanced binary search tree, except that every node node_i is augmented with an extra field n_i, where n_i denotes the number of nodes contained in the subtree rooted at node_i. The aforementioned operations are supported as follows:

For simplicity assume that all elements are distinct.

Enqueue(x): x is first inserted into the tree; suppose the corresponding node is node_t. We then append the pair (x, pointer to node_t) to the queue.

Dequeue: suppose (e_1, node_1) is the element at the head, where node_1 is the pointer to the tree node corresponding to e_1. We delete node_1 from the tree and remove (e_1, node_1) from the queue.

K-th largest element finding: suppose the root is node_root with children node_left and node_right (assume both exist), and let n_root, n_left, n_right be their subtree sizes. Taking the left subtree to hold the larger elements (the mirror-image orientation is symmetric), we compare K against these sizes; three cases may happen:

  1. if K <= n_left, we find the K-th largest element in the left subtree of node_root;

  2. if K > n_root - n_right (that is, K > n_left + 1, since n_root = n_left + n_right + 1), we find the (K - n_root + n_right)-th largest element in the right subtree of node_root;

  3. otherwise K = n_left + 1 and node_root is the node we want.

The time complexity of all three operations is O(log N), where N is the number of elements currently in the queue.
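As a concrete sketch of the approach above (assuming distinct elements, as stated): the Python below uses an *unbalanced* size-augmented BST for brevity and deletes by key rather than via a stored node pointer; all names are illustrative. A real implementation would use a balanced tree (AVL, red-black, treap) to actually guarantee the O(log N) bounds. Keys are stored in ascending order, so the k-th largest descends through the right subtree, the mirror image of the cases in the text.

```python
import collections

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in this subtree

def size(node):
    return node.size if node else 0

def update(node):
    node.size = 1 + size(node.left) + size(node.right)

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    return node

def delete(node, key):
    if node is None:
        return None
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    else:
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # two children: replace with the successor (min of right subtree)
        succ = node.right
        while succ.left:
            succ = succ.left
        node.key = succ.key
        node.right = delete(node.right, succ.key)
    update(node)
    return node

def kth_largest(node, k):
    # rank selection using subtree sizes, counting from the largest side
    r = size(node.right)
    if k <= r:
        return kth_largest(node.right, k)
    if k == r + 1:
        return node.key
    return kth_largest(node.left, k - r - 1)

class KthQueue:
    def __init__(self):
        self.root = None
        self.fifo = collections.deque()
    def enqueue(self, x):
        self.root = insert(self.root, x)
        self.fifo.append(x)
    def dequeue(self):
        x = self.fifo.popleft()
        self.root = delete(self.root, x)
        return x
    def kth_largest(self, k):
        return kth_largest(self.root, k)
```

The `kth_largest` function follows the three cases from the text exactly, with the size comparisons taken against the right subtree because larger keys sit to the right here.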

How can I speed up the operations mentioned above? With what data structures and how?

outlaw
  • This seems like a contrived container (finding the kth element but then not removing it, all operations equal frequency) - can you provide more information about the underlying problem you're trying to solve? If you really need all these requirements I think you already found a solid solution. – Mark B Sep 21 '12 at 14:22
  • @Mark B, I am processing a data stream, and I want to know the current k-th largest element in the sliding window. Actually I want to find the top-k elements. – outlaw Sep 21 '12 at 14:27
  • This might be a bit of a read but [boost::multiIndex](http://www.boost.org/doc/libs/1_51_0/libs/multi_index/doc/index.html) could help with this problem. – andre Sep 21 '12 at 14:27
  • Regarding requirement 2, do the elements need to be accessible in the order in which they were added or can they be sorted? – smocking Sep 21 '12 at 14:32
  • @smocking, enqueue and dequeue operations must be in order (FIFO); no other constraints are imposed on element access. – outlaw Sep 21 '12 at 14:35
  • @outlaw, Thx. Think I agree with Mark B then. If `k` were constant (which it's not), you could use a heap to keep track of the k-th element. That would allow O(1) for finding the k-th element and O(log N) for enqueue/dequeue. – smocking Sep 21 '12 at 14:46
  • I think the solution of finding k-th element is right as this page also suggests: http://stackoverflow.com/questions/2329171/find-kth-smallest-element-in-a-binary-search-tree-in-optimum-way/2329236#2329236 – Heejin Sep 21 '12 at 14:47
  • But I think you have to care about time complexity of tree rebuilding. If every enqueue and dequeue causes tree rebuilding, it will be pretty slow. – Heejin Sep 21 '12 at 14:49
  • @smocking, yes, you are right: when k is a constant the time complexity is as you said. Got it. – outlaw Sep 21 '12 at 14:54
  • @Heejin, yes, my method of k-th finding is the same as the one you mentioned. I think tree rebuilding is not a problem here because the amortized complexity is still O(log N). – outlaw Sep 21 '12 at 14:56
  • @outlaw Yes, you are right. I just thought there might be some way to maintain a lazy data structure, delaying tree rebuilding until a `find k-th element` operation comes in. If that is possible, you may reduce the constant factor of the O(log N). – Heejin Sep 21 '12 at 15:06
  • It is worth combining the binary search tree and the queue into a single structure: just use one additional pointer per node, pointing to the next node in FIFO order. – Evgeny Kluev Sep 21 '12 at 15:07
  • @Heejin: You can delay rebuilding the tree, but expected number of new/deleted elements between two "find" operations is only O(1). – Evgeny Kluev Sep 21 '12 at 15:14
  • @Evgeny Kluev Good idea! But when there are duplicate elements this might not be possible, am I right? – outlaw Sep 21 '12 at 15:15
  • There is no problem if there are not too many duplicates: just implement the tree like std::multiset. If there are many duplicates, the original structure with a separate queue is better; there, a counter of duplicates per node beats the additional pointer. – Evgeny Kluev Sep 21 '12 at 15:20
  • @Evgeny Kluev Good point, I will look into the detailed implementation of multiset. – outlaw Sep 21 '12 at 15:30
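For reference, the constant-k heap idea raised in the comments can be sketched as follows (Python; `TopK` is an illustrative name). Note it only supports adding elements: evicting the item that leaves the sliding window would need extra machinery, such as lazy deletion or an order-statistics structure as in the question.

```python
import heapq

class TopK:
    """Keep a min-heap of the k largest elements seen; heap[0] is the k-th largest."""
    def __init__(self, k):
        self.k = k
        self.heap = []  # min-heap holding the current top-k elements

    def add(self, x):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, x)
        elif x > self.heap[0]:
            # x displaces the current k-th largest element
            heapq.heapreplace(self.heap, x)

    def kth_largest(self):
        # O(1) peek once k elements have been seen
        return self.heap[0] if len(self.heap) == self.k else None
```

Each `add` is O(log k) and the k-th largest query is O(1), matching the comment's observation for fixed k.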

3 Answers


Note: you cannot achieve better than O(log n) for all operations; at best you can choose which operation you care about most. (Otherwise, you could sort in O(n) by feeding the array into the data structure and querying the 1st, 2nd, 3rd, ..., n-th elements.)

  • Using a skip list instead of a balanced BST as the sorted structure reduces dequeue complexity to O(1) average case. It does not affect the complexity of any other operation.
    To remove from a skip list, all you need to do is reach the element via the pointer stored at the head of the queue, then follow its links upward and unlink it from each level. The expected number of nodes to delete is 1 + 1/2 + 1/4 + ... = 2.
  • Finding the K-th element can be achieved in O(log K) by starting from the leftmost node (rather than the root) and walking up until you reach a node with "more descendants than needed", then treating that node as the root and applying the algorithm from the question. Though better in asymptotic complexity, the constant factor roughly doubles.
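The expected-cost claim in the first bullet (1 + 1/2 + 1/4 + ... = 2) can be checked empirically: with promotion probability 1/2, the number of levels a skip-list node occupies is geometrically distributed with mean 2. A quick simulation (Python, illustrative only):

```python
import random

def node_height(rng, p=0.5):
    # Number of skip-list levels a node appears on: 1 plus a geometric
    # number of promotions, so the mean is 1 / (1 - p) = 2 for p = 1/2.
    h = 1
    while rng.random() < p:
        h += 1
    return h

rng = random.Random(42)  # fixed seed so the run is reproducible
avg = sum(node_height(rng) for _ in range(100_000)) / 100_000
print(avg)  # close to 2: unlinking a node, given a pointer to it, touches ~2 links on average
```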
amit

I found an interesting paper:

Sliding-Window Top-k Queries on Uncertain Streams published in VLDB 2008 and cited by 71.

https://www.cse.ust.hk/~yike/wtopk.pdf

VLDB is one of the top conferences in the database research area, and the citation count suggests the data structure actually works.

The paper looks fairly difficult, but if you really need to improve your data structure, I suggest reading this paper, or the papers in its reference list.

Heejin

You can also use a finger tree.

For example, a priority queue can be implemented by labeling each internal node with the minimum priority of the children beneath it, and an indexed list/array can be implemented by labeling nodes with the number of leaves beneath them. Finger trees provide amortized O(1) cons, reversing, and cdr, with O(log n) append and split, and can be adapted to indexed or ordered sequences.

Also note that being a purely functional structure makes it a good choice for concurrent use.

Dilum Ranatunga