34

We have an n-node binary heap which contains n distinct items (smallest item at the root). For a k<=n, find a O(klogk) time algorithm to select kth smallest element from the heap.

O(klogn) is obvious, but couldn't figure out a O(klogk) one. Maybe we can use a second heap, not sure.

Saeid
  • 4,147
  • 7
  • 27
  • 43

2 Answers2

30

Well, your intuition was right that we need extra data structure to achieve O(klogk) because if we simply perform operations on the original heap, the term logn will remain in the resulting complexity.

Guessing from the targeted complexity O(klogk), I feel like creating and maintaining a heap of size k to help me achieve the goal. As you may be aware, building a heap of size k in top-down fashion takes O(klogk), which really reminds me of our goal.

The following is my try (not necessarily elegant or efficient) in an attempt to attain O(klogk):

  1. We create a new min heap, initializing its root to be the root of the original heap.

  2. We update the new min heap by deleting the current root and inserting the two children of the current root in the original heap. We repeat this process k times.

  3. The resulting heap will consist of k nodes, the root of which is the kth smallest element in the original heap.

Notes: Nodes in the new heap should store indexes of their corresponding nodes in the original heap, rather than the node values themselves. In each iteration of step 2, we really add a net of one more node into the new heap (one deleted, two inserted), k iterations of which will result in our new heap of size k. During the ith iteration, the node to be deleted is the ith smallest element in the original heap.

Time Complexity: in each iteration, it takes O(3logk) time to delete one element from and insert two into the new heap. After k iterations, it is O(3klogk) = O(klogk).

Hope this solution inspires you a bit.

Terry Li
  • 16,870
  • 30
  • 89
  • 134
  • 1
    This is basically @Kevin 's solution – amit Oct 05 '11 at 04:26
  • @Terry Li: Instead of creating a new min heap, if we create a new max heap of size k and keep on inserting the elements from original min heap to new max heap. When the max heap is full, its root will have the kth smallest element and that heap will have the smallest k elements. Is my thinking right ? – Vikas Mangal Mar 23 '15 at 20:46
  • Isn't deleting current root O(lgn) instead of O(lgk) because heapifying the original min heap is required after the deletion. So the overall complexity becomes kO(lgn) instead of kO(lgk)? – jiangok Nov 11 '15 at 05:58
  • 1
    @VikasMangal No, that wouldn't work in klogk. That would again be a klogn algorithm because you will have to heapify the original min heap k times. – sharmakeshav Feb 08 '17 at 03:37
  • 1
    @jiangok There's no need to modify the original heap. You just read elements from the original heap and then modify your newly created heap. The new heap will be of max size k, thus it'll take O(logk) to heapify it. – sharmakeshav Feb 08 '17 at 03:39
  • 1
    @TerryLi So, the new heap will be composed of pointers to the original heap nodes rather than the actual nodes? So, heapifying code for the new heap will be a little different? – user5054 Jan 28 '18 at 04:45
  • To find kth smallest element, isn't it should be a new max heap instead of a min heap? – Jet C. Dec 20 '21 at 00:39
  • After taking the current node's children to put in the auxiliary heap, this means that one of those two nodes must be the next minimum. But note that you have more nodes in the same level that we overpass and might even be more minimal. So we need to search the minimal node in the `i`-th level. So why we look only at the two children of a node when we have to look at nodes at the whole level and pick the minimal from there? – jacob12 Jun 18 '22 at 08:29
23

Assuming that we're using a minheap, so that a root node is always smaller than its children nodes.

Create a sorted list toVisit, which contains the nodes which we will traverse next. This is initially just the root node.
Create an array smallestNodes. Initially this is empty.
While length of smallestNodes < k:
    Remove the smallest Node from toVisit
    add that node to smallestNodes
    add that node's children to toVisit

When you're done, the kth smallest node is in smallestNodes[k-1].

Depending on the implementation of toVisit, you can get insertion in log(k) time and removal in constant time (since you're only removing the topmost node). That makes O(k*log(k)) total.

Kevin
  • 74,910
  • 12
  • 133
  • 166
  • Insertion isn't `log(k)`, but rather `log(n)`, where `n` is the number of nodes already in the heap. Inserting `k` nodes will be `k*log(n)`. – Jim Mischel Oct 04 '11 at 16:56
  • 2
    @JimMischel: no, in `toVisit` there are no more then `2k` nodes at any point [since we add 2 elements for each element we remove, and we do it `k` times], so the insertion and deletion from `toVisit` is `O(log2k) = O(logk)`. for each operation on the original list, we just extract the direct children of a specific node, which is `O(1)`. we overall do `k` times `O(logk)` ops, which is indeed `O(klogk)`. – amit Oct 04 '11 at 17:23
  • 5
    though a `sorted list` is not a good data structure for `toVisit`, since insertion is `O(k)` in this list. You will need a heap to actually obtain `O(klogk)` [skip list/ balanced BST/B+ tree are also valid options, though harder to implement, heap will be enough here]. – amit Oct 04 '11 at 19:00
  • @amit: Thank you. I misunderstood the description of the algorithm. – Jim Mischel Oct 04 '11 at 19:51
  • 1
    Instead of a sorted list, can't you just use a Queue and add to the queue smallest-largest children nodes to visit? – madCode Jun 13 '15 at 07:45
  • @madCode Consider the min-heap `[1, 2, 100, 3, 4, 101, 102]`. Find 4th smallest value (which will be the value 4). Using a queue and the smallest - largest children approach, we get `[1, 2, 100, 3, 4]`. This doesn't give us the right 4th smallest value. – Chaos Jun 20 '21 at 15:35