Print k largest elements in a max heap sized n, in klog(k) complexity

Question

I tried writing an algorithm that prints the k largest elemtns of a max heap but I cannot do it in the right complexity.

This is the Pseudo-code I wrote-

Print_k_largest(A[1,…,n],k):

If k>Heapsize(A):Error
i=1
insert[B,A[i])
print(B[1])
k-=1

While k>0:
    if 2*i< Heapsize(A):
        Insert(B,A[2*i])
        Insert(B,A[2*i+1])

    elif 2*i= Heapsize(A):
       Insert(B,A[2*i])

    B[1]=B[Heapsize(B)]
    Heapsize(B)-=1
    Max-Heapify(B,1)
    print(B[1])
    i=Binary_search(A[1,…,n],B[1])      
    k-=1

enter image description here

In this solution I create a new max heap based on the original one, so that its size is always smaller than K hence the complexity of max-heapify and other such functions is O(klogk) and not O(klogN) as I was requested to do. This Pseudocode is based on the solution suggested here.

The idea is like this- because it's a max heap the largest element is the root, the second largest element is one of the root's son, the third one is either the other son or the sons of the current largest one and so on. In each iteration I insert the sons of the former largest (the one I printed before), remove the former largest, Max-heapify (to make the heap a max heap again, hence the root is the newest largest) and print the newest largest (newest root). The principle in this brilliant solution (unfortuantely not mine haha) is to do all the changes on a second heap whose size is always smaller than K (because in each of the k iterations we add maximum 2 new elements and remove one) so that the runtime for actions like max-heapify is O(logk) and not O(logn). The thing is that to add the sons of the current largest I need an acess to its location (index) on the original tree! I don't know how to do it without it costing logn and runing everything.

I would appreaciate any help.

I don't know what `insert[B,A[i])` is pseudocode for (I'm pretty sure there's a mismatched brace, but even so - is that insert `A[i]` into `B`? Also, do you have access to the implementation of the original heap? If not, what's the exact interface you can access? I feel like I want to see more of the implementation of the original heap. — Edward Peters, Nov 18 '22 at 15:48

score 1 · Answer 1 · answered Nov 18 '22 at 15:54

(This might be exactly the algorithm you're already trying to implement, if so hope this phrasing makes it clearer)

So I would create a second heap, but it would not just be a heap of values - it would be a heap of positions in the original heap, inserted by the value at that position.

Call the original heap A and the new heap B. Start by inserting the head of A into B. Then, repeatedly pop the head of B, and insert the children (in A) into B.

So if A is build out of nodes like:

Node(value : int, left : Option[Node], right : Option[Node])

Ordered by value

Then B will be build out of meta-Nodes like:

MetaNode(value : Node, left : Option[MetaNode], right : Option[MetaNode])

Ordered by value.value

Initialize B with MetaNode(A.head) as its only element

Then repeatedly do:

for i in range 0..k-1:
    current = B.pop.value
    B.push (current.left) //might be None, should be coded so this is a no-op
    B.push (current.right) //see above
    results.add(current.value)

That isn't the simplest algorithm I've ever described, so if anything is unclear, please ask and I'll try to describe it more clearly. :)

score 1 · Answer 2 · answered Nov 18 '22 at 15:59

I have no idea why you think you need to binary search. The point of a heap is to not need to do that.

Remember that the key operations in a heap are as follows:

Append - add an element to the end.
Swap - exchange two elements.
SiftDown - Take an element and have it "fall down" to its place. That is, while it is not the root and is bigger than its parent, you Swap it with its parent and keep going. (Note, in the pointer version, you compare not by comparing the pointers, but by comparing the values that they point to. The principle is the same but you may need to rewrite the heap code to do it.)
HeapInsert - You Append then SiftDown.
SiftUp - Opposite of SiftDown. While the element has children and is smaller than at least one, you swap it with the larger of its children and keep going.
HeapPop - You Swap the first and last elements, remove the last to a temporary variable, SiftUp the first element, then return what used to be the first element.

With these operations your pseudocode becomes:

Print_k_largest(A[1,…,n],k):

If k>Heapsize(A):Error
i=1
HeapInsert[B,pointer to A[i]])

While k>0:
    if 2*i< Heapsize(A):
        HeapInsert(B,pointer to A[2*i])
        HeapInsert(B,pointer to A[2*i+1])

    elif 2*i= Heapsize(A):
       HeapInsert(B,pointer to A[2*i])

    PtrToElement = HeapPop(B)    
    print(PtrToElement.value)      
    k-=1

If you work in a language with native pointers, like C, you can construct pointers very easily using & (aka "address of") and * (aka "thing pointed to by) operators. If you work in a language without pointers, you'll need to store indexes into A like i and j, and then get values back using A[i] and A[j]. Which means that A has to become an argument to every operation in your heap of "pointers" because the language doesn't support actual pointers.

Either way the odds are good that you'll actually need to implement your own "heap of pointers".

In Python I cheat. Rather than use integers in the heap, I'll use tuples. So I wind up storing (A[i], i). Thus I have both value and index stored together, and compare by value first. If your language has dynamic typing and tuples, this can be more convenient than rewriting a heap implementation. But I doubt that the language you're working with will support this trick.

Thank you for your response, I really appreaciate it. Currently I don't work with any language because it's a theoretical DAST course (But I know python). I just implented the touple trick (so that B is a touple tree) after asking my course coordinates if it's valid to them (and they replied it is). Regarding the solution with pointers only- I thought about it before but then you cannot Max-heapify on B (as it contains only indexes) so I am not sure how such solution can work. Now I think the pseudocode is valid but now I have to formally prove it's correctness, goodluck to me haha. — DR_2001, Nov 18 '22 at 16:13
@DR_2001 Good luck! With the pointer solution, you would literally just do comparisons like `*B[i] < *B[j]`. To get indexes you just subtract off the start of the array, do the index math, and wind up with a new pointer. — btilly, Nov 18 '22 at 19:03

score 0 · Answer 3 · answered Nov 30 '22 at 06:45

You have to create a temporary Max binary heap and delete the node (k - 1) time and do the downheap on the heap every time you delete. After this, you can simply print the node which is the Kth largest element of the heap. Time Complexity - O(K * LogK)

Print k largest elements in a max heap sized n, in klog(k) complexity

3 Answers3