16

I'm trying to come up with something to solve the following:

Given a max-heap represented as an array, return the kth largest element without modifying the heap. I was asked to do it in linear time, but was told it can be done in log time.

I thought of a solution:

Use a second max-heap and fill it with k or k+1 values into it (breadth first traversal into the original one) then pop k elements and get the desired one. I suppose this should be O(N+logN) = O(N)

Is there a better solution, perhaps in O(logN) time?

Alstor
  • 161
  • 1
  • 1
  • 4
  • got it, thanks, but in that case I still think your algorithm is incorrect because a breadth first search of the tree won't work right? – sunny Jul 16 '15 at 14:12
  • I suppose it should work. I used the term "search" incorrectly, basically I'm just searching for a traversal that stores the nodes of a level and then proceeds with the next level. I'll edit the term to clear out any potential ambiguity – Alstor Jul 16 '15 at 14:16
  • 1
    I think Fibonacci heaps are the way to an amortized O(log n) solution, but I like this question. I'm going to think about it... – erip Jul 16 '15 at 14:17
  • 1
    @Alstor I think Your solution is not right, because kth largest element need not be in kth level of the tree. – Sumeet Jul 16 '15 at 15:11
  • @Alstor If you are just going to traverse the tree and use a stack then why use a second max-heap because traversing will not modify the heap? – Sumeet Jul 16 '15 at 15:12
  • Hint: you have to use the heap operations. "breadth first traversal" isn't meaningful for a heap. – Colonel Panic Jul 17 '15 at 08:51
  • @ColonelPanic breadth first traversal is meaningful on all tree structures – BeyelerStudios Jul 17 '15 at 14:55

4 Answers4

9

The max-heap can have many ways, a better case is a complete sorted array, and in other extremely case, the heap can have a total asymmetric structure.

Here can see this: enter image description here

In the first case, the kth lagest element is in the kth position, you can compute in O(1) with a array representation of heap. But, in generally, you'll need to check between (k, 2k) elements, and sort them (or partial sort with another heap). As far as I know, it's O(K·log(k))

And the algorithm:

Input:
    Integer kth <- 8
    Heap heap <- {19,18,10,17,14,9,4,16,15,13,12}

BEGIN
    Heap positionHeap <- Heap with comparation: ((n0,n1)->compare(heap[n1], heap[n0]))

    Integer childPosition
    Integer candidatePosition <- 0
    Integer count <- 0
    positionHeap.push(candidate)
    WHILE (count < kth) DO
        candidatePosition <- positionHeap.pop();
        childPosition <- candidatePosition * 2 + 1
        IF (childPosition < size(heap)) THEN
            positionHeap.push(childPosition)
            childPosition <- childPosition + 1
            IF (childPosition < size(heap)) THEN
                positionHeap.push(childPosition)
            END-IF
        END-IF
        count <- count + 1
    END-WHILE
    print heap[candidate]
END-BEGIN

EDITED

I found "Optimal Algorithm of Selection in a min-heap" by Frederickson here: ftp://paranoidbits.com/ebooks/An%20Optimal%20Algorithm%20for%20Selection%20in%20a%20Min-Heap.pdf

David Pérez Cabrera
  • 4,960
  • 2
  • 23
  • 37
  • Can you site reference for statement "in generally, you'll need to check between (k, 2k) elements"? For example in [this](https://s19.postimg.org/uj5i46l6b/heap1.png) image 7th largest element 9 is at index 1, whereas in [this](https://s19.postimg.org/ejrfyfbyb/heap1.png) image 4th largest element 12 is at index 14. So I guess this statement is wrong. – MsA Mar 17 '18 at 15:43
  • @anir In your first example, how do you know what is the 7th largest element? To determine which is the seventh largest element you have had to evaluate 9 elements. So 7 <= 9 < 14. And In your second example, To determine which is the fourth largest element you have had to evaluate 7 elements, so 4 <= 7 < 8. Why do you say the sentence is wrong? – David Pérez Cabrera Mar 17 '18 at 21:50
  • I thought, by "check between (k, 2k) elements, and sort them", you mean sort elements between heap array indices k and 2k. Now I feel, thats not what you meant to say right? I was saying that it was wrong because, in first example, 9 is a 7th largest element, but occurs at index 1. So we cannot determine 9 as 7th largest element by sorting elements between indices 7 and 14 (& then select the smallest from them). Similarly, in second example, fourth largest element, 12, occurs at index 14. So we cannot determine 12 as fourth largest by sorting elements between indices 4 and 8. (continued...) – MsA Mar 18 '18 at 05:02
  • (continued...) May be example could have avoided the confusion. Now I feel you were talking about the same procedure as explained in these answers: [1](https://stackoverflow.com/a/11210932/1317018), [2](https://stackoverflow.com/a/7655181/1317018) and [3](https://stackoverflow.com/a/7651139/1317018) and not straightway sorting heap array elements between indices k and 2k (and then selecting smallest out of sorting output) , right? – MsA Mar 18 '18 at 05:02
5

No, there's no O(log n)-time algorithm, by a simple cell probe lower bound. Suppose that k is a power of two (without loss of generality) and that the heap looks like (min-heap incoming because it's easier to label, but there's no real difference)

      1
   2     3
  4 5   6 7
.............
permutation of [k, 2k).

In the worst case, we have to read the entire permutation, because there are no order relations imposed by the heap, and as long as k is not found, it could be in any location not yet examined. This takes time Omega(k), matching the (complicated!) algorithm posted by templatetypedef.

David Eisenstat
  • 64,237
  • 7
  • 60
  • 120
  • the title says Kth largest in max heap (not an arbitrary array). so there is no question of min heap coming. the worst case scenario you outlined is not applicable as it is max heap, so the parent node is always higher than its child. – Kishore Jul 25 '17 at 15:43
  • 1
    @Kishore The point of my answer is that the heap structure does not impose much structure in the worst case. – David Eisenstat Jul 25 '17 at 18:52
2

To the best of my knowledge, there's no easy algorithm for solving this problem. The best algorithm I know of is due to Frederickson and it isn't easy. You can check out the paper here, but it might be behind a paywall. It runs in time O(k) and this is claimed to be the best possible time, so I suspect that a log-time solution doesn't exist.

If I find a better algorithm than this, I'll be sure to let you know.

Hope this helps!

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
2

Max-heap in an array: element at i is larger than elements at 2*i+1 and 2*i+2 (i is 0-based)

You'll need another max heap (insert, pop, empty) with element pairs (value, index) sorted by value. Pseudocode (without boundary checks):

input: k
1. insert (at(0), 0)
2. (v, i) <- pop and k <- k - 1
3. if k == 0 return v
4. insert (at(2*i+1), 2*i+1) and insert (at(2*+2), 2*+2)
5. goto 2

Runtime evaluation

  • array access at(i): O(1)
  • insertion into heap: O(log n)
  • insert at 4. takes at most log(k) since the size of heap of pairs is at most k + 1
  • statement 3. is reached at most k times
  • total runtime: O(k log k)
BeyelerStudios
  • 4,243
  • 19
  • 38