5

This question is from an interview:

Find kth largest element in the union of 2 max heaps given that this kth element appears in both heaps in O(k log n).

This is the algorithm that I come up with:

While k is not zero
 if(root of max-heap #1 > root of max-heap #2)
 {
    extract-max(heap #1)
 }
 else if(root of max-heap #1 < root of max-heap #2)
 {
    extract-max(heap #2)
 }
 else //case of duplicates
 {
    extract-max(heap #1)
    extract-max(heap #2)
 }
 k--

So when k = 0, the extract-max function will extract the kth largest.

I think that this algorithm will run in O(k log n) time since the extract-max operation will run in O(log n) and I perform it k times.

Is this the right approach? It seems that I have not used the given info "this kth element appears in both heaps"? Is there a better approach utilizing this info?

Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
LKT
  • 311
  • 1
  • 7
  • 17
  • 6
    Your approach and analysis seems fine to me – amit May 31 '14 at 22:30
  • 2
    Does *anyone* understand the point of "given that this kth element appears in both heaps"? – ooga May 31 '14 at 22:43
  • 2
    It could just be a way to simplify the problem slightly. Once you've popped off `k-1` elements, there's no need to examine the heads of both heaps. – Sneftel May 31 '14 at 22:44
  • I think your code is *almost* right. You probably don't want to take both elements in the case of duplicates. Make the first test `>=`, and get rid of the dangling `else`. Otherwise you have inconsistent behavior. If one heap has duplicate items, they're extracted one at a time. But if there are duplicate items across heaps, they're extracted in a single operation. That will lead to problems. Other than that, your approach is sound. Although I don't understand what "this kth element appears in both heaps" means. – Jim Mischel Jun 01 '14 at 04:13
  • Thanks guys for the comments. Yes, I quite don't get the extra info given in this problem too, that's why I posted the question. I think we might utilize it for a better approach than mine. @JimMischel: Let's say if there are a duplicate across two heaps. heap #1 hast root 6 and heap #2 has root 6. k = 2 (find second largest). If I just extract-max from heap #1, now heap #1 has root 5 (for example) and heap #2 still has root 6 and k = 1. Next extract-max will extract 6 as my second largest since k = 0 now although 5 should be the second largest in this situation. – LKT Jun 01 '14 at 06:50
  • Also I found a bug in my code and does not know how to fix. Let's say heap #1 has root 6, and another 6 duplicate in it. Heap #2 has root 6. k=2 (find second largest). According to my original code, I will extract-max both 6 in heap #1 and heap #2. After that, heap #1 will have root 6 and heap #2 will have let's say root 5 and now k = 1. Next I will extract-max heap #1 and k = 0 now and the code reports that my second largest is 6, but it should be 5 in this case. Does anybody know to fix it? I think we might need to use the additional given info that I have not used. Thanks! – LKT Jun 01 '14 at 06:55
  • @LKT This 'bug' can be easily solved with popping elements from the heap while the root is identical to the old root, every time you delete elements from a heap. – amit Jun 01 '14 at 07:45
  • @JimMischel The extraction of 2 elements, one from each heap is done because the question asks for k'th largest element in the **union** of heaps, usually the terminology for union means take at most one copy of each element. But since it is an interview question - asking for clarification from the interviewer can be a good idea, and will show him you have good understanding of the different possibilities. – amit Jun 01 '14 at 07:46
  • Also note: More efficient (asymptotically, not practically) solution would be to [find k'th biggest element in each heap in O(klogk)](http://stackoverflow.com/q/11209556/572670), create two arrays of size `k`, merge them to array of size `2k` in `O(k)` and find `k`th biggest element in a sorted array. – amit Jun 01 '14 at 07:52
  • @amit Could you expand on your comment, please? I failed to understand both your comment and the linked `O(k*log(k))` solution. Thanks! – Ali Jun 01 '14 at 10:41
  • @Ali You can use a heap of size O(k) to find the k largest elements in a max-heap of arbitrary size in O(k log k). In fact [it can even be done in O(k)](http://stackoverflow.com/questions/22574580/algorithm-for-finding-the-largest-k-numbers-in-a-max-heap-of-size-n-in-ok-time), which is somewhat surprising. You do that for both heaps. Now you have 2k unsorted elements and want to find the k-th largest among those. You can do the latter step by sorting in O(2k log (2k)) = O(k log k) and then retrieving the k-th element... – Niklas B. Jun 01 '14 at 11:38
  • ... You can also use the [median of medians selection algoirithm](http://en.wikipedia.org/wiki/Median_of_medians) to get the k-th element in O(k). So in total (@amit) it is in fact possible to solve the problem in O(k), but the heap selection due to Frederickson likely has galactic constant factors that make the algorithm totally impractical. – Niklas B. Jun 01 '14 at 11:38
  • @amit Yes, I can see that it solves the problem, but it will break the constraint O (k log n). For instance, an easy example to examine is heap #1 and heap #2 each has 5 nodes (4 of them are 7, 1 is 6 with 7 is being root of course since it's a max heap) - find the second largest (in this case it should return 6). By running my algorithm and your bug fix, I think the returned result is correct, but the big O is no longer k log n since we run the extract-max operation more than 2 times. – LKT Jun 02 '14 at 07:49
  • @LKT Handling with duplicates is impossible other way, assume a heap with `2n` values, `n` of them are '1', and all the others are greater than 1. How can you find the 2nd largest element, which is not 1 efficiently? You cannot. The 2nd largest element can be in any leaf node - it is undefined which one, and all non-leaf nodes are 1. Finding this value requires traversing all the leaves of the heap, which will be `O(n)`. – amit Jun 02 '14 at 12:29
  • @amit yup, that's what I thought also. The worst case is O (n log n). But I'm just wondering if we can utilize the info "given kth element appears in both heaps" to do something better? Maybe not...i guess :) – LKT Jun 02 '14 at 18:14

3 Answers3

1

The algorithm you’re describing here is very reasonable and probably extremely fast in practice. I wanted to add that, surprisingly, it’s possible to do this in time O(k) with no dependence on n at all.

This is a consequence of the following fact: there’s an algorithm that, given an n-element binary max heap and a number k, returns the k largest elements of the binary heap in time O(k). To the best of my knowledge there is no “simple” algorithm that does this. The first algorithm of this type was by Fredericksen and involves a very nuanced decomposition of the binary heap into small groups called “clans.” A more elegant recent paper by Kaplan et al uses the soft heap data structure to solve this problem, but it’s still a bit tricky.

Assuming you have this algorithm as a black box, here’s how to solve the problem in time O(k):

  1. Use the above algorithm to extract the k largest elements from the two max heaps in time O(k).
  2. Use a linear-time selection algorithm (say, median-of-medians) to select the kth largest element out of this group of 2k items in time O(k).

In practice I’d wager that this would be much, much slower than the algorithm you’ve proposed due to the high constant factors, but I wanted to mention this exists because it’s surprising that this can be done in linear time!

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
0

Your approach is quite good and it can't be improved too much. It also has the expected complexity so you should be good to go. Only thing that I thought of and can be optimized is that you only need to check if k is zero, when you go in the last case:

While true
 if(root of max-heap #1 > root of max-heap #2)
 {
    extract-max(heap #1)
 }
 else if(root of max-heap #1 < root of max-heap #2)
 {
    extract-max(heap #2)
 }
 else //case of duplicates
 {
    extract-max(heap #1)
    extract-max(heap #2)
    if (k == 1) {
      break;
    }
 }
 k--

In this way you reduce the number of comparisons(which will have very slight effect) in the average case. This also takes advantage of the additional information that you're given - that the element is common for both heaps. However I don't think any benchmark will be able to notice this improvement as extract-max exceeds by far the complexity of the comparison.

Ivaylo Strandjev
  • 69,226
  • 18
  • 123
  • 176
0

You need to keep count of values appearing in both heaps properly:

// extract kth largest element from the union of max-heaps
//  heap #1 and heap #2, given to appear in both
forever
   if root of heap #1 > root of heap #2
      extract-max(heap #1)
   else if root of heap #1 < root of heap #2
      extract-max(heap #2)
   else // occurs in both heaps
      common = extract-max(heap #1)
      while common == root of heap #1
         if k < 2
            return
         extract-max(heap #1)
         k--
      while common == root of heap #2
         if k < 2
            return
         extract-max(heap #2)
         k--
   k--
greybeard
  • 2,249
  • 8
  • 30
  • 66