Min heap Vs. Sorting to find Kth smallest integer in an array

Question

I came across a question that was something like Given a million integers, return the kth smallest element. There was a given solution below the question, and I'm not sure why this is an optimal solution. The given solution involved using a min-heap. So initially, I figured this made sense, because we can find the smallest element in the heap in constant time. But after I thought for a second longer, I thought about the cost of inserting the elements from the array into the heap. If my understanding is correct, insertion is an O(logN) operation. But if we are inserting N elements, then this should cost us O(NlogN) time. We also are using extra space with the Heap that we're using. So my question is, why is this a better solution than just sorting the array, and taking the kth - 1 index?

[quick select](http://en.wikipedia.org/wiki/Quickselect) is the usual way to find the kth smallest element. An alternative is [intro select](http://en.wikipedia.org/wiki/Introselect) . — rcgldr, Mar 03 '17 at 05:37
@rcgldr: Thanks for the introselect link! I just read about them. — justhalf, Mar 03 '17 at 05:58
In the heap select case, the complexity is O(n log k), not O(n log n). Quick select is O(n), but with a high constant. Which will be faster in real life depends largely on how large k is in comparison to n. See [When theory meets practice](http://blog.mischel.com/2011/10/25/when-theory-meets-practice/) for a more detailed exploration of this. — Jim Mischel, Mar 03 '17 at 11:19

score 5 · Accepted Answer · answered Mar 03 '17 at 05:38

In min heap, a single insertion is O(logN) in the worst case, as that cost is only incurred if the heap property (that the parent value should be smaller than the children) is violated.

There is a theorem that says that if you want to convert an array into a heap (which is called building the heap), the overall complexity is O(N), not O(NlogN). You can read the details in Wikipedia for binary heap.

So this approach is indeed better than simply sorting the whole array. Since sorting the whole array has time complexity of O(NlogN), while using the min-heap approach the total complexity is O(N+klogN), which is more desirable when k is small.

For completeness, I'll describe why the complexity for taking the k-th smallest element using min-heap is O(N+klogN). If you want the smallest element, you can just take the root in constant time. But what about the second smallest element? You will then need to remove the smallest element, and then restore the heap property. Then the root will be the smallest element in the array in which the smallest element has been removed, so it is the second smallest element. We can keep doing this k times to get the k-th smallest element. Since the operation to restore the heap property is O(logN), the total complexity is O(N+klogN) for building the heap and then keep removing the smallest element k times.

There is another aproach where you don't even need to build the heap of all N elements, you only need a max-heap of k elements. Add k first elements to the heap. Then, iterate over remaining array elements and compare each with the root of the heap. if your element is smaller than the root, remove the root and add your element to the heap. At the end, the root will contain kth smallest element. Worst case will be O(k + Nlogk). Which one is more efficient depends on how big are N and k. — vladich, Mar 03 '17 at 06:24

score 2 · Answer 2 · edited May 23 '17 at 12:25

First let see what options do we have.

The first option is sort and return kth element. The time complexity is O(n log n) and space complexity is O(1) if you can sort the input array.

The second option is maintaing a heap of size k. It is explained in justhalf's answer. The time complexity is O(n log k) and space complexity is O(k) for the heap.

And actually you can heapify the array in O(n) time and then pop k elements out to find the kth element. The time complexity is O(n + k log n) again and space complexity is O(1).

And as pointed out by rcgldr's comment, there is a selection algorithm with O(n) time and O(1) space.

quick select is the usual way to find the kth smallest element. An alternative is intro select . – rcgldr

So it depends on the situation. For example when comparing solution using heap and solution using sort, you need to think what k would be. Sometimes k is very small and using heap may help.

Sometime the input is a stream and you need to report the kth element in real time, sorting the array using O(n log n) time or using selection algorithm in O(n) time maybe too slow. You can get the answer in O(1) time if you use heap.

You also need to consider about the space complexity. Sometime you may be not allow to change the input, so you need to copy the array if you use sort or selection algorithm. It means O(n) space comparing to O(k) space of heap solution.

So it all depends. Can you change the input array? How much memory can you spend? Is this a real time query and how frequent the query are? Does the input grow or is it static? The better solution will be different in different cases.

In my answer I used heap of size N. Maybe you are referring to vladich's comment on my answer. =) — justhalf, Mar 05 '17 at 18:51

score 1 · Answer 3 · answered Mar 03 '17 at 05:46

1

A better solution to this problem would be to use a max-heap of size K. We can traverse and compare each element of the array with the root of the max heap, if the array element is smaller than the root, we can replace the root with the element and heapify the heap. This operation would take log(K) time. We'll do this for all of the N elements of the array.

The root of the heap would give you the Kth smallest element.

This way the time complexity would come out to be O(NlogK).

answered Mar 03 '17 at 05:46

Sonakshi Mehra

23
5

It's not clear if O(N + klogN) is better than O(k + Nlogk). It depends on N and k and on constant factors. – vladich Mar 03 '17 at 06:29
@vladich I don't think the complexity would be `O(k + NlogK)` in the `max-heap` solution (Or would it? How?), it would only be `O(NlogK)`. Moreover I found this discussion : [mathematical proof](http://mathforum.org/library/drmath/view/70271.html). It seems as long as both `N` and `K` are larger than 2, the solution using `max-heap` would always be faster. – Sonakshi Mehra Mar 03 '17 at 07:33
k to build the initial heap of k elements, (N-k)log(k) to remove the root of the heap and add a new element to the heap N-k times (worst case). – vladich Mar 03 '17 at 07:38

score 1 · Answer 4 · edited May 23 '17 at 11:46

The question must have discussed the range of the integers.

For example say the range is [0,10000] means there are repetitions. Now if we use sorting, then we will have to sort a million integers! which will be O(nlogn)

In case of heap, as the repetitions are taken care by a counter ( or a linked list ) we only have a 10000 element in the heap! Generally speaking for n integers in range [0,m], the heap will take O(mlogm) time where m < n . Thus the heap would be efficient. One more thing, the BUILD_HEAP algorithm DOES NOT take O(mlogm) to build heap. It takes O(m) time.

When you are inserting kth element to heap the HEAPFY takes O(logk) where k < m. Read more about it here

And yes, there exists a O(n) algorithm to find the kth smallest element in the array. It is called selection algorithm. CLRS discusses it very elegantly. Read about it here and here

So it depends whether to use selection algorithm or heap according to your m and n

Min heap Vs. Sorting to find Kth smallest integer in an array

4 Answers4