
I've been trying some coding algorithm exercises, and one topic in particular has stood out to me. I've been trying to find a good answer to this, but I've been stuck in analysis paralysis. Let's say I have an array of unsorted integers and I want to determine the xth smallest element in this array.

I know of two options to go about this:

Option 1: Run a sort algorithm, sorting elements least to greatest, and look up the xth element. To my understanding, this is O(n*log(n)) time and O(1) space.

Option 2: Heapify the array, turning it into a min heap. Then pop() the top of the heap x times. To my understanding, this is O(n) + O(x*log(n)).

I can't tell which is the optimal answer, and maybe I have a fundamental misunderstanding of priority queues and when to use them. I've tried to measure runtime, but I feel like I'm getting conflicting results, perhaps because with option 2 it depends on how big x is. And maybe there is a better algorithm altogether. If someone could help, I'd appreciate it!
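For reference, here's a minimal sketch of the two options as I understand them (the function names are just for illustration):

import heapq

def xth_smallest_sort(nums, x):
    # Option 1: sort, then index (x is 1-based)
    return sorted(nums)[x - 1]

def xth_smallest_heap(nums, x):
    # Option 2: heapify, then pop the top x - 1 times
    heap = list(nums)
    heapq.heapify(heap)        # O(n)
    for _ in range(x - 1):
        heapq.heappop(heap)    # O(log n) per pop
    return heap[0]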

Andrew Huh
  • AFAIK quicksort is generally slightly faster than heapsort so I'd go with the first option. That way you also avoid having to extract x times. – Smich Jun 07 '20 at 08:48
  • See also https://stackoverflow.com/questions/13185508/find-the-kth-smallest-element. I believe this demonstrates the fastest way to do this. – asky Jun 07 '20 at 10:27
  • Does this answer your question? [How to find the kth largest element in an unsorted array of length n in O(n)?](https://stackoverflow.com/questions/251781/how-to-find-the-kth-largest-element-in-an-unsorted-array-of-length-n-in-on) – Paul Hankin Jun 07 '20 at 11:19

4 Answers


The worst-case time complexity of approach 2 should be O(n + n*log(n)), since the maximum value of x is n.

For the average case, x averages (1+2+3+...+n)/n = (n+1)/2, so the time complexity is O(n + ((n+1)/2)*log(n)) = O(n + n*log(n)).

Therefore the two approaches are asymptotically comparable on average (approach 2 wins when x is small), and neither is optimal.

PS: Have a look at the quickselect algorithm, which runs in O(n) average time.
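A minimal randomized-pivot sketch of the quickselect idea (not in-place; the helper lists here are just to show the recursion):

import random

def quickselect(A, k):
    """Return the kth smallest element of A (k is 1-based)."""
    pivot = random.choice(A)
    lows = [a for a in A if a < pivot]
    pivots = [a for a in A if a == pivot]
    if k <= len(lows):
        return quickselect(lows, k)
    if k <= len(lows) + len(pivots):
        return pivot
    highs = [a for a in A if a > pivot]
    return quickselect(highs, k - len(lows) - len(pivots))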

Deepak Tatyaji Ahire

Although approach 1 has the lower time complexity, both of these algorithms use auxiliary space (std::sort, for example, needs O(log n) stack space for its recursion). Another way of doing this in constant space is via binary search: binary-search over the range of values for the xth element. Let l be the smallest element of the array and r the largest; then the time complexity is O(n*log(r-l)).

// a: the array of size n, x: the desired rank (1-based),
// l: the smallest element of a, r: the largest
int ans = l - 1;
while (l <= r) {
    int mid = l + (r - l) / 2;  // avoids overflow in (l + r) / 2
    int cnt = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] <= mid)        // count elements no larger than mid
            cnt++;
    }
    if (cnt < x) {
        ans = mid;              // fewer than x elements are <= mid,
        l = mid + 1;            // so the xth smallest is larger
    }
    else
        r = mid - 1;
}

Now you can look for the smallest element larger than ans that is present in the array; that element is the xth smallest.

Time complexity: O(n*log(r-l)) + O(n) for the last step.
Space complexity: O(1).
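For reference, here is a sketch of the same idea in Python (find_xth_smallest is just an illustrative name):

def find_xth_smallest(a, x):
    # binary search over the range of values, not over array indices
    l, r = min(a), max(a)
    ans = l - 1
    while l <= r:
        mid = (l + r) // 2
        cnt = sum(1 for v in a if v <= mid)  # elements no larger than mid
        if cnt < x:
            ans = mid        # fewer than x elements are <= mid
            l = mid + 1
        else:
            r = mid - 1
    # the smallest element larger than ans is the xth smallest
    return min(v for v in a if v > ans)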


The complexity of these algorithms revolves around two data points:

  • Value of x.

  • Value of n.

Space complexity: in both algorithms the space complexity remains O(1), assuming an in-place sort and in-place heapify.

Time complexity

  • Approach 1

Best case: O(n*log(n)) for sorting, plus O(1) for the case x == 1.
Average case: O(n*log(n)) if all elements are unique, or O(x + n*log(n)) if there are duplicates.
Worst case: O(n + n*log(n)) for the case x == n.

  • Approach 2

Best case: O(n), since just heapifying suffices for the case x == 1.
Average case: O(n + x*log(n)).
Worst case: O(n + n*log(n)) for the case x == n.

Now, coming to the point of analyzing these algorithms at runtime, the following general guidelines apply:

 1. Always test with large values of n.
 2. Have a good spread of values being tested (here, x).
 3. Do multiple iterations of the analysis in a clean environment
    (array re-created before every run, etc.) and take the average of all
    results.
 4. Check the code of any predefined functions for their exact complexity,
    in this case the sort (which can be 2n*log(n), etc.) and the various heap operations.

So, with all of the above held at ideal values, method 2 should perform better than method 1. A minimal benchmark along these lines is sketched below.
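This timing sketch applies the guidelines above, using Python's timeit and heapq (the array size, value range, and function names are illustrative only):

import heapq
import random
import timeit

def method1(nums, x):
    return sorted(nums)[x - 1]            # sort, then index

def method2(nums, x):
    heap = list(nums)
    heapq.heapify(heap)                   # O(n)
    for _ in range(x - 1):
        heapq.heappop(heap)               # x - 1 pops, O(log n) each
    return heap[0]

n = 10**5
for x in (1, n // 100, n // 2, n):        # spread of x values (guideline 2)
    nums = [random.randrange(10**9) for _ in range(n)]  # fresh array each run (guideline 3)
    t1 = timeit.timeit(lambda: method1(nums, x), number=10)
    t2 = timeit.timeit(lambda: method2(nums, x), number=10)
    print(f"x={x}: sort {t1:.3f}s, heap {t2:.3f}s")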

Vishal T

You can find the xth element in O(n); there are also two simple heap algorithms that improve on your option 2's complexity. I'll start with the latter.

Simple heap algorithm №1: O(x + (n-x) log x) worst-case complexity

Create a max heap out of the first x elements; then, for each remaining element that is smaller than the current max, replace the max with it:

import heapq
from typing import List

def findKthSmallest(nums: List[int], k: int) -> int:
    # max heap of the k smallest elements seen so far
    # (values are negated, since heapq only provides a min heap)
    heap = [-n for n in nums[:k]]
    heapq.heapify(heap)
    for num in nums[k:]:
        if -num > heap[0]:               # num is smaller than the current max
            heapq.heapreplace(heap, -num)
    return -heap[0]

Simple heap algorithm №2: O(n + x log x)

  1. Turn the whole array into a min heap, and insert its root into an auxiliary min heap.
  2. k-1 times, pop an element from the second heap and push back its children from the first heap.
  3. Return the root of the second heap.
import heapq
from typing import List

def findKthSmallest(nums: List[int], k: int) -> int:
    x = nums.copy()
    heapq.heapify(x)                # min heap over the whole array, O(n)
    s = [(x[0], 0)]                 # auxiliary heap of (value, index) pairs
    for _ in range(k - 1):
        ind = heapq.heappop(s)[1]   # discard the current smallest ...
        if 2 * ind + 1 < len(x):    # ... and push its children in x
            heapq.heappush(s, (x[2 * ind + 1], 2 * ind + 1))
        if 2 * ind + 2 < len(x):
            heapq.heappush(s, (x[2 * ind + 2], 2 * ind + 2))
    return s[0][0]

Which of these is faster? It depends on the values of x and n.

The more complicated Frederickson algorithm would allow you to find the xth smallest element in a heap in O(x), but that would be overkill, since the xth smallest element in an unsorted array can be found in O(n) worst-case time.

Median-of-medians algorithm: O(n) worst-case time

Described in [1].
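A compact sketch of the idea (this version uses O(n) extra space; the version in [1] partitions in place):

def median_of_medians_select(A, k):
    """Return the kth smallest element of A (k is 0-based) in O(n) worst case."""
    A = list(A)
    while len(A) > 5:
        # medians of groups of five
        medians = [sorted(A[i:i + 5])[len(A[i:i + 5]) // 2]
                   for i in range(0, len(A), 5)]
        # recursively pick the median of medians as a guaranteed-good pivot
        pivot = median_of_medians_select(medians, len(medians) // 2)
        lows = [a for a in A if a < pivot]
        pivots = [a for a in A if a == pivot]
        highs = [a for a in A if a > pivot]
        if k < len(lows):
            A = lows
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            A = highs
            k -= len(lows) + len(pivots)
    return sorted(A)[k]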

Quickselect algorithm: O(n) average time, O(n^2) worst-case time

def partition(A, lo, hi):
    """rearrange A[lo:hi+1] and return j such that
    A[lo:j] <= pivot
    A[j] == pivot
    A[j+1:hi+1] >= pivot
    """
    pivot = A[lo]
    if A[hi] > pivot:
        A[lo], A[hi] = A[hi], A[lo]
    # now A[hi] <= A[lo], and A[hi] and A[lo] need to be exchanged
    i = lo
    j = hi
    while i < j:
        A[i], A[j] = A[j], A[i]
        i += 1
        while A[i] < pivot:
            i += 1
        j -= 1
        while A[j] > pivot:
            j -= 1
    # now put the pivot in the jth place
    if A[lo] == pivot:
        A[lo], A[j] = A[j], A[lo]
    else:
        # then A[hi] == pivot
        j += 1
        A[j], A[hi] = A[hi], A[j]
    return j

def quickselect(A, left, right, k):
    pivotIndex = partition(A, left, right)
    if k == pivotIndex:
        return A[k]
    elif k < pivotIndex:
        return quickselect(A, left, pivotIndex - 1, k)
    else:
        return quickselect(A, pivotIndex + 1, right, k)
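For example, quickselect(A, 0, len(A) - 1, k) returns the kth smallest element of A, where k is a 0-based index.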

Introselect: O(n) worst-case time

Basically, do quickselect, but if recursion gets too deep, switch to median-of-medians.

import numpy as np
from typing import List

def findKthSmallest(nums: List[int], k: int) -> int:
    return np.partition(nums, k, kind='introselect')[k]

Rivest-Floyd algorithm: O(n) average time, O(n^2) worst-case time

Another way to speed up quickselect:

import math

C1 = 600
C2 = 0.5
C3 = 0.5

def rivest_floyd(A, left, right, k):
    assert k < len(A)
    while right > left:
        if right - left > C1:
            # select a subsample from A
            N = right - left + 1
            I = k - left + 1
            Z = math.log(N)
            S = C2 * math.exp(2/3 * Z)  # sample size
            SD = C3 * math.sqrt(Z * S * (N - S) / N) * math.copysign(1, I - N/2)
            # choose newleft/newright so the kth element lies between them most of the time
            newleft = max(left, k - int(I * S / N + SD))
            newright = min(right, k + int((N - I) * S / N + SD))
            rivest_floyd(A, newleft, newright, k)
        A[left], A[k] = A[k], A[left]
        # partition2 is not shown in this post; any partition routine
        # with the same contract as partition above will do
        j = partition2(A, left, right)
        if j <= k:
            left = j + 1
        if k <= j:
            right = j - 1
    return A[k]

[1] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009). Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4, pp. 220-223.

Viktoriya Malyasova