
I've been trying some coding algorithm exercises, and one topic in particular has stood out to me. I've been trying to find a good answer to this, but I've been stuck in analysis paralysis. Let's say I have an array of unsorted integers and I want to determine the xth smallest element in this array.

I know of two options to go about this:

Option 1: Run a sort algorithm, sorting elements least to greatest, and look up the xth element. To my understanding, this is O(n*log(n)) time and O(1) space.

Option 2: Heapify the array, turning it into a min heap. Then pop() the top of the heap x times. To my understanding, this is O(n) + O(x*log(n)).

I can't tell which is the optimal answer, and maybe I have a fundamental misunderstanding of priority queues and when to use them. I've tried to measure runtime, but I feel like I'm getting conflicting results, perhaps because with option 2 it depends on how big x is. And maybe there is a better algorithm altogether. If someone could help, I'd appreciate it!
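For reference, here's a minimal sketch of the two options as I understand them (the function names are just for illustration):

import heapq

def xth_smallest_sort(nums, x):
    # Option 1: sort, then index (x is 1-based)
    return sorted(nums)[x - 1]

def xth_smallest_heap(nums, x):
    # Option 2: heapify, then pop the top x - 1 times
    heap = list(nums)
    heapq.heapify(heap)        # O(n)
    for _ in range(x - 1):
        heapq.heappop(heap)    # O(log n) per pop
    return heap[0]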

Andrew Huh
  • AFAIK quicksort is generally slightly faster than heapsort so I'd go with the first option. That way you also avoid having to extract x times. – Smich Jun 07 '20 at 08:48
  • See also https://stackoverflow.com/questions/13185508/find-the-kth-smallest-element. I believe this demonstrates the fastest way to do this. – asky Jun 07 '20 at 10:27
  • Does this answer your question? [How to find the kth largest element in an unsorted array of length n in O(n)?](https://stackoverflow.com/questions/251781/how-to-find-the-kth-largest-element-in-an-unsorted-array-of-length-n-in-on) – Paul Hankin Jun 07 '20 at 11:19

4 Answers


The worst-case time complexity of approach 2 should be O(n + n*log(n)), since the maximum value of x is n.

For the average case, x averages (1+2+3+...+n)/n = (n+1)/2, so the time complexity is O(n + ((n+1)/2)*log(n)) = O(n + n*log(n)).

Therefore the two approaches are asymptotically comparable on average (approach 2 wins when x is small), and neither is optimal.

PS: Have a look at the quickselect algorithm, which runs in O(n) average time.
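A minimal randomized-pivot sketch of the quickselect idea (not in-place; the helper lists here are just to show the recursion):

import random

def quickselect(A, k):
    """Return the kth smallest element of A (k is 1-based)."""
    pivot = random.choice(A)
    lows = [a for a in A if a < pivot]
    pivots = [a for a in A if a == pivot]
    if k <= len(lows):
        return quickselect(lows, k)
    if k <= len(lows) + len(pivots):
        return pivot
    highs = [a for a in A if a > pivot]
    return quickselect(highs, k - len(lows) - len(pivots))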

Deepak Tatyaji Ahire

Although approach 1 has the lower time complexity, both of these algorithms use auxiliary space (std::sort, for example, needs O(log n) stack space for its recursion). Another way of doing this in constant space is via binary search: binary-search over the range of values for the xth element. Let l be the smallest element of the array and r the largest; then the time complexity is O(n*log(r-l)).

// a: the array of size n, x: the desired rank (1-based),
// l: the smallest element of a, r: the largest
int ans = l - 1;
while (l <= r) {
    int mid = l + (r - l) / 2;  // avoids overflow in (l + r) / 2
    int cnt = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] <= mid)        // count elements no larger than mid
            cnt++;
    }
    if (cnt < x) {
        ans = mid;              // fewer than x elements are <= mid,
        l = mid + 1;            // so the xth smallest is larger
    }
    else
        r = mid - 1;
}

Now you can look for the smallest element larger than ans that is present in the array; that element is the xth smallest.

Time complexity: O(n*log(r-l)) + O(n) for the last step.
Space complexity: O(1).
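For reference, here is a sketch of the same idea in Python (find_xth_smallest is just an illustrative name):

def find_xth_smallest(a, x):
    # binary search over the range of values, not over array indices
    l, r = min(a), max(a)
    ans = l - 1
    while l <= r:
        mid = (l + r) // 2
        cnt = sum(1 for v in a if v <= mid)  # elements no larger than mid
        if cnt < x:
            ans = mid        # fewer than x elements are <= mid
            l = mid + 1
        else:
            r = mid - 1
    # the smallest element larger than ans is the xth smallest
    return min(v for v in a if v > ans)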


The complexity of these algorithms revolves around two data points:

  • Value of x.

  • Value of n.

Space complexity: in both algorithms the space complexity remains O(1), assuming an in-place sort and in-place heapify.

Time complexity

  • Approach 1

Best case: O(n*log(n)) for sorting, plus O(1) for the case x == 1.
Average case: O(n*log(n)) if all elements are unique, or O(x + n*log(n)) if there are duplicates.
Worst case: O(n + n*log(n)) for the case x == n.

  • Approach 2

Best case: O(n), since just heapifying suffices for the case x == 1.
Average case: O(n + x*log(n)).
Worst case: O(n + n*log(n)) for the case x == n.

Now, coming to the point of analyzing these algorithms at runtime, the following general guidelines apply:

 1. Always test with large values of n.
 2. Have a good spread of values being tested (here, x).
 3. Do multiple iterations of the analysis in a clean environment
    (array re-created before every run, etc.) and take the average of all
    results.
 4. Check the code of any predefined functions for their exact complexity,
    in this case the sort (which can be 2n*log(n), etc.) and the various heap operations.

So, with all of the above held at ideal values, method 2 should perform better than method 1. A minimal benchmark along these lines is sketched below.
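This timing sketch applies the guidelines above, using Python's timeit and heapq (the array size, value range, and function names are illustrative only):

import heapq
import random
import timeit

def method1(nums, x):
    return sorted(nums)[x - 1]            # sort, then index

def method2(nums, x):
    heap = list(nums)
    heapq.heapify(heap)                   # O(n)
    for _ in range(x - 1):
        heapq.heappop(heap)               # x - 1 pops, O(log n) each
    return heap[0]

n = 10**5
for x in (1, n // 100, n // 2, n):        # spread of x values (guideline 2)
    nums = [random.randrange(10**9) for _ in range(n)]  # fresh array each run (guideline 3)
    t1 = timeit.timeit(lambda: method1(nums, x), number=10)
    t2 = timeit.timeit(lambda: method2(nums, x), number=10)
    print(f"x={x}: sort {t1:.3f}s, heap {t2:.3f}s")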

Vishal T

You can find the xth element in O(n); there are also two simple heap algorithms that improve on your option 2's complexity. I'll start with the latter.

Simple heap algorithm №1: O(x + (n-x) log x) worst-case complexity

Create a max heap out of the first x elements; then, for each remaining element that is smaller than the current max, replace the max with it:

import heapq
from typing import List

def findKthSmallest(nums: List[int], k: int) -> int:
    # max heap of the k smallest elements seen so far
    # (values are negated, since heapq only provides a min heap)
    heap = [-n for n in nums[:k]]
    heapq.heapify(heap)
    for num in nums[k:]:
        if -num > heap[0]:               # num is smaller than the current max
            heapq.heapreplace(heap, -num)
    return -heap[0]

Simple heap algorithm №2: O(n + x log x)

  1. Turn the whole array into a min heap, and insert its root into an auxiliary min heap.
  2. k-1 times, pop an element from the second heap and push back its children from the first heap.
  3. Return the root of the second heap.
import heapq
from typing import List

def findKthSmallest(nums: List[int], k: int) -> int:
    x = nums.copy()
    heapq.heapify(x)                # min heap over the whole array, O(n)
    s = [(x[0], 0)]                 # auxiliary heap of (value, index) pairs
    for _ in range(k - 1):
        ind = heapq.heappop(s)[1]   # discard the current smallest ...
        if 2 * ind + 1 < len(x):    # ... and push its children in x
            heapq.heappush(s, (x[2 * ind + 1], 2 * ind + 1))
        if 2 * ind + 2 < len(x):
            heapq.heappush(s, (x[2 * ind + 2], 2 * ind + 2))
    return s[0][0]

Which of these is faster? It depends on the values of x and n.

The more complicated Frederickson algorithm would allow you to find the xth smallest element in a heap in O(x), but that would be overkill, since the xth smallest element in an unsorted array can be found in O(n) worst-case time.

Median-of-medians algorithm: O(n) worst-case time

Described in [1].
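A compact sketch of the idea (this version uses O(n) extra space; the version in [1] partitions in place):

def median_of_medians_select(A, k):
    """Return the kth smallest element of A (k is 0-based) in O(n) worst case."""
    A = list(A)
    while len(A) > 5:
        # medians of groups of five
        medians = [sorted(A[i:i + 5])[len(A[i:i + 5]) // 2]
                   for i in range(0, len(A), 5)]
        # recursively pick the median of medians as a guaranteed-good pivot
        pivot = median_of_medians_select(medians, len(medians) // 2)
        lows = [a for a in A if a < pivot]
        pivots = [a for a in A if a == pivot]
        highs = [a for a in A if a > pivot]
        if k < len(lows):
            A = lows
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            A = highs
            k -= len(lows) + len(pivots)
    return sorted(A)[k]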

Quickselect algorithm: O(n) average time, O(n^2) worst-case time

def partition(A, lo, hi):
    """rearrange A[lo:hi+1] and return j such that
    A[lo:j] <= pivot
    A[j] == pivot
    A[j+1:hi+1] >= pivot
    """
    pivot = A[lo]
    if A[hi] > pivot:
        A[lo], A[hi] = A[hi], A[lo]
    # now A[hi] <= A[lo], and A[hi] and A[lo] need to be exchanged
    i = lo
    j = hi
    while i < j:
        A[i], A[j] = A[j], A[i]
        i += 1
        while A[i] < pivot:
            i += 1
        j -= 1
        while A[j] > pivot:
            j -= 1
    # now put the pivot in the jth place
    if A[lo] == pivot:
        A[lo], A[j] = A[j], A[lo]
    else:
        # then A[hi] == pivot
        j += 1
        A[j], A[hi] = A[hi], A[j]
    return j

def quickselect(A, left, right, k):
    pivotIndex = partition(A, left, right)
    if k == pivotIndex:
        return A[k]
    elif k < pivotIndex:
        return quickselect(A, left, pivotIndex - 1, k)
    else:
        return quickselect(A, pivotIndex + 1, right, k)
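For example, quickselect(A, 0, len(A) - 1, k) returns the kth smallest element of A, where k is a 0-based index.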

Introselect: O(n) worst-case time

Basically, do quickselect, but if recursion gets too deep, switch to median-of-medians.

import numpy as np
from typing import List

def findKthSmallest(nums: List[int], k: int) -> int:
    return np.partition(nums, k, kind='introselect')[k]

Rivest-Floyd algorithm: O(n) average time, O(n^2) worst-case time

Another way to speed up quickselect:

import math

C1 = 600
C2 = 0.5
C3 = 0.5

def rivest_floyd(A, left, right, k):
    assert k < len(A)
    while right > left:
        if right - left > C1:
            # select a subsample from A
            N = right - left + 1
            I = k - left + 1
            Z = math.log(N)
            S = C2 * math.exp(2/3 * Z)  # sample size
            SD = C3 * math.sqrt(Z * S * (N - S) / N) * math.copysign(1, I - N/2)
            # choose newleft/newright so the kth element lies between them most of the time
            newleft = max(left, k - int(I * S / N + SD))
            newright = min(right, k + int((N - I) * S / N + SD))
            rivest_floyd(A, newleft, newright, k)
        A[left], A[k] = A[k], A[left]
        # partition2 is not shown in this post; any partition routine
        # with the same contract as partition above will do
        j = partition2(A, left, right)
        if j <= k:
            left = j + 1
        if k <= j:
            right = j - 1
    return A[k]

[1] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009). Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. ISBN 0-262-03384-4, pp. 220-223.

Viktoriya Malyasova