234

I believe there's a way to find the kth largest element in an unsorted array of length n in O(n). Or perhaps it's "expected" O(n) or something. How can we do this?

Bill the Lizard
  • 398,270
  • 210
  • 566
  • 880
MrDatabase
  • 43,245
  • 41
  • 111
  • 153
  • 54
    By the way, pretty much every algorithm described here turns into O(n^2) or O(n log n) when k==n. That is, I don't think a single one of them is O(n) for all values of k. I got modded down for pointing this out but thought you should know anyway. – Kirk Strauser Nov 04 '08 at 22:09
  • 22
    Selection algorithms can be O(n) for any fixed value of k. That is, you can have a selection algorithm for k=25 that is O(n) for any value of n, and you can do this for any particular value of k that is unrelated to n. The case in which the algorithm is no longer O(n) is when the value of k has some dependency on the value of n, such as k=n or k=n/2. This doesn't, however, mean that if you happen to run the k=25 algorithm on a list of 25 items that it is suddenly no longer O(n) because the O-notation describes a property of the algorithm, not a particular run of it. – Tyler McHenry Jul 31 '09 at 16:58
  • 1
    I was asked this question in an amazon interview as a general case of finding the second greatest element. By the way the interviewer lead the interview I didn't ask if I could destroy the original array (i.e. sorting it), so I came up with a complicated solution. – Sambatyon May 09 '11 at 17:43
  • 4
    This is Question 9 in Column 11 (Sorting) of Programming Pearls by Jon Bentley. – Qiang Xu Sep 13 '12 at 20:08
  • @Sambatyon, could you please share your complicated solution? It seems pretty easy to me, which needs to take 2 variables, which will hold `max` and `second max`, and traverse the array once, and get both in `O(n)`. Am I missing something? – Hengameh Jul 24 '15 at 10:34
  • 1
    Yeah, the second greatest element is is easy, but the k-th element is not. I tried to use a heap, but it is easier with a binary tree (better if its balanced) where you eliminate the largest element every time the size of the tree is greater than k. – Sambatyon Jul 24 '15 at 14:51
  • 3
    @KirkStrauser : If k==n or k==n-1 then it becomes trivial. We can get max or 2nd max in single traversal. So algorithms provided here will be practically used for values of k which don't belong to {1,2, n-1, n} – Aditya Joshee May 05 '16 at 10:28
  • 1
    @KirkStrauser: not true at all. I don't get how that comment got 32 upvotes. Jesus. – Karoly Horvath Jun 16 '16 at 10:01
  • @KirkStrauser startlingly, introselect's median of medians is guaranteed to remove 10% of the tree per iteration, thus is O(10*N) == O(N). I was surprised myself... – rogerdpack Jul 25 '16 at 19:16
  • What is it `kth`? – Rami Chasygov Jan 26 '18 at 13:37
  • O(N) solution : https://stackoverflow.com/questions/251781/how-to-find-the-kth-largest-element-in-an-unsorted-array-of-length-n-in-on/32394237#32394237 – akhil_mittal Nov 23 '20 at 03:56

32 Answers32

176

This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) taking O(n) average time, O(n^2) worst case time, and a pretty complicated non-randomized algorithm (called introselect) taking O(n) worst case time. There's some info on Wikipedia, but it's not very good.

Everything you need is in these powerpoint slides. Just to extract the basic algorithm of the O(n) worst-case algorithm (introselect):

Select(A,n,i):
    Divide input into ⌈n/5⌉ groups of size 5.

    /* Partition on median-of-medians */
    medians = array of each group’s median.
    pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
    Left Array L and Right Array G = partition(A, pivot)

    /* Find ith element in L, pivot, or G */
    k = |L| + 1
    If i = k, return pivot
    If i < k, return Select(L, k-1, i)
    If i > k, return Select(G, n-k, i-k)

It's also very nicely detailed in the Introduction to Algorithms book by Cormen et al.

Nir Alfasi
  • 53,191
  • 11
  • 86
  • 129
eladv
  • 2,001
  • 1
  • 12
  • 3
122

If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect (it's basically quicksort where you throw out the partition that you're not interested in). My prof has a great writeup, with the runtime analysis: (reference)

The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.

Here is the algorithm.

QuickSelect(A, k)
  let r be chosen uniformly at random in the range 1 to length(A)
  let pivot = A[r]
  let A1, A2 be new arrays
  # split into a pile A1 of small elements and A2 of big elements
  for i = 1 to n
    if A[i] < pivot then
      append A[i] to A1
    else if A[i] > pivot then
      append A[i] to A2
    else
      # do nothing
  end for
  if k <= length(A1):
    # it's in the pile of small elements
    return QuickSelect(A1, k)
  else if k > length(A) - length(A2)
    # it's in the pile of big elements
    return QuickSelect(A2, k - (length(A) - length(A2))
  else
    # it's equal to the pivot
    return pivot

What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k is always 1, giving a running time of

T(n) = Theta(n) + T(n-1) = Theta(n2)

But if the choices are indeed random, the expected running time is given by

T(n) <= Theta(n) + (1/n) ∑i=1 to nT(max(i, n-i-1))

where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1 or A2.

Let's guess that T(n) <= an for some a. Then we get

T(n) 
 <= cn + (1/n) ∑i=1 to nT(max(i-1, n-i))
 = cn + (1/n) ∑i=1 to floor(n/2) T(n-i) + (1/n) ∑i=floor(n/2)+1 to n T(i)
 <= cn + 2 (1/n) ∑i=floor(n/2) to n T(i)
 <= cn + 2 (1/n) ∑i=floor(n/2) to n ai

and now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn on the left. If we just bound it as 2(1/n) ∑i=n/2 to n an, we get roughly 2(1/n)(n/2)an = an. But this is too big - there's no room to squeeze in an extra cn. So let's expand the sum using the arithmetic series formula:

i=floor(n/2) to n i  
 = ∑i=1 to n i - ∑i=1 to floor(n/2) i  
 = n(n+1)/2 - floor(n/2)(floor(n/2)+1)/2  
 <= n2/2 - (n/4)2/2  
 = (15/32)n2

where we take advantage of n being "sufficiently large" to replace the ugly floor(n/2) factors with the much cleaner (and smaller) n/4. Now we can continue with

cn + 2 (1/n) ∑i=floor(n/2) to n ai,
 <= cn + (2a/n) (15/32) n2
 = n (c + (15/16)a)
 <= an

provided a > 16c.

This gives T(n) = O(n). It's clearly Omega(n), so we get T(n) = Theta(n).

Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
Ying Xiao
  • 1,729
  • 1
  • 9
  • 5
  • 12
    Quickselect is only O(n) in the average case. The median-of-medians algorithm can be used to solve the problem in O(n) time in the worst case. – John Kurlak Jan 19 '13 at 05:01
  • What's the meaning of `k > length(A) - length(A2)` ? – WoooHaaaa Oct 24 '13 at 10:57
  • this isn't O(n), you are calling the function again as recursive, T(n). There is already an O(n) there inside the recursive function T(n), so obviously without thinking, overall complexity would be greater than O(n). – user1735921 Jul 30 '14 at 17:25
  • 3
    @MrROY Given that we splitted `A` into `A1` and `A2` around the pivot, we know that `length(A) == length(A1)+length(A2)+1`. So, `k > length(A)-length(A2)` is equivalent to `k > length(A1)+1`, which is true when `k` is somewhere in `A2`. – Filipe Gonçalves Aug 17 '14 at 19:28
  • @FilipeGonçalves, yes if there are no duplicate elements in pivot. len(A1) + len(A2) + K-duplicate = len(A) – d1val Nov 02 '14 at 07:21
  • "worst-case expected" is an oxymoron. – codetaku Oct 20 '15 at 18:55
  • Note that without randomizing, the worst case is O(n^2). Even with randomizing it still is, but obviously "very rare" for it to degrade to that (in which case you'd need to do the median of medians thing). According to wikipedia, typically in practice this method preferred to median-of-medians in terms of overall performance/speed – rogerdpack Jul 25 '16 at 17:41
14

A quick Google on that ('kth largest element array') returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17

"Make one pass through tracking the three largest values so far." 

(it was specifically for 3d largest)

and this answer:

Build a heap/priority queue.  O(n)
Pop top element.  O(log n)
Pop top element.  O(log n)
Pop top element.  O(log n)

Total = O(n) + 3 O(log n) = O(n)
rogerdpack
  • 62,887
  • 36
  • 269
  • 388
warren
  • 32,620
  • 21
  • 85
  • 124
  • 17
    well, its actually O(n)+ O( k log n) which doesn't reduce for significant values of K – Jimmy Oct 30 '08 at 21:19
  • also note: I quoted the site :) – warren Oct 30 '08 at 21:22
  • Tracking can be done with a doubly linked list that you keep at fixed length. The last item should then be the kth largest element. Insertion at the end and removal at the back are both O(1), lookup at the back is O(1) too. – Jasper Bekkers Oct 30 '08 at 21:23
  • 2
    But finding the insertion point in that doubly-linked list is O(k). – Kirk Strauser Oct 30 '08 at 23:25
  • 1
    And if k is fixed, O(k) = O(1) – Tyler McHenry Jul 31 '09 at 17:00
  • 1
    @warren: Big-O is approximating, but you always over-approximate. Quicksort is actually O(n^2), for example, since that is the worst case. this one is O(n + k log n). – Claudiu Apr 18 '10 at 18:26
  • 1
    you cannot treat k as constant. It is possible that k = n in which case the time complexity is O(nlogn) – sabbir Apr 17 '18 at 23:28
  • 1
    (@sabbir: `k = n -> O(nlogn)` nah - the *n*th largest element is the smallest, "obviously" O(*n*). The hardest is - surprise - the median: log*n* + log*n-1* + … + log*n/2*: O(*n*log*n*).) – greybeard Sep 18 '22 at 10:25
11

You do like quicksort. Pick an element at random and shove everything either higher or lower. At this point you'll know which element you actually picked, and if it is the kth element you're done, otherwise you repeat with the bin (higher or lower), that the kth element would fall in. Statistically speaking, the time it takes to find the kth element grows with n, O(n).

stinky
  • 111
  • 1
  • 2
6

A Programmer's Companion to Algorithm Analysis gives a version that is O(n), although the author states that the constant factor is so high, you'd probably prefer the naive sort-the-list-then-select method.

I answered the letter of your question :)

Jimmy
  • 89,068
  • 17
  • 119
  • 137
  • 3
    Not really true in all cases. I have implemented median-of-medians and compared it to built-in Sort method in .NET and custom solution really ran faster by order of magnitude. Now the real question is: does that matter to you in given circumstances. Writing and debugging 100 lines of code compared to one liner pays off only if that code is going to be executed so many times that user starts noticing the difference in running time and feel discomfort waiting for the operation to complete. – Zoran Horvat Jul 19 '13 at 09:14
5

The C++ standard library has almost exactly that function call nth_element, although it does modify your data. It has expected linear run-time, O(N), and it also does a partial sort.

const int N = ...;
double a[N];
// ... 
const int m = ...; // m < N
nth_element (a, a + m, a + N);
// a[m] contains the mth element in a
rogerdpack
  • 62,887
  • 36
  • 269
  • 388
David Nehme
  • 21,379
  • 8
  • 78
  • 117
  • 1
    No, it has an expected *average* O(n) runtime. For example, quicksort is O(nlogn) on average with a worst case of O(n^2). Wow, something straight up factually wrong! – Kirk Strauser Oct 30 '08 at 23:30
  • 5
    No, there's nothing factually wrong with this answer. It works and the C++ standard requires an expected linear run time. – David Nehme Oct 31 '08 at 00:21
  • I was asked in interview to assume space availability of O(k) and 'n' is very huge. I couldn't tell him O(n) solution as I thought nth_element would need space o(n). Am i wrong ?isn't underlying algorithm is quicksort based for nth_element ? – Manish Baphna Sep 13 '11 at 17:27
4

I implemented finding kth minimimum in n unsorted elements using dynamic programming, specifically tournament method. The execution time is O(n + klog(n)). The mechanism used is listed as one of methods on Wikipedia page about Selection Algorithm (as indicated in one of the posting above). You can read about the algorithm and also find code (java) on my blog page Finding Kth Minimum. In addition the logic can do partial ordering of the list - return first K min (or max) in O(klog(n)) time.

Though the code provided result kth minimum, similar logic can be employed to find kth maximum in O(klog(n)), ignoring the pre-work done to create tournament tree.

4

Although not very sure about O(n) complexity, but it will be sure to be between O(n) and nLog(n). Also sure to be closer to O(n) than nLog(n). Function is written in Java

public int quickSelect(ArrayList<Integer>list, int nthSmallest){
    //Choose random number in range of 0 to array length
    Random random =  new Random();
    //This will give random number which is not greater than length - 1
    int pivotIndex = random.nextInt(list.size() - 1); 

    int pivot = list.get(pivotIndex);

    ArrayList<Integer> smallerNumberList = new ArrayList<Integer>();
    ArrayList<Integer> greaterNumberList = new ArrayList<Integer>();

    //Split list into two. 
    //Value smaller than pivot should go to smallerNumberList
    //Value greater than pivot should go to greaterNumberList
    //Do nothing for value which is equal to pivot
    for(int i=0; i<list.size(); i++){
        if(list.get(i)<pivot){
            smallerNumberList.add(list.get(i));
        }
        else if(list.get(i)>pivot){
            greaterNumberList.add(list.get(i));
        }
        else{
            //Do nothing
        }
    }

    //If smallerNumberList size is greater than nthSmallest value, nthSmallest number must be in this list 
    if(nthSmallest < smallerNumberList.size()){
        return quickSelect(smallerNumberList, nthSmallest);
    }
    //If nthSmallest is greater than [ list.size() - greaterNumberList.size() ], nthSmallest number must be in this list
    //The step is bit tricky. If confusing, please see the above loop once again for clarification.
    else if(nthSmallest > (list.size() - greaterNumberList.size())){
        //nthSmallest will have to be changed here. [ list.size() - greaterNumberList.size() ] elements are already in 
        //smallerNumberList
        nthSmallest = nthSmallest - (list.size() - greaterNumberList.size());
        return quickSelect(greaterNumberList,nthSmallest);
    }
    else{
        return pivot;
    }
}
3

Sexy quickselect in Python

def quickselect(arr, k):
    '''
     k = 1 returns first element in ascending order.
     can be easily modified to return first element in descending order
    '''

    r = random.randrange(0, len(arr))

    a1 = [i for i in arr if i < arr[r]] '''partition'''
    a2 = [i for i in arr if i > arr[r]]

    if k <= len(a1):
        return quickselect(a1, k)
    elif k > len(arr)-len(a2):
        return quickselect(a2, k - (len(arr) - len(a2)))
    else:
        return arr[r]
hoder
  • 257
  • 1
  • 3
  • 14
  • Nice solution, except that this returns the kth _smallest_ element in an unsorted list. Reversing the comparison operators in the list comprehensions, `a1 = [i for i in arr if i > arr[r]]` and `a2 = [i for i in arr if i < arr[r]]`, will return the kth _largest_ element. – gumption Apr 22 '15 at 22:03
  • From a small benchmark, even on large arrays, it is faster to sort (with `numpy.sort` for `numpy array` or `sorted` for lists) than to use this manual implementation. – Næreen Jun 29 '17 at 11:43
3

You can do it in O(n + kn) = O(n) (for constant k) for time and O(k) for space, by keeping track of the k largest elements you've seen.

For each element in the array you can scan the list of k largest and replace the smallest element with the new one if it is bigger.

Warren's priority heap solution is neater though.

Rob Walker
  • 46,588
  • 15
  • 99
  • 136
  • 3
    This would have a worst case of O(n^2) where you're asked for the smallest item. – Elie Oct 30 '08 at 21:23
  • 2
    "Smallest item" means that k=n, so k is no longer constant. – Tyler McHenry Jul 31 '09 at 17:01
  • Or maybe keep a heap (or reversed heap, or balanced tree) of the largest k you've seen so far `O(n log k)`...still degenerates to O(nlogn) in case of large k. I'd think it would work fine for small values of k however...possibly faster than some of the other algorithms mentioned here [???] – rogerdpack Jul 25 '16 at 20:24
3

As per this paper Finding the Kth largest item in a list of n items the following algorithm will take O(n) time in worst case.

  1. Divide the array in to n/5 lists of 5 elements each.
  2. Find the median in each sub array of 5 elements.
  3. Recursively find the median of all the medians, lets call it M
  4. Partition the array in to two sub array 1st sub-array contains the elements larger than M , lets say this sub-array is a1 , while other sub-array contains the elements smaller then M., lets call this sub-array a2.
  5. If k <= |a1|, return selection (a1,k).
  6. If k− 1 = |a1|, return M.
  7. If k> |a1| + 1, return selection(a2,k −a1 − 1).

Analysis: As suggested in the original paper:

We use the median to partition the list into two halves(the first half, if k <= n/2 , and the second half otherwise). This algorithm takes time cn at the first level of recursion for some constant c, cn/2 at the next level (since we recurse in a list of size n/2), cn/4 at the third level, and so on. The total time taken is cn + cn/2 + cn/4 + .... = 2cn = o(n).

Why partition size is taken 5 and not 3?

As mentioned in original paper:

Dividing the list by 5 assures a worst-case split of 70 − 30. Atleast half of the medians greater than the median-of-medians, hence atleast half of the n/5 blocks have atleast 3 elements and this gives a 3n/10 split, which means the other partition is 7n/10 in worst case. That gives T(n) = T(n/5)+T(7n/10)+O(n). Since n/5+7n/10 < 1, the worst-case running time isO(n).

Now I have tried to implement the above algorithm as:

public static int findKthLargestUsingMedian(Integer[] array, int k) {
        // Step 1: Divide the list into n/5 lists of 5 element each.
        int noOfRequiredLists = (int) Math.ceil(array.length / 5.0);
        // Step 2: Find pivotal element aka median of medians.
        int medianOfMedian =  findMedianOfMedians(array, noOfRequiredLists);
        //Now we need two lists split using medianOfMedian as pivot. All elements in list listOne will be grater than medianOfMedian and listTwo will have elements lesser than medianOfMedian.
        List<Integer> listWithGreaterNumbers = new ArrayList<>(); // elements greater than medianOfMedian
        List<Integer> listWithSmallerNumbers = new ArrayList<>(); // elements less than medianOfMedian
        for (Integer element : array) {
            if (element < medianOfMedian) {
                listWithSmallerNumbers.add(element);
            } else if (element > medianOfMedian) {
                listWithGreaterNumbers.add(element);
            }
        }
        // Next step.
        if (k <= listWithGreaterNumbers.size()) return findKthLargestUsingMedian((Integer[]) listWithGreaterNumbers.toArray(new Integer[listWithGreaterNumbers.size()]), k);
        else if ((k - 1) == listWithGreaterNumbers.size()) return medianOfMedian;
        else if (k > (listWithGreaterNumbers.size() + 1)) return findKthLargestUsingMedian((Integer[]) listWithSmallerNumbers.toArray(new Integer[listWithSmallerNumbers.size()]), k-listWithGreaterNumbers.size()-1);
        return -1;
    }

    public static int findMedianOfMedians(Integer[] mainList, int noOfRequiredLists) {
        int[] medians = new int[noOfRequiredLists];
        for (int count = 0; count < noOfRequiredLists; count++) {
            int startOfPartialArray = 5 * count;
            int endOfPartialArray = startOfPartialArray + 5;
            Integer[] partialArray = Arrays.copyOfRange((Integer[]) mainList, startOfPartialArray, endOfPartialArray);
            // Step 2: Find median of each of these sublists.
            int medianIndex = partialArray.length/2;
            medians[count] = partialArray[medianIndex];
        }
        // Step 3: Find median of the medians.
        return medians[medians.length / 2];
    }

Just for sake of completion, another algorithm makes use of Priority Queue and takes time O(nlogn).

public static int findKthLargestUsingPriorityQueue(Integer[] nums, int k) {
        int p = 0;
        int numElements = nums.length;
        // create priority queue where all the elements of nums will be stored
        PriorityQueue<Integer> pq = new PriorityQueue<Integer>();

        // place all the elements of the array to this priority queue
        for (int n : nums) {
            pq.add(n);
        }

        // extract the kth largest element
        while (numElements - k + 1 > 0) {
            p = pq.poll();
            k++;
        }

        return p;
    }

Both of these algorithms can be tested as:

public static void main(String[] args) throws IOException {
        Integer[] numbers = new Integer[]{2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};
        System.out.println(findKthLargestUsingMedian(numbers, 8));
        System.out.println(findKthLargestUsingPriorityQueue(numbers, 8));
    }

As expected output is: 18 18

akhil_mittal
  • 23,309
  • 7
  • 96
  • 95
2

Below is the link to full implementation with quite an extensive explanation how the algorithm for finding Kth element in an unsorted algorithm works. Basic idea is to partition the array like in QuickSort. But in order to avoid extreme cases (e.g. when smallest element is chosen as pivot in every step, so that algorithm degenerates into O(n^2) running time), special pivot selection is applied, called median-of-medians algorithm. The whole solution runs in O(n) time in worst and in average case.

Here is link to the full article (it is about finding Kth smallest element, but the principle is the same for finding Kth largest):

Finding Kth Smallest Element in an Unsorted Array

Zoran Horvat
  • 10,924
  • 3
  • 31
  • 43
2

Find the median of the array in linear time, then use partition procedure exactly as in quicksort to divide the array in two parts, values to the left of the median lesser( < ) than than median and to the right greater than ( > ) median, that too can be done in lineat time, now, go to that part of the array where kth element lies, Now recurrence becomes: T(n) = T(n/2) + cn which gives me O (n) overal.

pranjal
  • 692
  • 1
  • 6
  • 19
2

How about this kinda approach

Maintain a buffer of length k and a tmp_max, getting tmp_max is O(k) and is done n times so something like O(kn)

enter image description here

Is it right or am i missing something ?

Although it doesn't beat average case of quickselect and worst case of median statistics method but its pretty easy to understand and implement.

Aishwat Singh
  • 4,331
  • 2
  • 26
  • 48
2

There is also one algorithm, that outperforms quickselect algorithm. It's called Floyd-Rivets (FR) algorithm.

Original article: https://doi.org/10.1145/360680.360694

Downloadable version: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.7108&rep=rep1&type=pdf

Wikipedia article https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm

I tried to implement quickselect and FR algorithm in C++. Also I compared them to the standard C++ library implementations std::nth_element (which is basically introselect hybrid of quickselect and heapselect). The result was quickselect and nth_element ran comparably on average, but FR algorithm ran approx. twice as fast compared to them.

Sample code that I used for FR algorithm:

template <typename T>
T FRselect(std::vector<T>& data, const size_t& n)
{
    if (n == 0)
        return *(std::min_element(data.begin(), data.end()));
    else if (n == data.size() - 1)
        return *(std::max_element(data.begin(), data.end()));
    else
        return _FRselect(data, 0, data.size() - 1, n);
}

template <typename T>
T _FRselect(std::vector<T>& data, const size_t& left, const size_t& right, const size_t& n)
{
    size_t leftIdx = left;
    size_t rightIdx = right;

    while (rightIdx > leftIdx)
    {
        if (rightIdx - leftIdx > 600)
        {
            size_t range = rightIdx - leftIdx + 1;
            long long i = n - (long long)leftIdx + 1;
            long long z = log(range);
            long long s = 0.5 * exp(2 * z / 3);
            long long sd = 0.5 * sqrt(z * s * (range - s) / range) * sgn(i - (long long)range / 2);

            size_t newLeft = fmax(leftIdx, n - i * s / range + sd);
            size_t newRight = fmin(rightIdx, n + (range - i) * s / range + sd);

            _FRselect(data, newLeft, newRight, n);
        }
        T t = data[n];
        size_t i = leftIdx;
        size_t j = rightIdx;
        // arrange pivot and right index
        std::swap(data[leftIdx], data[n]);
        if (data[rightIdx] > t)
            std::swap(data[rightIdx], data[leftIdx]);

        while (i < j)
        {
            std::swap(data[i], data[j]);
            ++i; --j;
            while (data[i] < t) ++i;
            while (data[j] > t) --j;
        }

        if (data[leftIdx] == t)
            std::swap(data[leftIdx], data[j]);
        else
        {
            ++j;
            std::swap(data[j], data[rightIdx]);
        }
        // adjust left and right towards the boundaries of the subset
        // containing the (k - left + 1)th smallest element
        if (j <= n)
            leftIdx = j + 1;
        if (n <= j)
            rightIdx = j - 1;
    }

    return data[leftIdx];
}

template <typename T>
int sgn(T val) {
    return (T(0) < val) - (val < T(0));
}
L'ahim
  • 43
  • 4
1

i would like to suggest one answer

if we take the first k elements and sort them into a linked list of k values

now for every other value even for the worst case if we do insertion sort for rest n-k values even in the worst case number of comparisons will be k*(n-k) and for prev k values to be sorted let it be k*(k-1) so it comes out to be (nk-k) which is o(n)

cheers

1

Explanation of the median - of - medians algorithm to find the k-th largest integer out of n can be found here: http://cs.indstate.edu/~spitla/presentation.pdf

Implementation in c++ is below:

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

int findMedian(vector<int> vec){
//    Find median of a vector
    int median;
    size_t size = vec.size();
    median = vec[(size/2)];
    return median;
}

int findMedianOfMedians(vector<vector<int> > values){
    vector<int> medians;

    for (int i = 0; i < values.size(); i++) {
        int m = findMedian(values[i]);
        medians.push_back(m);
    }

    return findMedian(medians);
}

void selectionByMedianOfMedians(const vector<int> values, int k){
//    Divide the list into n/5 lists of 5 elements each
    vector<vector<int> > vec2D;

    int count = 0;
    while (count != values.size()) {
        int countRow = 0;
        vector<int> row;

        while ((countRow < 5) && (count < values.size())) {
            row.push_back(values[count]);
            count++;
            countRow++;
        }
        vec2D.push_back(row);
    }

    cout<<endl<<endl<<"Printing 2D vector : "<<endl;
    for (int i = 0; i < vec2D.size(); i++) {
        for (int j = 0; j < vec2D[i].size(); j++) {
            cout<<vec2D[i][j]<<" ";
        }
        cout<<endl;
    }
    cout<<endl;

//    Calculating a new pivot for making splits
    int m = findMedianOfMedians(vec2D);
    cout<<"Median of medians is : "<<m<<endl;

//    Partition the list into unique elements larger than 'm' (call this sublist L1) and
//    those smaller them 'm' (call this sublist L2)
    vector<int> L1, L2;

    for (int i = 0; i < vec2D.size(); i++) {
        for (int j = 0; j < vec2D[i].size(); j++) {
            if (vec2D[i][j] > m) {
                L1.push_back(vec2D[i][j]);
            }else if (vec2D[i][j] < m){
                L2.push_back(vec2D[i][j]);
            }
        }
    }

//    Checking the splits as per the new pivot 'm'
    cout<<endl<<"Printing L1 : "<<endl;
    for (int i = 0; i < L1.size(); i++) {
        cout<<L1[i]<<" ";
    }

    cout<<endl<<endl<<"Printing L2 : "<<endl;
    for (int i = 0; i < L2.size(); i++) {
        cout<<L2[i]<<" ";
    }

//    Recursive calls
    if ((k - 1) == L1.size()) {
        cout<<endl<<endl<<"Answer :"<<m;
    }else if (k <= L1.size()) {
        return selectionByMedianOfMedians(L1, k);
    }else if (k > (L1.size() + 1)){
        return selectionByMedianOfMedians(L2, k-((int)L1.size())-1);
    }

}

int main()
{
    int values[] = {2, 3, 5, 4, 1, 12, 11, 13, 16, 7, 8, 6, 10, 9, 17, 15, 19, 20, 18, 23, 21, 22, 25, 24, 14};

    vector<int> vec(values, values + 25);

    cout<<"The given array is : "<<endl;
    for (int i = 0; i < vec.size(); i++) {
        cout<<vec[i]<<" ";
    }

    selectionByMedianOfMedians(vec, 8);

    return 0;
}
totjammykd
  • 337
  • 4
  • 10
1

There is also Wirth's selection algorithm, which has a simpler implementation than QuickSelect. Wirth's selection algorithm is slower than QuickSelect, but with some improvements it becomes faster.

In more detail. Using Vladimir Zabrodsky's MODIFIND optimization and the median-of-3 pivot selection and paying some attention to the final steps of the partitioning part of the algorithm, i've came up with the following algorithm (imaginably named "LefSelect"):

#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }

# Note: The code needs more than 2 elements to work
float lefselect(float a[], const int n, const int k) {
    int l=0, m = n-1, i=l, j=m;
    float x;

    while (l<m) {
        if( a[k] < a[i] ) F_SWAP(a[i],a[k]);
        if( a[j] < a[i] ) F_SWAP(a[i],a[j]);
        if( a[j] < a[k] ) F_SWAP(a[k],a[j]);

        x=a[k];
        while (j>k & i<k) {
            do i++; while (a[i]<x);
            do j--; while (a[j]>x);

            F_SWAP(a[i],a[j]);
        }
        i++; j--;

        if (j<k) {
            while (a[i]<x) i++;
            l=i; j=m;
        }
        if (k<i) {
            while (x<a[j]) j--;
            m=j; i=l;
        }
    }
    return a[k];
}

In benchmarks that i did here, LefSelect is 20-30% faster than QuickSelect.

estama
  • 61
  • 4
1

Haskell Solution:

kthElem index list = sort list !! index

withShape ~[]     []     = []
withShape ~(x:xs) (y:ys) = x : withShape xs ys

sort []     = []
sort (x:xs) = (sort ls `withShape` ls) ++ [x] ++ (sort rs `withShape` rs)
  where
   ls = filter (<  x)
   rs = filter (>= x)

This implements the median of median solutions by using the withShape method to discover the size of a partition without actually computing it.

user3585010
  • 135
  • 4
1

Here is a C++ implementation of Randomized QuickSelect. The idea is to randomly pick a pivot element. To implement randomized partition, we use a random function, rand() to generate index between l and r, swap the element at randomly generated index with the last element, and finally call the standard partition process which uses last element as pivot.

#include<iostream>
#include<climits>
#include<cstdlib>
using namespace std;

int randomPartition(int arr[], int l, int r);

// This function returns k'th smallest element in arr[l..r] using
// QuickSort based method.  ASSUMPTION: ALL ELEMENTS IN ARR[] ARE DISTINCT
int kthSmallest(int arr[], int l, int r, int k)
{
    // If k is smaller than number of elements in array
    if (k > 0 && k <= r - l + 1)
    {
        // Partition the array around a random element and
        // get position of pivot element in sorted array
        int pos = randomPartition(arr, l, r);

        // If position is same as k
        if (pos-l == k-1)
            return arr[pos];
        if (pos-l > k-1)  // If position is more, recur for left subarray
            return kthSmallest(arr, l, pos-1, k);

        // Else recur for right subarray
        return kthSmallest(arr, pos+1, r, k-pos+l-1);
    }

    // If k is more than number of elements in array
    return INT_MAX;
}

void swap(int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

// Standard partition process of QuickSort().  It considers the last
// element as pivot and moves all smaller element to left of it and
// greater elements to right. This function is used by randomPartition()
int partition(int arr[], int l, int r)
{
    int x = arr[r], i = l;
    for (int j = l; j <= r - 1; j++)
    {
        if (arr[j] <= x) //arr[i] is bigger than arr[j] so swap them
        {
            swap(&arr[i], &arr[j]);
            i++;
        }
    }
    swap(&arr[i], &arr[r]); // swap the pivot
    return i;
}

// Picks a random pivot element between l and r and partitions
// arr[l..r] around the randomly picked element using partition()
int randomPartition(int arr[], int l, int r)
{
    int n = r-l+1;
    int pivot = rand() % n;
    swap(&arr[l + pivot], &arr[r]);
    return partition(arr, l, r);
}

// Driver program to test above methods
int main()
{
    int arr[] = {12, 3, 5, 7, 4, 19, 26};
    int n = sizeof(arr)/sizeof(arr[0]), k = 3;
    cout << "K'th smallest element is " << kthSmallest(arr, 0, n-1, k);
    return 0;
}

The worst case time complexity of the above solution is still O(n2).In worst case, the randomized function may always pick a corner element. The expected time complexity of above randomized QuickSelect is Θ(n)

learner
  • 123
  • 2
  • 11
1
  1. Have Priority queue created.
  2. Insert all the elements into heap.
  3. Call poll() k times.

    public static int getKthLargestElements(int[] arr)
    {
        PriorityQueue<Integer> pq =  new PriorityQueue<>((x , y) -> (y-x));
        //insert all the elements into heap
        for(int ele : arr)
           pq.offer(ele);
        // call poll() k times
        int i=0;
        while(i&lt;k)
         {
           int result = pq.poll();
         } 
       return result;        
    }
    
slfan
  • 8,950
  • 115
  • 65
  • 78
Bhagwati Malav
  • 3,349
  • 2
  • 20
  • 33
  • How does this answer `[How can we] find the kth largest element in an unsorted array of length n [in] "expected" O(n)?` – greybeard Sep 18 '22 at 10:08
0

This is an implementation in Javascript.

If you release the constraint that you cannot modify the array, you can prevent the use of extra memory using two indexes to identify the "current partition" (in classic quicksort style - http://www.nczonline.net/blog/2012/11/27/computer-science-in-javascript-quicksort/).

function kthMax(a, k){
    var size = a.length;

    var pivot = a[ parseInt(Math.random()*size) ]; //Another choice could have been (size / 2) 

    //Create an array with all element lower than the pivot and an array with all element higher than the pivot
    var i, lowerArray = [], upperArray = [];
    for (i = 0; i  < size; i++){
        var current = a[i];

        if (current < pivot) {
            lowerArray.push(current);
        } else if (current > pivot) {
            upperArray.push(current);
        }
    }

    //Which one should I continue with?
    if(k <= upperArray.length) {
        //Upper
        return kthMax(upperArray, k);
    } else {
        var newK = k - (size - lowerArray.length);

        if (newK > 0) {
            ///Lower
            return kthMax(lowerArray, newK);
        } else {
            //None ... it's the current pivot!
            return pivot;
        }   
    }
}  

If you want to test how it perform, you can use this variation:

    function kthMax (a, k, logging) {
         var comparisonCount = 0; //Number of comparison that the algorithm uses
         var memoryCount = 0;     //Number of integers in memory that the algorithm uses
         var _log = logging;

         if(k < 0 || k >= a.length) {
            if (_log) console.log ("k is out of range"); 
            return false;
         }      

         function _kthmax(a, k){
             var size = a.length;
             var pivot = a[parseInt(Math.random()*size)];
             if(_log) console.log("Inputs:", a,  "size="+size, "k="+k, "pivot="+pivot);

             // This should never happen. Just a nice check in this exercise
             // if you are playing with the code to avoid never ending recursion            
             if(typeof pivot === "undefined") {
                 if (_log) console.log ("Ops..."); 
                 return false;
             }

             var i, lowerArray = [], upperArray = [];
             for (i = 0; i  < size; i++){
                 var current = a[i];
                 if (current < pivot) {
                     comparisonCount += 1;
                     memoryCount++;
                     lowerArray.push(current);
                 } else if (current > pivot) {
                     comparisonCount += 2;
                     memoryCount++;
                     upperArray.push(current);
                 }
             }
             if(_log) console.log("Pivoting:",lowerArray, "*"+pivot+"*", upperArray);

             if(k <= upperArray.length) {
                 comparisonCount += 1;
                 return _kthmax(upperArray, k);
             } else if (k > size - lowerArray.length) {
                 comparisonCount += 2;
                 return _kthmax(lowerArray, k - (size - lowerArray.length));
             } else {
                 comparisonCount += 2;
                 return pivot;
             }
     /* 
      * BTW, this is the logic for kthMin if we want to implement that... ;-)
      * 

             if(k <= lowerArray.length) {
                 return kthMin(lowerArray, k);
             } else if (k > size - upperArray.length) {
                 return kthMin(upperArray, k - (size - upperArray.length));
             } else 
                 return pivot;
     */            
         }

         var result = _kthmax(a, k);
         return {result: result, iterations: comparisonCount, memory: memoryCount};
     }

The rest of the code is just to create some playground:

    function getRandomArray (n){
        var ar = [];
        for (var i = 0, l = n; i < l; i++) {
            ar.push(Math.round(Math.random() * l))
        }

        return ar;
    }

    //Create a random array of 50 numbers
    var ar = getRandomArray (50);   

Now, run you tests a few time. Because of the Math.random() it will produce every time different results:

    kthMax(ar, 2, true);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 2);
    kthMax(ar, 34, true);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);
    kthMax(ar, 34);

If you test it a few times you can see even empirically that the number of iterations is, on average, O(n) ~= constant * n and the value of k does not affect the algorithm.

Chris Cinelli
  • 4,679
  • 4
  • 28
  • 40
0

iterate through the list. if the current value is larger than the stored largest value, store it as the largest value and bump the 1-4 down and 5 drops off the list. If not,compare it to number 2 and do the same thing. Repeat, checking it against all 5 stored values. this should do it in O(n)

Kevin
  • 7,162
  • 11
  • 46
  • 70
  • That "bump" is O(n) if you're using an array, or down to O(log n) (I think) if you use a better structure. – Kirk Strauser Oct 30 '08 at 21:11
  • It needn't be O(log k) - if the list is a linked list then adding the new element to the top and dropping the last element is more like O(2) – Alnitak Oct 30 '08 at 21:14
  • The bump would be O(k) for an array-backed list, O(1) for an appropriately-linked list. Either way, this sort of question generally assumes it to be of minimal impact compared to n and it introduces no more factors of n. – bobince Oct 30 '08 at 21:16
  • it would also be O(1) if the bump uses a ring-buffer – Alnitak Oct 30 '08 at 21:18
  • 1
    Anyhow, the comment's algorithm is incomplete, it fails to consider an element of n coming in which is the new (eg) second-largest. Worst case behaviour, where each element in n must be compared with each in the highscore table, is O(kn) - but that still probably means O(n) in terms of the question. – bobince Oct 30 '08 at 21:21
  • An algorithm given with time O(kn) actually has a worst case of O(n^2) where k=n. Although in that case it would be faster to look for the smallest item. The algorithm could always be reversed in the case where k>n/2 to look for the kth smallest item. – Elie Oct 30 '08 at 21:26
0

I came up with this algorithm and seems to be O(n):

Let's say k=3 and we want to find the 3rd largest item in the array. I would create three variables and compare each item of the array with the minimum of these three variables. If array item is greater than our minimum, we would replace the min variable with the item value. We continue the same thing until end of the array. The minimum of our three variables is the 3rd largest item in the array.

define variables a=0, b=0, c=0
iterate through the array items
    find minimum a,b,c
    if item > min then replace the min variable with item value
    continue until end of array
the minimum of a,b,c is our answer

And, to find Kth largest item we need K variables.

Example: (k=3)

[1,2,4,1,7,3,9,5,6,2,9,8]

Final variable values:

a=7 (answer)
b=8
c=9

Can someone please review this and let me know what I am missing?

advncd
  • 3,787
  • 1
  • 25
  • 31
  • Due to the *k* comparisons in *replace the min variable with the item value [and establish a new min]*, this is O(*nk*). Not a big deal for *k*=3 (still O(*n*) for any fixed *k*), but for *k=n*/2, this degenerates to O(*n*²). – greybeard Sep 18 '22 at 09:56
0

Here is the implementation of the algorithm eladv suggested(I also put here the implementation with random pivot):

public class Median {

    public static void main(String[] s) {

        int[] test = {4,18,20,3,7,13,5,8,2,1,15,17,25,30,16};
        System.out.println(selectK(test,8));

        /*
        int n = 100000000;
        int[] test = new int[n];
        for(int i=0; i<test.length; i++)
            test[i] = (int)(Math.random()*test.length);

        long start = System.currentTimeMillis();
        random_selectK(test, test.length/2);
        long end = System.currentTimeMillis();
        System.out.println(end - start);
        */
    }

    public static int random_selectK(int[] a, int k) {
        if(a.length <= 1)
            return a[0];

        int r = (int)(Math.random() * a.length);
        int p = a[r];

        int small = 0, equal = 0, big = 0;
        for(int i=0; i<a.length; i++) {
            if(a[i] < p) small++;
            else if(a[i] == p) equal++;
            else if(a[i] > p) big++;
        }

        if(k <= small) {
            int[] temp = new int[small];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] < p)
                    temp[j++] = a[i];
            return random_selectK(temp, k);
        }

        else if (k <= small+equal)
            return p;

        else {
            int[] temp = new int[big];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] > p)
                    temp[j++] = a[i];
            return random_selectK(temp,k-small-equal);
        }
    }

    public static int selectK(int[] a, int k) {
        if(a.length <= 5) {
            Arrays.sort(a);
            return a[k-1];
        }

        int p = median_of_medians(a);

        int small = 0, equal = 0, big = 0;
        for(int i=0; i<a.length; i++) {
            if(a[i] < p) small++;
            else if(a[i] == p) equal++;
            else if(a[i] > p) big++;
        }

        if(k <= small) {
            int[] temp = new int[small];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] < p)
                    temp[j++] = a[i];
            return selectK(temp, k);
        }

        else if (k <= small+equal)
            return p;

        else {
            int[] temp = new int[big];
            for(int i=0, j=0; i<a.length; i++)
                if(a[i] > p)
                    temp[j++] = a[i];
            return selectK(temp,k-small-equal);
        }
    }

    private static int median_of_medians(int[] a) {
        int[] b = new int[a.length/5];
        int[] temp = new int[5];
        for(int i=0; i<b.length; i++) {
            for(int j=0; j<5; j++)
                temp[j] = a[5*i + j];
            Arrays.sort(temp);
            b[i] = temp[2];
        }

        return selectK(b, b.length/2 + 1);
    }
}
TheLogicGuy
  • 682
  • 8
  • 19
  • Far as I can tell, eladv spelled out a *Select(A,n,i) using median-of-medians* and suggested [introselect](https://en.m.wikipedia.org/wiki/Introselect). Neither `selectK(a, k)` nor `random_selectK(a, k)` presented above implement introselect. `median_of_medians()` is surprisingly indirect. – greybeard Sep 18 '22 at 09:44
0

it is similar to the quickSort strategy, where we pick an arbitrary pivot, and bring the smaller elements to its left, and the larger to the right

    public static int kthElInUnsortedList(List<int> list, int k)
    {
        if (list.Count == 1)
            return list[0];

        List<int> left = new List<int>();
        List<int> right = new List<int>();

        int pivotIndex = list.Count / 2;
        int pivot = list[pivotIndex]; //arbitrary

        for (int i = 0; i < list.Count && i != pivotIndex; i++)
        {
            int currentEl = list[i];
            if (currentEl < pivot)
                left.Add(currentEl);
            else
                right.Add(currentEl);
        }

        if (k == left.Count + 1)
            return pivot;

        if (left.Count < k)
            return kthElInUnsortedList(right, k - left.Count - 1);
        else
            return kthElInUnsortedList(left, k);
    }
Lee.O.
  • 63
  • 5
  • Did you test this? How? Try `List some = new List{9, 1, 2, 3, 4, 5}; for (int k = 0 ; k < some.Count ; k++) Console.WriteLine(kthElInUnsortedList(some, k));` - and without the 9. – greybeard Sep 18 '22 at 09:23
0

Go to the End of this link : ...........

http://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array-set-3-worst-case-linear-time/

Victor
  • 761
  • 8
  • 7
  • This answer would gain value if you [quoted](https://stackoverflow.com/help/referencing) main idea and claim from the hyperlinked article. – greybeard Sep 18 '22 at 09:02
0

You can find the kth smallest element in O(n) time and constant space. If we consider the array is only for integers.

The approach is to do a binary search on the range of Array values. If we have a min_value and a max_value both in integer range, we can do a binary search on that range. We can write a comparator function which will tell us if any value is the kth-smallest or smaller than kth-smallest or bigger than kth-smallest. Do the binary search until you reach the kth-smallest number

Here is the code for that

class Solution:

def _iskthsmallest(self, A, val, k):
    less_count, equal_count = 0, 0
    for i in range(len(A)):
        if A[i] == val: equal_count += 1
        if A[i] < val: less_count += 1

    if less_count >= k: return 1
    if less_count + equal_count < k: return -1
    return 0

def kthsmallest_binary(self, A, min_val, max_val, k):
    if min_val == max_val:
        return min_val
    mid = (min_val + max_val)/2
    iskthsmallest = self._iskthsmallest(A, mid, k)
    if iskthsmallest == 0: return mid
    if iskthsmallest > 0: return self.kthsmallest_binary(A, min_val, mid, k)
    return self.kthsmallest_binary(A, mid+1, max_val, k)

# @param A : tuple of integers
# @param B : integer
# @return an integer
def kthsmallest(self, A, k):
    if not A: return 0
    if k > len(A): return 0
    min_val, max_val = min(A), max(A)
    return self.kthsmallest_binary(A, min_val, max_val, k)
Anubhav Agarwal
  • 1,982
  • 5
  • 28
  • 40
-1

First we can build a BST from unsorted array which takes O(n) time and from the BST we can find the kth smallest element in O(log(n)) which over all counts to an order of O(n).

  • `we can build a BST from unsorted array which takes O(n) time` I can't. Given a BST of *n* elements, I can fill a same size array with the elements in order: O(n) building the BST would allow O(n) sorting/ordering. – greybeard Sep 18 '22 at 08:19
-1

What I would do is this:

initialize empty doubly linked list l
for each element e in array
    if e larger than head(l)
        make e the new head of l
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element

You can simply store pointers to the first and last element in the linked list. They only change when updates to the list are made.

Update:

initialize empty sorted tree l
for each element e in array
    if e between head(l) and tail(l)
        insert e into l // O(log k)
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element
Jasper Bekkers
  • 6,711
  • 32
  • 46
  • What if e is smaller than head(l)? It could still be larger than the kth largest element, but would never get added to that list. You will need to sort the list of items in order for this to work, in ascending order. – Elie Oct 30 '08 at 21:22
  • You are right, guess I'll need to think this through some more. :-) – Jasper Bekkers Oct 30 '08 at 21:27
  • The solution would be to check if e is between head(l) and tail(l) and insert it at the correct position if it is. Making this O(kn). You could make it O(n log k) when using a binary tree that keeps track of the min and max elements. – Jasper Bekkers Oct 30 '08 at 21:30
-1
    function nthMax(arr, nth = 1, maxNumber = Infinity) {
      let large = -Infinity;
      for(e of arr) {
        if(e > large && e < maxNumber ) {
          large = e;
        } else if (maxNumber == large) {
          nth++;
        }
      }
      return nth==0 ? maxNumber: nthMax(arr, nth-1, large);
    }

    let array = [11,12,12,34,23,34];

    let secondlargest = nthMax(array, 1);

    console.log("Number:", secondlargest);
  • Do you expect this to run in time proportional to the number of elements in `arr`? `find the kth largest element in an unsorted array of length n [in] "expected" O(n)` – greybeard Sep 18 '22 at 08:24
-2

For very small values of k (i.e. when k << n), we can get it done in ~O(n) time. Otherwise, if k is comparable to n, we get it in O(nlogn).