Finding Nth item of unsorted list without sorting the list

Question

Hey. I have a very large array and I want to find the Nth largest value. Trivially I can sort the array and then take the Nth element but I'm only interested in one element so there's probably a better way than sorting the entire array...

score 21 · Answer 1 · answered Jun 24 '09 at 03:21

21

A heap is the best data structure for this operation and Python has an excellent built-in library to do just this, called heapq.

import heapq

def nth_largest(n, iter):
    return heapq.nlargest(n, iter)[-1]

Example Usage:

>>> import random
>>> iter = [random.randint(0,1000) for i in range(100)]
>>> n = 10
>>> nth_largest(n, iter)
920

Confirm result by sorting:

>>> list(sorted(iter))[-10]
920

answered Jun 24 '09 at 03:21

FogleBird

74,300
25
125
131

4

This works well (linear time) if you want the nth largest or smallest item(s), where n is a constant. If n is half the length of the list (i.e. you want the median), this is still O(nlogn) time. – mgold Oct 11 '12 at 17:34
This not an in place solution, Quickselect will not add O(n) extra memory as this solution would. So for very large arrays as the question asks, this probably wouldn't be the most efficient. – db1234 Jan 02 '17 at 23:57

Dario · Accepted Answer · 2009-06-23T20:18:03.473

18

Sorting would require O(nlogn) runtime at minimum - There are very efficient selection algorithms which can solve your problem in linear time.

Partition-based selection (sometimes Quick select), which is based on the idea of quicksort (recursive partitioning), is a good solution (see link for pseudocode + Another example).

edited Jun 23 '09 at 20:18

answered Jun 23 '09 at 20:12

Dario

48,658
8
97
130

Nice link. I believe this is the best. – Jeff Meatball Yang Jun 23 '09 at 20:26
10

Unfortunately, the link "Another example" now leads to a protected web page at MIT, that you must have permission to access. – Beel Feb 10 '13 at 05:48
[NumPy has this built-in](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.partition.html), although it's kind of a weird dependency to pull in if you're not already making use of its ndarray functionality. – user2357112 Aug 04 '16 at 01:42

score 4 · Answer 3 · answered Jun 23 '09 at 20:23

A simple modified quicksort works very well in practice. It has average running time proportional to N (though worst case bad luck running time is O(N^2)).

Proceed like a quicksort. Pick a pivot value randomly, then stream through your values and see if they are above or below that pivot value and put them into two bins based on that comparison. In quicksort you'd then recursively sort each of those two bins. But for the N-th highest value computation, you only need to sort ONE of the bins.. the population of each bin tells you which bin holds your n-th highest value. So for example if you want the 125th highest value, and you sort into two bins which have 75 in the "high" bin and 150 in the "low" bin, you can ignore the high bin and just proceed to finding the 125-75=50th highest value in the low bin alone.

score 3 · Answer 4 · answered Jun 23 '09 at 20:07

3

You can iterate the entire sequence maintaining a list of the 5 largest values you find (this will be O(n)). That being said I think it would just be simpler to sort the list.

answered Jun 23 '09 at 20:07

Andrew Hare

344,730
71
640
635

But when it isn't the fifth but the nth element, you'll have O(n²) which is even worse than sorting. – Dario Jun 23 '09 at 20:10
I suppose you mean to maintain a list of the N largest values. But N cannot be too large in that case. – Bruno Ranschaert Jun 23 '09 at 20:11

score 3 · Answer 5 · answered Jan 25 '10 at 03:07

3

You could try the Median of Medians method - it's speed is O(N).

answered Jan 25 '10 at 03:07

user183037

2,549
4
31
42

score 1 · Answer 6 · answered Jun 23 '09 at 20:07

1

Use heapsort. It only partially orders the list until you draw the elements out.

answered Jun 23 '09 at 20:07

UncleO

8,299
21
29

1

Try to find the n/2-th element - Requires O(nlogn)! – Dario Jun 23 '09 at 20:13

Jeff Meatball Yang · Answer 7 · 2009-06-23T20:21:33.107

You essentially want to produce a "top-N" list and select the one at the end of that list.

So you can scan the array once and insert into an empty list when the largeArray item is greater than the last item of your top-N list, then drop the last item.

After you finish scanning, pick the last item in your top-N list.

An example for ints and N = 5:

int[] top5 = new int[5]();
top5[0] = top5[1] = top5[2] = top5[3] = top5[4] = 0x80000000; // or your min value

for(int i = 0; i < largeArray.length; i++) {
    if(largeArray[i] > top5[4]) {
       // insert into top5:
       top5[4] = largeArray[i];

       // resort:
       quickSort(top5);
    }
}

Unknown · Answer 8 · 2009-06-23T20:51:13.877

1

As people have said, you can walk the list once keeping track of K largest values. If K is large this algorithm will be close to O(n²).

However, you can store your Kth largest values as a binary tree and the operation becomes O(n log k).

According to Wikipedia, this is the best selection algorithm:

 function findFirstK(list, left, right, k)
     if right > left
         select pivotIndex between left and right
         pivotNewIndex := partition(list, left, right, pivotIndex)
         if pivotNewIndex > k  // new condition
             findFirstK(list, left, pivotNewIndex-1, k)
         if pivotNewIndex < k
             findFirstK(list, pivotNewIndex+1, right, k)

Its complexity is O(n)

edited Jun 23 '09 at 20:51

answered Jun 23 '09 at 20:18

Unknown

45,913
27
138
182

I believe the Tournament Algorithm, see Dario's links, is what you are shooting for. It has an operation of O(n + k*log(n)). – tgray Jun 23 '09 at 20:43
1

My mistake, though I would be interested to see a full implementation of this in Python. – tgray Jun 24 '09 at 11:44

score 0 · Answer 9 · answered Feb 11 '16 at 01:28

One thing you should do if this is in production code is test with samples of your data. For example, you might consider 1000 or 10000 elements 'large' arrays, and code up a quickselect method from a recipe.

The compiled nature of sorted, and its somewhat hidden and constantly evolving optimizations, make it faster than a python written quickselect method on small to medium sized datasets (< 1,000,000 elements). Also, you might find as you increase the size of the array beyond that amount, memory is more efficiently handled in native code, and the benefit continues.

So, even if quickselect is O(n) vs sorted's O(nlogn), that doesn't take into account how many actual machine code instructions processing each n elements will take, any impacts on pipelining, uses of processor caches and other things the creators and maintainers of sorted will bake into the python code.

score 0 · Answer 10 · answered Jul 02 '21 at 17:05

You can keep two different counts for each element -- the number of elements bigger than the element, and the number of elements lesser than the element.

Then do a if check N == number of elements bigger than each element -- the element satisfies this above condition is your output

check below solution

def NthHighest(l,n):
    if len(l) <n:
        return 0

    for i in range(len(l)):
        low_count = 0
        up_count = 0

        for j in range(len(l)):
            if l[j] > l[i]:
                up_count = up_count + 1
            else:
                low_count = low_count + 1

        # print(l[i],low_count, up_count)
        if up_count == n-1:
            #print(l[i])
            return l[i]

# # find the 4th largest number 

l = [1,3,4,9,5,15,5,13,19,27,22]
print(NthHighest(l,4))

-- using the above solution you can find both - Nth highest as well as Nth Lowest

Jack · Answer 11 · 2021-10-20T08:02:25.547

0

If you do not mind using pandas then:

import pandas as pd
N = 10
column_name = 0
pd.DataFrame(your_array).nlargest(N, column_name)

The above code will show you the N largest values along with the index position of each value.

Hope it helps. :-)

Pandas Nlargest Documentation

edited Oct 20 '21 at 08:02

answered Oct 20 '21 at 07:55

Jack

173
3
9

Finding Nth item of unsorted list without sorting the list

11 Answers11

Linked

Related