It probably depends a bit on the sizes of a and k, but often the fastest appears to be combining partition with flatnonzero or where:
>>> import numpy as np
>>> from timeit import timeit
>>> a = np.random.random(10000)
>>> k = 5
>>>
>>> timeit("np.flatnonzero(a >= np.partition(a, len(a) - k)[len(a) - k])", globals=globals(), number=10000)
0.8328661819687113
>>> timeit("np.sort(np.argpartition(a, len(a) - k)[len(a) - k:])", globals=globals(), number=10000)
1.0577796879806556
>>> np.flatnonzero(a >= np.partition(a, len(a) - k)[len(a) - k])
array([2527, 4299, 5531, 6945, 7174])
>>> np.sort(np.argpartition(a, len(a) - k)[len(a) - k:])
array([2527, 4299, 5531, 6945, 7174])
Note 1: this highlights the significant performance cost of indirect indexing.
Note 2: since we only use the pivot element and discard the actual partition, percentile should in theory be at least as fast, but in practice it is way slower.
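For reference, the percentile-based variant alluded to above can be sketched like this (the choice of q is my own derivation from how method="lower" maps percentiles to sorted indices, and the method keyword assumes NumPy >= 1.22; older versions call it interpolation):

```python
import numpy as np

a = np.random.random(10000)
k = 5
n = len(a)

# With method="lower", percentile picks the element at floor(q / 100 * (n - 1))
# in sorted order, so q = 100 * (n - k) / (n - 1) lands exactly on sorted
# index n - k: the k-th largest value, i.e. the same pivot partition returns.
pivot = np.percentile(a, 100 * (n - k) / (n - 1), method="lower")
idx = np.flatnonzero(a >= pivot)
```

idx matches the partition-based result; the extra work percentile does internally is what makes it slower in practice.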