2

My question is very similar to this one: How to get indices of N maximum values in a numpy array?

But I would like to get the indices in the same order I find them.

Let's take the example marked in that question as the correct solution:

import numpy as np
arr = np.array([1, 3, 2, 4, 5])
arr.argsort()[-3:][::-1]

array([4, 3, 1])

The result I'm looking for instead should be like:

array([1, 3, 4])
Alex Punkallo
  • 119
  • 4
  • 16
  • 1
    Why don't you just use `arr.argsort()[-3:]`? – Mazdak May 25 '18 at 16:58
  • Hi Kasramvd, let's say exchange 4 and 3: arr = np.array([1, 4, 2, 3, 5]) , your solution would give array([3, 1, 4]) and not, again, array([1, 3, 4]) as I'm looking for – Alex Punkallo May 25 '18 at 17:07

2 Answers2

4

Use numpy.argpartition():

k = 3
np.argpartition(arr, len(arr) - k)[-k:]

Adjust k index to whatever you need.

NOTE: returned indices are not guaranteed to be in the "sorted order" - just that anything past index k is larger than the value at position k in the sorted array.

NOTE 2: If you do need the returned indices to be sorted themselves, then simply add a numpy.sort() to the above command:

np.sort(np.argpartition(arr, len(arr) - k)[-k:])

numpy.argpartition() provides significant performance gains over full sort especially for large arr. In the above example you do a full sort only over the selected indices (not all).

AGN Gazer
  • 8,025
  • 2
  • 27
  • 45
2

It probably depends a bit on the sizes of a and k but often the fastest appears to be combining partition with flatnonzero or where:

>>> a = np.random.random(10000)
>>> k = 5
>>> 
>>> timeit("np.flatnonzero(a >= np.partition(a, len(a) - k)[len(a) - k])", globals=globals(), number=10000)
0.8328661819687113
>>> timeit("np.sort(np.argpartition(a, len(a) - k)[len(a) - k:])", globals=globals(), number=10000)
1.0577796879806556
>>> np.flatnonzero(a >= np.partition(a, len(a) - k)[len(a) - k])
array([2527, 4299, 5531, 6945, 7174])
>>> np.sort(np.argpartition(a, len(a) - k)[len(a) - k:])
array([2527, 4299, 5531, 6945, 7174])

Note 1: this highlights the significant performance cost of indirect indexing.

Note 2: as we only use the pivot element and discard the actual partition percentile should in theory be at least as fast but in practice it is way slower.

Paul Panzer
  • 51,835
  • 3
  • 54
  • 99