I have a function that returns the argmax from a large 2d array

getMax = np.argmax(dist, axis=1)

However, I want to get the next-biggest values. Is there a way of removing the getMax values from the original array and then performing argmax again?

Sprout
  • If you don't just need the second largest element, but maybe also the third and so on, the following might be helpful: http://stackoverflow.com/questions/16878715/how-to-find-the-index-of-n-largest-elements-in-a-list-or-np-array-python – Carsten Dec 14 '14 at 20:37
  • Do you actually want to remove the largest, or do you want to just find the next largest? – Joel Dec 14 '14 at 20:38
  • @Joel find the next largest – Sprout Dec 14 '14 at 20:47
  • Check my answer --- I don't have numpy on the computer I'm on right now, so can't bug check – Joel Dec 14 '14 at 20:50
  • @Joel I'm sorry to say but I can't change the order for what I need it for – Sprout Dec 14 '14 at 20:51

2 Answers

Use np.argsort(a, axis=-1, kind='quicksort', order=None) with an appropriate choice of arguments (see below).

From the np.argsort documentation: "It returns an array of indices of the same shape as a that index data along the given axis in sorted order."

The default order is smallest to largest, so sort -dist instead (for quick coding). Caution: -dist creates a new array, which may matter if dist is huge. See the bottom of this post for an alternative that avoids negating the array.

Here is an example:

x = np.array([[1,2,5,0],[5,7,2,3]])
L = np.argsort(-x, axis=1)

print(L)
[[2 1 0 3]
 [1 0 3 2]]

x
array([[1, 2, 5, 0],
       [5, 7, 2, 3]])

So the n'th entry in each row of L gives the location of the n'th largest element in the corresponding row of x.

x is unchanged.

L[:,0] will give the same output as np.argmax(x, axis=1)

L[:,0]
array([2, 1])

np.argmax(x,axis=1)
array([2, 1])

and L[:,1] will give the same as a hypothetical argsecondmax(x)

L[:,1]
array([1, 0])

If you don't want to generate a new array (that is, you want to avoid -x), sort x directly and index from the end of each row:

L = np.argsort(x, axis=1)

print(L)
[[3 0 1 2]
 [2 3 0 1]]

L[:,-1]
array([2, 1])

L[:,-2]
array([1, 0])
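
To go from these indices to the values themselves, one option is np.take_along_axis (available in NumPy 1.15 and later). This is only a sketch that reuses the x from the example above; the names second_idx and second_val are made up for illustration.

```python
import numpy as np

x = np.array([[1, 2, 5, 0], [5, 7, 2, 3]])

# Column indices sorted from smallest to largest within each row.
L = np.argsort(x, axis=1)

# Index of the second-largest entry in each row (hypothetical name).
second_idx = L[:, -2]

# take_along_axis expects an index array with the same number of
# dimensions as x, so add a trailing axis and drop it afterwards.
second_val = np.take_along_axis(x, second_idx[:, None], axis=1)[:, 0]

print(second_idx)  # [1 0]
print(second_val)  # [2 5]
```

If a row contains tied values, which of the tied positions comes first is up to the sort, so the index returned for "second largest" may point at either of them.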
Joel
  • I'm sorry to say but order is important, I cannot change the order – Sprout Dec 14 '14 at 20:49
  • I think you misunderstand. The sorting `np.argsort(dist)` gives a list of locations from location of smallest to location of largest. So `np.argsort(-dist)` gives the list of locations of smallest to largest of `-dist`. So it's the locations of largest to smallest of `dist`. It doesn't change the order of `dist` in any way. – Joel Dec 14 '14 at 20:54
  • I've just realised this, however when I use it, it returns an array full of zeros rather than what it should – Sprout Dec 14 '14 at 21:08
  • Can you edit your question to give a minimal example and expected output? – Joel Dec 14 '14 at 21:09
  • I have solved it, it was an error later in the code. Thank you Joel! – Sprout Dec 14 '14 at 21:12
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/66863/discussion-between-joel-and-sprout). – Joel Dec 14 '14 at 21:23

If speed is important to you, using argpartition rather than argsort could be useful.

For example, to return the n largest elements from a 1d array:

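import numpy as np

n = 10  # number of largest elements to return (example value)
l = np.random.randint(0, 101, 10**6)  # one million integers in [0, 100]

top_n_1 = l[np.argsort(-l)[0:n]]          # sorted, largest first
top_n_2 = l[np.argpartition(l, -n)[-n:]]  # same n values, in no guaranteed order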

The %timeit magic in IPython reports 56.9 ms per loop (10 loops, best of 3) for top_n_1 and 8.06 ms per loop (100 loops, best of 3) for top_n_2.
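
For the 2d case in the question, the same idea can be applied along axis=1. The following is a rough sketch, not a benchmarked solution: dist here is a small stand-in array, and n, top_idx, top_val and order are illustrative names. Because argpartition does not order the n entries it selects, the sketch sorts just those n candidates afterwards.

```python
import numpy as np

dist = np.array([[1, 2, 5, 0], [5, 7, 2, 3]])  # stand-in for the real array
n = 2  # how many of the largest entries to keep per row

# Indices of the n largest entries in each row, in no particular order.
top_idx = np.argpartition(dist, -n, axis=1)[:, -n:]

# Sort only those n candidates so column 0 is the largest, column 1 the next.
top_val = np.take_along_axis(dist, top_idx, axis=1)
order = np.argsort(-top_val, axis=1)
top_idx = np.take_along_axis(top_idx, order, axis=1)

print(top_idx[:, 0])  # [2 1], same as np.argmax(dist, axis=1)
print(top_idx[:, 1])  # [1 0]
```

Sorting only n candidates per row keeps the extra cost small when n is much smaller than the row length.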

I hope this is useful.

Dai