How to get max (top) N values across entire numpy matrix

Question

I want to get the top N (maximal) args & values across an entire numpy matrix, as opposed to across a single dimension (rows / columns).

Example input (with N=3):

import numpy as np
mat = np.matrix([[9,8, 1, 2], [3, 7, 2, 5], [0, 3, 6, 2], [0, 2, 1, 5]])

print(mat)

[[9 8 1 2]
 [3 7 2 5]
 [0 3 6 2]
 [0 2 1 5]]

Desired output: [9, 8, 7]

Taking the maximum along a single dimension (rows or columns) doesn't work, because several of the top N values can share the same row or column.

# by rows, no 8
np.squeeze(np.asarray(mat.max(1).reshape(-1)))[:3]
array([9, 7, 6])

# by cols, no 7
np.squeeze(np.asarray(mat.max(0)))[:3]
array([9, 8, 6])

I have code that works, but looks really clunky to me.

# reshape into single vector
mat_as_vector = np.squeeze(np.asarray(mat.reshape(-1)))

# get top 3 arg positions
top3_args = mat_as_vector.argsort()[::-1][:3]

# subset the reshaped matrix
top3_vals = mat_as_vector[top3_args]

print(top3_vals)

[9 8 7]

Would appreciate any shorter way / more efficient way / magic numpy function to do this!

Also see https://stackoverflow.com/a/50533986 if you need _the indices_ of the top elements. — AGN Gazer, May 28 '18 at 04:21

score 4 · AGN Gazer · Accepted Answer · 2018-05-28T04:29:07.630

Using numpy.partition() is significantly faster than performing a full sort for this purpose:

np.partition(np.asarray(mat), mat.size - N, axis=None)[-N:]

assuming 1 <= N <= mat.size (for N = 0, the index mat.size - N is out of bounds).

If you also need the top N values sorted, sort the partitioned result (you will be sorting only N elements rather than the whole array):

np.sort(np.partition(np.asarray(mat), mat.size - N, axis=None)[-N:])

If you need the result sorted from largest to smallest, append [::-1] to the previous command:

np.sort(np.partition(np.asarray(mat), mat.size - N, axis=None)[-N:])[::-1]
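The three commands above can be wrapped into one helper. This is only a sketch: the function name `top_n_values` and the explicit range check are illustrative assumptions, not part of the original answer.

```python
import numpy as np

def top_n_values(a, n):
    """Return the n largest values of `a` (any shape), largest first.

    Hypothetical helper name; it wraps the np.partition approach above.
    """
    flat = np.asarray(a).ravel()
    if not 1 <= n <= flat.size:
        raise ValueError("n must satisfy 1 <= n <= a.size")
    # After partitioning at index size - n, the last n slots hold the
    # n largest values, in no guaranteed order.
    top = np.partition(flat, flat.size - n)[-n:]
    # Sort only those n values, then reverse for descending order.
    return np.sort(top)[::-1]

mat = np.array([[9, 8, 1, 2], [3, 7, 2, 5], [0, 3, 6, 2], [0, 2, 1, 5]])
print(top_n_values(mat, 3))  # [9 8 7]
```

Passing an `np.matrix` should work as well, since `np.asarray(...).ravel()` yields a plain 1-D array either way.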

edited May 28 '18 at 04:29

answered May 28 '18 at 03:53

AGN Gazer



@filippo All that means is that afterward there will be two regions in the array. Everything in one region will be greater than everything in the other, but the order inside those regions is not guaranteed by the function. Any seeming order it might have is accidental. – Hans Musgrave May 28 '18 at 04:08
@HansMusgrave yep, that was actually my point, I assumed the desired outcome was the top sorted elements and `np.partition` was giving them as an accidental side effect, hence my doubt – filippo May 28 '18 at 04:12
@AGNGazer thanks for this great answer. i wonder -- is there a version of this that can give me the top N args as a stepping stone to the top N values? – Ido S May 28 '18 at 04:24
@IdoS When using `argpartition()` with a flattened version of a multi-dimensional array, you may need to "unravel" the indices using `np.unravel_index(flat_indices, mat.shape)`. – AGN Gazer May 28 '18 at 04:44
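The `argpartition()` plus `np.unravel_index()` route mentioned in the last comment might look like the sketch below. It assumes a plain `np.array` in place of `np.matrix` so that fancy indexing returns a 1-D result; the variable names are illustrative.

```python
import numpy as np

mat = np.array([[9, 8, 1, 2], [3, 7, 2, 5], [0, 3, 6, 2], [0, 2, 1, 5]])
N = 3

flat = mat.ravel()
# Flat positions of the N largest values, in no guaranteed order.
flat_idx = np.argpartition(flat, flat.size - N)[-N:]
# Order those positions by value, largest first.
flat_idx = flat_idx[np.argsort(flat[flat_idx])[::-1]]
# Convert flat positions back to (row, column) coordinates.
rows, cols = np.unravel_index(flat_idx, mat.shape)

print(list(zip(rows.tolist(), cols.tolist())))  # [(0, 0), (0, 1), (1, 1)]
print(mat[rows, cols])                          # [9 8 7]
```

When several entries tie for a top value, which of the tied positions comes back is not guaranteed.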

score 3 · Answer 2 · answered May 28 '18 at 03:47


One way is to flatten the matrix, sort the values in descending order, and slice off the top N values:

sorted(mat.flatten().tolist()[0], reverse=True)[:3]

Result:

[9, 8, 7]


niraj


In case anyone stumbles here from Google, it's worth pointing out that in most instances numpy's vectorized operations will outperform other tricks from the standard library. This solution was 1.3-128x slower on my machine for various inputs than the `np.partition()` solution @AGN Gazer gave. If the result doesn't need to be sorted, this solution compares even worse. – Hans Musgrave May 28 '18 at 04:17
@HansMusgrave Thanks, the other solution is much faster. – niraj May 28 '18 at 04:31

That said, yours has the advantage of not using obscure function calls and being easy to include in a project when short on time. You might also like `sorted(np.asarray(mat).flatten())[-3:]` to avoid the list conversion and extra kwarg. – Hans Musgrave May 28 '18 at 04:35
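To check the speed difference discussed in these comments on your own machine, a rough timing sketch follows. The array size, the random seed, and the function names `with_partition` and `with_sorted` are arbitrary assumptions; the timings will vary with input size and hardware, so no numbers are shown.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
big = rng.integers(0, 1_000_000, size=(300, 300))  # arbitrary test input
N = 3

def with_partition(a, n):
    # np.partition approach from the accepted answer, sorted descending.
    flat = a.ravel()
    return np.sort(np.partition(flat, flat.size - n)[-n:])[::-1]

def with_sorted(a, n):
    # Pure-Python approach: full sort of a list, then slice.
    return sorted(a.ravel().tolist(), reverse=True)[:n]

# Both approaches should agree on the values before they are timed.
assert with_partition(big, N).tolist() == with_sorted(big, N)

for fn in (with_partition, with_sorted):
    seconds = timeit.timeit(lambda: fn(big, N), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```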

score -4 · Answer 3 · answered May 28 '18 at 03:59


The idea is from this answer: How to get indices of N maximum values in a numpy array?

import numpy as np
import heapq

mat = np.matrix([[9,8, 1, 2], [3, 7, 2, 5], [0, 3, 6, 2], [0, 2, 1, 5]])
ind = heapq.nlargest(3, range(mat.size), mat.take)
print(mat.take(ind).tolist()[0])

Output

[9, 8, 7]


Asterisk


Why plagiarise an answer when you can flag as duplicate? – user3483203 May 28 '18 at 04:00
I don't think I plagiarized as I referenced the source. – Asterisk May 28 '18 at 04:01
Thanks for pointing that out. Do you think it is entirely copy paste of original? – Asterisk May 28 '18 at 04:04
I understand the community's frustration with my answer. I honestly did not know about the rules @chrisz referred me to. I am glad he did, as he offered constructive feedback on why my answer is not in line with community guidelines. – Asterisk May 28 '18 at 04:20
