
Let's say I have a 3x3 matrix like this:

array([[8, 6, 3],
       [6, 7, 2],
       [0, 8, 9]])

Now I want to get the top k largest values in the matrix and create a mask from them: an entry is 1 if its value is among the top k largest values, and 0 otherwise. Let k=2. In the example above there is one 9 and two 8s, and ties must all be included, so the returned mask looks like this:

array([[1, 0, 0],
       [0, 0, 0],
       [0, 1, 1]])

I have read this and that answer, and I can use the returned indices to build the mask. However, I wonder whether there is a better solution.

Minh-Long Luu
  • Better in what terms? Performance, readability, length of code? – MrPisarik Apr 25 '21 at 09:57
  • `np.argpartition` is a great solution and I doubt you can find something much faster or much shorter (at least not without more provided information). – Jérôme Richard Apr 25 '21 at 10:18
  • the problem with argpartition is how to handle duplicate values. See my own [post](https://stackoverflow.com/a/67253650/758174) on the linked question. – Pierre D Apr 25 '21 at 12:58
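
To illustrate the duplicate-value concern raised in the comments, here is a minimal sketch that assumes the plain `np.argpartition` recipe is applied to the flattened array. The variable names (`flat`, `idx`, `mask`) are illustrative and not taken from the linked answers. It picks exactly k elements, so only one of the two 8s ends up in the mask, and which one is not guaranteed.

```python
import numpy as np

a = np.array([[8, 6, 3],
              [6, 7, 2],
              [0, 8, 9]])
k = 2

flat = a.ravel()
# Indices of the k largest elements (in no particular order).
idx = np.argpartition(flat, -k)[-k:]

mask = np.zeros(flat.shape, dtype=int)
mask[idx] = 1
mask = mask.reshape(a.shape)

# Exactly k entries are set: the 9 and only one of the two 8s.
print(mask.sum())
```

The asker's expected mask has three ones, so this recipe alone does not cover ties.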

1 Answer


How about this?

import numpy as np

def is_topk(a, k=1):
    # Negate so np.unique ranks distinct values from largest (rank 0) downward.
    _, rix = np.unique(-a, return_inverse=True)
    return np.where(rix < k, 1, 0).reshape(a.shape)

Example on your array:

>>> is_topk(a, 1)
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 1]])

>>> is_topk(a, 2)
array([[1, 0, 0],
       [0, 0, 0],
       [0, 1, 1]])
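
As a rough walkthrough of why this works, the sketch below exposes the intermediate values. `np.unique(-a, return_inverse=True)` returns the distinct negated values in ascending order, so the inverse indices act as ranks, with 0 for the largest value. The names `vals`, `ranks`, and `threshold` are illustrative, and the threshold form at the end is an equivalent variant added here for comparison, not part of the answer above.

```python
import numpy as np

a = np.array([[8, 6, 3],
              [6, 7, 2],
              [0, 8, 9]])

# Distinct values of -a in ascending order, i.e. distinct values of a
# in descending order; rix maps each element to its position in vals.
vals, rix = np.unique(-a, return_inverse=True)
ranks = rix.reshape(a.shape)  # rank 0 = largest value, 1 = second largest, ...
print(ranks)

# Variant: keep everything >= the k-th largest distinct value.
# Assumes 1 <= k <= number of distinct values, otherwise vals[k - 1]
# raises IndexError.
k = 2
threshold = -vals[k - 1]
mask = (a >= threshold).astype(int)
print(mask)
```

Both 8s share rank 1, which is why ties are included automatically when comparing ranks against k.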
Pierre D