
I have two paired 2-dimensional NumPy arrays (say, labels and scores):

labels = np.array([['a','b','c','d'],
                    ['a1','b1','c1','d1']])
scores = np.array([[0.1, 0.2, 0.3,0.4],
                  [1,2,3,4]])

I want to get the top k items from them, sorted by the second row of scores.

I think I can achieve this by building and sorting rows of the form:

[scores[i][1], scores[i][0], labels[i][0], labels[i][1]]

But is there a more elegant way to do this with NumPy or pandas?

Pythoner
  • Take a look at [this answer](https://stackoverflow.com/a/23734295/7389264). Basically, you can use [`argpartition`](https://numpy.org/doc/stable/reference/generated/numpy.argpartition.html) and [`argsort`](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html) or [`sort`](https://numpy.org/doc/stable/reference/generated/numpy.sort.html) to get what you need. `argpartition` alone will get the indices of the top k, but not in any particular order, so you can extract those elements and sort them. To work along one dimension of the array, just provide the appropriate `axis` argument. – jirassimok Apr 21 '20 at 22:29
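The approach described in the comment above can be sketched roughly as follows. This is only an illustration, not code from the linked answer: the helper name `top_k_columns` is hypothetical, and it assumes `1 <= k <= scores.shape[1]` and that "top" means the largest values in the last row of `scores`.

```python
import numpy as np

labels = np.array([['a', 'b', 'c', 'd'],
                   ['a1', 'b1', 'c1', 'd1']])
scores = np.array([[0.1, 0.2, 0.3, 0.4],
                   [1, 2, 3, 4]])


def top_k_columns(labels, scores, k):
    """Hypothetical helper: the k columns with the largest last-row score, best first."""
    key = scores[-1]
    # argpartition puts the indices of the k largest values in the last k slots,
    # but in no particular order
    part = np.argpartition(key, -k)[-k:]
    # sort only those k candidates, in descending order of score
    order = part[np.argsort(key[part])[::-1]]
    # the same column indices select the paired labels and scores
    return labels[:, order], scores[:, order]


top_labels, top_scores = top_k_columns(labels, scores, k=2)
```

For small arrays a plain `argsort` is simpler; partitioning first mainly helps when k is much smaller than the number of columns.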

1 Answer

`numpy.argsort` should do it:

import numpy as np

labels = np.array([['a','b','c','d'],
                    ['a1','b1','c1','d1']])
scores = np.array([[0.1, 0.2, 0.3,0.4],
                  [1,2,3,4]])

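k = 2  # number of "top items"
idx = np.argsort(scores[-1])[-k:]  # indices of the k largest values in the last row, in ascending order

topkScores = scores[:, idx].T  # remove .T if you want the score pairs in columns
topkLabels = labels[:, idx].T  # the same indices select the matching labels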
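If the result should come out best-first and in the row layout mentioned in the question, one possible variation is sketched below. The name `records` is only illustrative, and the sketch assumes ties can be returned in any order.

```python
import numpy as np

labels = np.array([['a', 'b', 'c', 'd'],
                   ['a1', 'b1', 'c1', 'd1']])
scores = np.array([[0.1, 0.2, 0.3, 0.4],
                   [1, 2, 3, 4]])

k = 2
# reverse the ascending argsort so the highest second-row score comes first
idx = np.argsort(scores[-1])[::-1][:k]

# one record per selected column:
# (second-row score, first-row score, first-row label, second-row label)
records = [
    (float(scores[1, i]), float(scores[0, i]), str(labels[0, i]), str(labels[1, i]))
    for i in idx
]
```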
Aly Hosny