
I am trying to retrieve the kth-largest (or kth-smallest) element from each row of a matrix. For example, if k = 3, I want the 3rd-largest element of every row. Once I have collected these elements from all rows, I want to sort the resulting vector.

This is what I got so far:

dist = np.array([[0.        , 2.7349432 , 3.57365027, 0.33696172, 1.40063669],
       [2.7349432 , 0.        , 0.8692355 , 2.9937996 , 1.47642103],
       [3.57365027, 0.8692355 , 0.        , 3.81469329, 2.27521406],
       [0.33696172, 2.9937996 , 3.81469329, 0.        , 1.62590145],
       [1.40063669, 1.47642103, 2.27521406, 1.62590145, 0.        ]])

neighbor_distance_argsort = np.argsort(dist, axis=1)
k_neighbor_dist = np.sort(dist[neighbor_distance_argsort == k - 1])

The order I get is wrong, though, and incorrect elements are picked. I know that argsort does not do exactly what I want (or thought it would). I've also read somewhere that a double argsort yields value ranks, but I can't adapt the solutions I've seen to 2D arrays.
There must be some easy solution here that I just can't see.

2 Answers


The fastest way is to use np.partition, since you don't actually need to sort the whole array.

import numpy as np
def kth(dist, k):
    # After partitioning, column k-1 holds each row's kth-smallest value.
    return np.sort(np.partition(dist, k-1, axis=1)[:, k-1])

kth(dist, 3)
Out[]: array([ 1.40063669,  1.47642103,  1.47642103,  1.62590145,  2.27521406])
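The function above returns the kth-smallest value of each row. Since the question also asks about the kth-largest, here is a minimal sketch of that variant. The helper name `kth_largest` is hypothetical (it is not part of the answer), and the sketch assumes `1 <= k <= dist.shape[1]`. It relies on the fact that, in ascending order, the kth-largest value of a row of length n sits at index n - k.

```python
import numpy as np

def kth_largest(dist, k):
    # Hypothetical helper, not from the original answer.
    # In ascending order, the kth-largest value of a row of length n
    # sits at index n - k, so partition around that position.
    n = dist.shape[1]
    return np.sort(np.partition(dist, n - k, axis=1)[:, n - k])

dist = np.array([[0.        , 2.7349432 , 3.57365027, 0.33696172, 1.40063669],
                 [2.7349432 , 0.        , 0.8692355 , 2.9937996 , 1.47642103],
                 [3.57365027, 0.8692355 , 0.        , 3.81469329, 2.27521406],
                 [0.33696172, 2.9937996 , 3.81469329, 0.        , 1.62590145],
                 [1.40063669, 1.47642103, 2.27521406, 1.62590145, 0.        ]])

# Second-largest value of every row, sorted ascending.
second_largest = kth_largest(dist, 2)
print(second_largest)
```

Note that with five columns and k = 3, the 3rd-largest and 3rd-smallest values coincide, which is why the example uses k = 2.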
Daniel F
  • Thanks for the hint about partition. Even though speed is not a relevant problem for the current application, it's nice to know about for the future. – DatenBergwerker Mar 04 '19 at 16:29

You can sort the rows and then select the (k-1)th column.

import numpy as np
k = 3
dist = np.array([[0.        , 2.7349432 , 3.57365027, 0.33696172, 1.40063669], 
                 [2.7349432 , 0.        , 0.8692355 , 2.9937996 , 1.47642103], 
                 [3.57365027, 0.8692355 , 0.        , 3.81469329, 2.27521406],
                 [0.33696172, 2.9937996 , 3.81469329, 0.        , 1.62590145],
                 [1.40063669, 1.47642103, 2.27521406, 1.62590145, 0.        ]])

# np.sort sorts along the last axis by default, i.e. each row separately.
sortedDist = np.sort(dist)
print(sortedDist[:, k-1])
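If you also need to know which column each kth-smallest value came from (for example, which neighbor it is), a possible sketch is to use np.argsort on the rows and index with the resulting positions. The variable names `order`, `kth_index` and `kth_value` are illustrative assumptions, not part of the code above.

```python
import numpy as np

dist = np.array([[0.        , 2.7349432 , 3.57365027, 0.33696172, 1.40063669],
                 [2.7349432 , 0.        , 0.8692355 , 2.9937996 , 1.47642103],
                 [3.57365027, 0.8692355 , 0.        , 3.81469329, 2.27521406],
                 [0.33696172, 2.9937996 , 3.81469329, 0.        , 1.62590145],
                 [1.40063669, 1.47642103, 2.27521406, 1.62590145, 0.        ]])
k = 3

# order[i, j] is the column index of the (j+1)th-smallest value in row i.
order = np.argsort(dist, axis=1)

# Column index of the kth-smallest value in each row.
kth_index = order[:, k - 1]

# Pick that column from each row with paired (row, column) indexing.
kth_value = dist[np.arange(dist.shape[0]), kth_index]

print(kth_index)
print(np.sort(kth_value))
```

This also shows why masking with `neighbor_distance_argsort == k - 1` picks the wrong elements: argsort returns column positions in sorted order, not the rank of each entry.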

Best

Maxouille