1

I have an original 2-D array

in_arr = np.array([[20,0,10,40,30], [50,40,60,90,80]])

# original array
# [[20,  0, 10, 40, 30],
#  [50, 40, 60, 90, 80]]

I need to sort each row in descending order, so I use np.argsort with axis=1 and reverse the result. The sorted indices I get are:

out_arr1 = np.argsort(in_arr, axis=1)[:, ::-1]
# array([[3, 4, 0, 2, 1],
#        [3, 4, 2, 0, 1]])

Then, I need to extract the 3 largest numbers from each row. The desired output for this sample is:

# 3 largest numbers from each row
# [[40,30,20],
#  [90,80,60]]

I have been struggling for a few hours to come up with a correct solution, but I still have no idea what I should do. Your valuable time and advice will be much appreciated. Thank you!

jtlz2
Yeo Keat

3 Answers

2

numpy.argsort() returns an array of the indices that would sort the array, not the sorted values themselves. As such, your out_arr1 tells you where on each row to find the highest values.

If you continue this way, then for each row i of in_arr (written in_arr[i]) you need to take the values found at the first 3 indices in out_arr1[i].

What that means is that out_arr1[i, 0] tells you where the highest value on row i of in_arr is located. In our case, out_arr1[0, 0] = 3, which means the highest value in row 0 is 40 (at index 3).

Following this, the positions of the 3 largest numbers on each row are given by out_arr1[0, 0], out_arr1[0, 1], out_arr1[0, 2] and out_arr1[1, 0], out_arr1[1, 1], out_arr1[1, 2].

To get the desired output, we would need something along the lines of:

final_arr = numpy.array([[in_arr[0, out_arr1[0, 0]], in_arr[0, out_arr1[0, 1]], in_arr[0, out_arr1[0, 2]]],
                         [in_arr[1, out_arr1[1, 0]], in_arr[1, out_arr1[1, 1]], in_arr[1, out_arr1[1, 2]]]])

This however, is less than elegant, and there is another, easier solution to your problem.
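Before moving on to that: the index lookup described above does not have to be spelled out element by element. A minimal sketch, assuming a NumPy version that provides numpy.take_along_axis (1.15 or later); the name top3_idx is only illustrative:

```python
import numpy as np

in_arr = np.array([[20, 0, 10, 40, 30], [50, 40, 60, 90, 80]])

# Indices that sort each row in descending order (out_arr1 from the question).
out_arr1 = np.argsort(in_arr, axis=1)[:, ::-1]

# Keep the first 3 indices of each row, then gather the matching values row by row.
top3_idx = out_arr1[:, :3]  # illustrative name, not from the original post
final_arr = np.take_along_axis(in_arr, top3_idx, axis=1)

print(final_arr)
# [[40 30 20]
#  [90 80 60]]
```

This keeps the argsort approach, which is useful if you also need the positions of the largest values and not just the values.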

Using numpy.sort() instead of numpy.argsort(), we get the values of in_arr themselves, sorted along an axis. numpy.sort() sorts in ascending order, so reverse each row (as you did with [:, ::-1]) to get descending order. We then no longer need an index array to find the 3 highest values, as they are the first 3 in each row of the new output.

Considering out_arr2 as the reversed output of numpy.sort(in_arr, axis=1), the final array would look like:

final_arr = numpy.array([[out_arr2[0, 0], out_arr2[0, 1], out_arr2[0, 2]], [out_arr2[1, 0], out_arr2[1, 1], out_arr2[1, 2]]])
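The element-by-element construction above can be shortened with slicing. A minimal sketch, assuming out_arr2 holds each row sorted in descending order (numpy.sort() sorts ascending, so the rows are reversed here):

```python
import numpy as np

in_arr = np.array([[20, 0, 10, 40, 30], [50, 40, 60, 90, 80]])

# Sort each row ascending, then reverse the columns to get descending order.
out_arr2 = np.sort(in_arr, axis=1)[:, ::-1]

# The 3 largest values are now the first 3 columns of every row.
final_arr = out_arr2[:, :3]

print(final_arr)
# [[40 30 20]
#  [90 80 60]]
```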
Stefan
gankubas
  • Hi Gankubas and Stefan, thank you so much for your help. Your suggestion and explanation are crystal clear; I have tried both ways and both work! Before this, my own solution was messy, and now I have an idea from you for how to improve it. Thank you so much! – Yeo Keat Feb 04 '21 at 02:44
  • Happy to have helped. By the way, if you find one of the solutions here fits your problem more, you could and should mark it as the accepted answer – gankubas Feb 04 '21 at 20:17
1

Based on this answer, you can do something like this:

np.array(list(map(lambda x, y: y[x], np.argsort(in_arr), in_arr)))[:,::-1][:,:3]

which gives

array([[40, 30, 20],
       [90, 80, 60]])
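For readers who find the one-liner dense, here is a rough step-by-step equivalent. The names order, sorted_rows and result are only illustrative and not part of the linked answer:

```python
import numpy as np

in_arr = np.array([[20, 0, 10, 40, 30], [50, 40, 60, 90, 80]])

# Step 1: indices that sort each row in ascending order (argsort defaults to the last axis).
order = np.argsort(in_arr)

# Step 2: apply each row's indices to that row, which is what the lambda y[x] does.
sorted_rows = [row[idx] for idx, row in zip(order, in_arr)]

# Step 3: reverse the columns to get descending order, then keep the first 3.
result = np.array(sorted_rows)[:, ::-1][:, :3]

print(result)
# [[40 30 20]
#  [90 80 60]]
```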
Stefan
  • Hi @Stefan, thank you again for your time and suggested solutions. I am not sure if you are the same person as the other account named Stefan, but I really appreciate your help. I tried this solution, and it also works for me! Thank you so much! – Yeo Keat Feb 04 '21 at 02:47
  • @YK you're welcome. If your problem is solved and you are happy with the solutions, please accept one of the given solutions proposed in this thread – Stefan Feb 04 '21 at 08:29
1

You can first sort all rows of the input array with a list comprehension using sorted. Then you take the last 3 numbers of each row and reverse them, so the largest comes first.

import numpy as np

in_arr = np.array([[20,0,10,40,30], [50,40,60,90,80]])

output = []
for i in [sorted(row) for row in in_arr]:
    output.append(i[-3:][::-1])
    
print(output)
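The loop above builds a plain Python list of lists. If an array is wanted instead, the same idea fits in one expression. A small sketch using sorted with reverse=True; the name top3 is only illustrative:

```python
import numpy as np

in_arr = np.array([[20, 0, 10, 40, 30], [50, 40, 60, 90, 80]])

# Sort each row in descending order, keep its first 3 values,
# and collect the rows back into a NumPy array.
top3 = np.array([sorted(row, reverse=True)[:3] for row in in_arr])

print(top3)
# [[40 30 20]
#  [90 80 60]]
```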
Stefan
  • Hi @Stefan, your suggestion is simple and nice! totally works well, thank you so much for your help! – Yeo Keat Feb 04 '21 at 02:46