Python, Numpy, replacing second max value with 1, others with 0

Question

Searching the Internet didn't turn up anything for my problem. I have an array like this:

y=  [[ 2.63321579e-16   9.99986649e-01   2.90973702e-32   9.93230242e-06
        1.56965105e-30   1.63843623e-07   8.52455060e-22   0.00000000e+00
        5.65191413e-27   0.00000000e+00   3.20573202e-25   0.00000000e+00
        3.33013941e-06   0.00000000e+00   8.01929339e-22   2.14279644e-26
        0.00000000e+00   4.32979661e-08   1.01565330e-29   0.00000000e+00
        0.00000000e+00   4.52104604e-11]
     [  0.00000000e+00   1.57162935e-01   0.00000000e+00   0.00000000e+00
        0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
        0.00000000e+00   8.42837036e-01   3.78666698e-08   0.00000000e+00
        0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
        0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
        0.00000000e+00   0.00000000e+00]]

What I would like to do is replace the second maximum value in each row with '1' and every other value with '0'. I know how to do this for the max value: first create a zeros_like array, then set the max position in it to 1. So for this the method is:

x = np.zeros_like(y)
x[np.arange(len(y)), y.argmax(1)] = 1

But how would it work for the second max value? The desired output should look like:

y=  [[ 0 0 0 **1** 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]   
     [ 0 **1** 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]

I can get the second max value, but replacing it is where I get stuck.

Maybe this would help: https://stackoverflow.com/questions/10337533/a-fast-way-to-find-the-largest-n-elements-in-an-numpy-array? — NPE, Sep 07 '17 at 14:23

Divakar · Answer 1 · 2017-09-07T15:42:59.123

Here's one approach based on np.argpartition. It is aimed at performance: rather than sorting every element in a row, it only partitions the row into two parts around the position of the n-th largest element. Thus, np.argpartition(a, -n, axis=1)[:, -n] gives the position of the n-th largest element in each row. So, the solution would be simply -

import numpy as np

def n_largest_setarr(a, n=2):
    # a : 2D input array
    # n : the position of the n-th largest element in each row is set to 1
    out = np.zeros_like(a)  # same shape and dtype as a
    out[np.arange(len(a)), np.argpartition(a, -n, axis=1)[:, -n]] = 1
    return out

Sample run -

# Input array
In [68]: a
Out[68]: 
array([[222, 460, 240, 846, 997, 923, 327, 492],
       [135, 178, 882, 345, 827, 402, 837, 812],
       [820, 838, 666, 143, 122, 727, 323, 249]])

# Use proposed method for various `n` values
In [69]: n_largest_setarr(a, n=2) # second max position set to 1
Out[69]: 
array([[0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0],
       [1, 0, 0, 0, 0, 0, 0, 0]])

In [70]: n_largest_setarr(a, n=3) # third max position set to 1
Out[70]: 
array([[0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0]])

# Use the sorted array to verify values
In [71]: np.sort(a,axis=1)
Out[71]: 
array([[222, 240, 327, 460, 492, 846, 923, 997],
       [135, 178, 345, 402, 812, 827, 837, 882],
       [122, 143, 249, 323, 666, 727, 820, 838]])
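
As a further check, the argpartition result can be compared against a full argsort. This is only a sketch that reuses the sample array above; the helper name `n_largest_argsort` is hypothetical and is not part of the answer.

```python
import numpy as np

def n_largest_setarr(a, n=2):
    # Partition-based version from the answer above
    out = np.zeros_like(a)
    out[np.arange(len(a)), np.argpartition(a, -n, axis=1)[:, -n]] = 1
    return out

def n_largest_argsort(a, n=2):
    # Hypothetical reference version: fully sorts each row, then picks
    # the column that holds the n-th largest value
    out = np.zeros_like(a)
    out[np.arange(len(a)), np.argsort(a, axis=1)[:, -n]] = 1
    return out

a = np.array([[222, 460, 240, 846, 997, 923, 327, 492],
              [135, 178, 882, 345, 827, 402, 837, 812],
              [820, 838, 666, 143, 122, 727, 323, 249]])

# Both versions should mark the same positions for every n tested
for n in (1, 2, 3):
    assert np.array_equal(n_largest_setarr(a, n), n_largest_argsort(a, n))
```

The two versions are only guaranteed to agree position by position when the values in a row are distinct, as they are here. If several elements tie for the n-th largest value, either version may mark any one of the tied positions.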

Makis Tsantekidis · Accepted Answer · 2017-09-07T14:43:12.250

First, to find the element you are looking for, you can use the argsort function to get, for each row, the column indexes ordered by ascending value.

import numpy as np

y = np.random.randn(2,10)
print(y)
sorted_idx = np.argsort(y, axis=1)  # column indexes in ascending order of value, per row
nth_element = 2  # zero-based position in ascending order, so 2 selects the third smallest
indexes = np.arange(y.shape[0]), sorted_idx[:, nth_element]
answer = y[indexes]
print(answer)

If you want the nth largest element in each row instead, use a negative index: nth_element = -2 selects the second largest.

With nth_element = 2, the result in this test case would be:

[[ 2.31754087  1.02712883 -1.06811812  1.2073763  -0.06212109 -0.78401522
  -2.28638542 -0.82081567  1.16203424  0.2775298 ]
 [ 0.30816667  0.81606153  1.32791256  0.65654608  0.36659678  1.29219518
  -0.72793581  0.26714565 -0.69083268 -0.83825039]]

[-0.82081567 -0.69083268]
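
To make the negative index concrete, here is a small sketch on a fixed array. The values are made up for illustration so the output is reproducible, unlike the random test case above.

```python
import numpy as np

# Made-up values, chosen so each row has an obvious second largest element
y = np.array([[0.1, 0.7, 0.2],
              [0.5, 0.3, 0.9]])

sorted_idx = np.argsort(y, axis=1)   # ascending order within each row
rows = np.arange(y.shape[0])

# Index -2 counts from the end of the ascending order: the second largest
second_largest = y[rows, sorted_idx[:, -2]]
print(second_largest)  # [0.2 0.5]
```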

After this, create a zeros matrix with the same shape as your initial matrix and copy the selected elements into it using the saved index. (To get the 0/1 result the question asks for, assign 1 instead of y[indexes].)

zeros = np.zeros(y.shape)
zeros[indexes] = y[indexes]
print(zeros)

which returns

[[ 0.          0.          0.          0.          0.          0.          0.
  -0.82081567  0.          0.        ]
 [ 0.          0.          0.          0.          0.          0.          0.
   0.         -0.69083268  0.        ]]
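
Applied to the question, the same kind of index tuple can be assigned 1 rather than the original values. This is a minimal sketch, assuming a shortened stand-in for the asker's array: the numbers are rounded and cut down to five columns for readability, and are not the full 22-column data.

```python
import numpy as np

# Shortened, rounded stand-in for the array in the question (an assumption)
y = np.array([[2.6e-16, 9.99e-01, 2.9e-32,  9.9e-06, 1.6e-07],
              [0.0,     1.57e-01, 8.43e-01, 3.8e-08, 0.0]])

rows = np.arange(y.shape[0])
second_max_cols = np.argsort(y, axis=1)[:, -2]  # column of the second largest value per row

mask = np.zeros(y.shape, dtype=int)  # integer zeros, so the result prints as 0/1
mask[rows, second_max_cols] = 1
print(mask)
# [[0 0 0 1 0]
#  [0 1 0 0 0]]
```

One caveat for data like the question's, where many entries are exactly 0: if the second largest value is tied with other elements in a row, the position that gets marked is whichever of the tied columns the sort happens to place there.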

It was said, that it should give the second largest value for each row, but it doesn't. — JeffTheKiller, Sep 07 '17 at 14:31
Sorry, that is because `argsort` returns indexes in ascending order of their values. I'll add a way to get the second largest to my answer — Makis Tsantekidis, Sep 07 '17 at 14:31
Essentially I use the reverse indexing and set `nth_element` to the negative index of the position of the largest element I want. So if you want the second largest value you just set `nth_element = -2` — Makis Tsantekidis, Sep 07 '17 at 14:36
