I have a matrix with many rows and 8 columns. Each cell is the probability that the row belongs to one of the 8 classes. I would like to keep only the 2 highest values in each row and set the rest to 0.

So far, the only way I can think of is to loop over the rows and sort each one separately. For example, with a smaller 6-column input:

a = np.array([[ 0.2  ,  0.1  ,  0.02 ,  0.01 ,  0.031,  0.11 ],
              [ 0.5  ,  0.1  ,  0.02 ,  0.01 ,  0.031,  0.11 ],
              [ 0.2  ,  0.1  ,  0.22 ,  0.15 ,  0.031,  0.11 ]])

I would like to get:

array([[ 0.2 ,  0.  ,  0.  ,  0.  ,  0.  ,  0.11],
       [ 0.5 ,  0.  ,  0.  ,  0.  ,  0.  ,  0.11],
       [ 0.2 ,  0.  ,  0.22,  0.  ,  0.  ,  0.  ]])

Thanks,

matlabit

2 Answers

Here's one vectorized approach with np.argpartition -

m,n = a.shape
a[np.arange(m)[:,None],np.argpartition(a,n-2,axis=1)[:,:-2]] = 0

Sample run -

In [570]: a
Out[570]: 
array([[ 0.94791114,  0.48438182,  0.54574317,  0.45481231,  0.94013836],
       [ 0.03861196,  0.99047316,  0.7897759 ,  0.38863967,  0.93659426],
       [ 0.49436676,  0.93762758,  0.33694977,  0.45701655,  0.73078113],
       [ 0.21240062,  0.85141765,  0.00815352,  0.52517721,  0.49752736]])

In [571]: m,n = a.shape
     ...: a[np.arange(m)[:,None],np.argpartition(a,n-2,axis=1)[:,:-2]] = 0
     ...: 

In [572]: a
Out[572]: 
array([[ 0.94791114,  0.        ,  0.        ,  0.        ,  0.94013836],
       [ 0.        ,  0.99047316,  0.        ,  0.        ,  0.93659426],
       [ 0.        ,  0.93762758,  0.        ,  0.        ,  0.73078113],
       [ 0.        ,  0.85141765,  0.        ,  0.52517721,  0.        ]])
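
If a should stay untouched, the same indexing can be applied to a copy. Here is a sketch that unpacks the one-liner into steps; the function name keep_top_k and the parameter k are made up for illustration and are not part of NumPy:

```python
import numpy as np

def keep_top_k(a, k=2):
    """Return a copy of 2-D array `a` with all but the k largest values per row set to 0.

    Hypothetical helper; assumes 1 <= k <= number of columns.
    """
    out = a.copy()
    m, n = out.shape
    # argpartition places the indices of the k largest entries of each row in the
    # last k positions (in no particular order); the first n - k are the rest.
    drop_cols = np.argpartition(out, n - k, axis=1)[:, :n - k]
    # np.arange(m)[:, None] has shape (m, 1) and broadcasts against drop_cols,
    # pairing every row index with that row's columns to zero out.
    out[np.arange(m)[:, None], drop_cols] = 0
    return out

a = np.array([[0.2, 0.1, 0.02, 0.01, 0.031, 0.11],
              [0.5, 0.1, 0.02, 0.01, 0.031, 0.11],
              [0.2, 0.1, 0.22, 0.15, 0.031, 0.11]])
result = keep_top_k(a)
```

This selects exactly k entries per row by index, so if several values tie for the k-th place, which of them is kept is arbitrary.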
Divakar

This should work; however, it alters a in place. Is that what you want? And is it essential to avoid loops?

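sorted_a = np.sort(a, axis=1)  # sorted_a rather than sorted, to avoid shadowing the built-in

for idx, row in enumerate(a):
    # row is a view into a, so this zeroes the entries of a in place
    row[row < sorted_a[idx, -2]] = 0

Or, without the loop:

a[a < sorted_a[:, None, -2]] = 0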
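
One thing worth knowing about comparing against a threshold: every value that ties with the second largest is kept, so a row can end up with more than two nonzero entries. Here is a small sketch of that behaviour, using np.where so the input is left unchanged (the names keep_top2_threshold and b are illustrative only):

```python
import numpy as np

def keep_top2_threshold(a):
    # Second-largest value of each row, reshaped to a column vector of shape (m, 1).
    second = np.sort(a, axis=1)[:, -2][:, None]
    # Zero out anything strictly below the threshold; ties with it survive.
    return np.where(a < second, 0, a)

b = np.array([[0.3, 0.3, 0.3, 0.1],
              [0.4, 0.1, 0.2, 0.3]])
out = keep_top2_threshold(b)
# Row 0 keeps all three 0.3 entries; row 1 keeps exactly 0.4 and 0.3.
```

Whether that is acceptable depends on how ties should be treated in your data.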
Lisa