
I have a 2D array with a huge number of rows (larger than 5000).

For the sake of simplicity, assume A is a simplified version of my matrix:

A = np.array([[1,2,2,3,3,3],
              [2,1,1,7,7,7],
              [4,4,1,1,1,1]])

Here, A has only 3 rows:

the 1st row has 3 distinct values: one 1, two 2s, three 3s.

the 2nd row has 3 distinct values: one 2, two 1s, three 7s.

the last row has 2 distinct values: two 4s, four 1s.

I can already find the majority value for each row easily:

1st is 3, 2nd is 7, 3rd is 1 (that is, my code already finds each row's majority value and stores them as [3,7,1]).

What I want to do is set each row's majority value to 0, that is, get

A = np.array([[1,2,2,0,0,0],
              [2,1,1,0,0,0],
              [4,4,0,0,0,0]])

A is just a simple instance. My matrix has a huge number of rows.

So, how can I do this more easily and efficiently?

I don't want to write a for loop to set the value for each row.

(That is, I could do A[0,A[0,:]==3]=0, A[1,A[1,:]==7]=0, A[2,A[2,:]==1]=0, but this is too cumbersome.)

What I want is a form like this:

A[:,A[:,:]==[3,7,1]]=0

but NumPy doesn't support this.

Can anyone give me an efficient method for this? Thank you very much!

More generally, if I want to set each row's most frequent value to 0, its 2nd most frequent value to -1, its 3rd most frequent value (if it exists) to -2, and so on, how can I do this?

That is, set:

A = np.array([[-2,-1,-1,0,0,0],
              [-2,-1,-1,0,0,0],
              [-1,-1,0,0,0,0]])
zeekzhen
  • Can you remove the general situation part from this question, as it's already covered in your newer question? – Divakar Jul 26 '18 at 10:17

1 Answer


Approach #1

Using 2D bincount -

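import numpy as np

# https://stackoverflow.com/a/46256361/ @Divakar
def bincount2D_vectorized(a):
    # Bins needed per row; assumes `a` holds non-negative integers
    N = a.max()+1
    # Shift each row into its own range of bin IDs so one bincount covers all rows
    a_offs = a + np.arange(a.shape[0])[:,None]*N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)

# argmax(1) gives each row's most frequent value; [:,None] makes it a column for broadcasting
A[A==bincount2D_vectorized(A).argmax(1)[:,None]] = 0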
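
To make the offset trick less opaque, here is a minimal sketch of the intermediate arrays for the sample matrix from the question. The variable names `row_offsets`, `counts` and `majority` are only illustrative; the logic mirrors `bincount2D_vectorized` and assumes non-negative integer data, since `np.bincount` rejects negative values.

```python
import numpy as np

A = np.array([[1, 2, 2, 3, 3, 3],
              [2, 1, 1, 7, 7, 7],
              [4, 4, 1, 1, 1, 1]])

N = A.max() + 1                                    # 8 bins per row (values 0..7)
row_offsets = np.arange(A.shape[0])[:, None] * N   # column vector [[0], [8], [16]]
a_offs = A + row_offsets                           # row i now occupies bin IDs i*N .. i*N + N - 1

# One flat bincount, reshaped to (rows, N): counts[i, v] = occurrences of v in row i
counts = np.bincount(a_offs.ravel(), minlength=A.shape[0] * N).reshape(-1, N)

# Index of the largest count in each row = that row's most frequent value
majority = counts.argmax(axis=1)
print(counts)
print(majority)
```

Note that `argmax` returns the first maximum, so when two values tie for most frequent the smaller one is picked. The `counts` table has `rows * (max + 1)` entries, which is worth keeping in mind if the values in the array are large.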

Sample run -

In [16]: A
Out[16]: 
array([[1, 2, 2, 3, 3, 3],
       [2, 1, 1, 7, 7, 7],
       [4, 4, 1, 1, 1, 1]])

In [17]: A[A==bincount2D_vectorized(A).argmax(1)[:,None]] = 0

In [18]: A
Out[18]: 
array([[1, 2, 2, 0, 0, 0],
       [2, 1, 1, 0, 0, 0],
       [4, 4, 0, 0, 0, 0]])
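
The question mentions that the per-row majority values are already stored as [3,7,1]. Under that assumption the counting step can be skipped, and the broadcasting comparison alone does the job. A small sketch (the names `majority` and `mask` are illustrative):

```python
import numpy as np

A = np.array([[1, 2, 2, 3, 3, 3],
              [2, 1, 1, 7, 7, 7],
              [4, 4, 1, 1, 1, 1]])

# Assumed to be precomputed elsewhere, one value per row, as described in the question
majority = np.array([3, 7, 1])

# majority[:, None] has shape (3, 1), so row i of A is compared against majority[i]
mask = A == majority[:, None]

# Boolean-mask assignment modifies A in place, with no Python-level loop
A[mask] = 0
print(A)
```

This is the working equivalent of the `A[:,A[:,:]==[3,7,1]]=0` form asked for in the question: the list has to become a column vector so that it lines up with the rows rather than the columns.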

Approach #2

Using an efficient 2D `mode` helper (defined in a separate Q&A, not a NumPy built-in) -

A[A==mode(A.T,axis=0)[0][:,None]] = 0

Alternatively, from the same Q&A, we can use Scipy -

from scipy import stats

A[A==stats.mode(A.T)[0][0][:,None]] = 0
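
The `[0][0]` indexing above relies on `stats.mode` returning a 2D result. As far as I know the shape of that result is not the same across SciPy releases, so treat the following as a sketch that avoids depending on it: it takes the mode along `axis=1` and reshapes whatever comes back into a column vector. The name `modes` is illustrative.

```python
import numpy as np
from scipy import stats

A = np.array([[1, 2, 2, 3, 3, 3],
              [2, 1, 1, 7, 7, 7],
              [4, 4, 1, 1, 1, 1]])

# First element of the result holds the modal values, one per row.
# reshape(-1, 1) yields a (rows, 1) column whether SciPy returned shape (rows,) or (rows, 1).
modes = np.asarray(stats.mode(A, axis=1)[0]).reshape(-1, 1)

# Broadcast the column of modes across each row and zero out the matches
A[A == modes] = 0
print(A)
```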
Divakar
  • Thanks a lot, I tried your first approach and it works! Another question: this method can only change the majority value (you use argmax). If I want to set the most frequent value to 0, the 2nd most frequent value to -1, the 3rd most frequent value to -2, and so on, how can I do this? – zeekzhen Jul 26 '18 at 09:47
  • @zeekzhen Think that would suit better as a new question as the solution would be entirely different. Can you post a new one? – Divakar Jul 26 '18 at 09:48