I have a 2D array with a huge number of rows (more than 5000).

For the sake of simplicity, assume A is a small version of my matrix:

A = np.array([[1,2,2,3,3,3],
              [2,1,1,7,7,7],
              [4,4,1,1,1,1]])

Here A has only 3 rows:

The 1st row has 3 distinct values: one 1, two 2s, three 3s.

The 2nd row has 3 distinct values: one 2, two 1s, three 7s.

The last row has 2 distinct values: two 4s, four 1s.

I can already find the majority (most frequent) value of each row easily:

For the 1st row it is 3, for the 2nd row 7, for the 3rd row 1. (My code already finds each row's majority value and stores them as [3,7,1].)

The second and third majority values of each row are also easy to find. For the second majority value, the 1st row gives 2, the 2nd gives 1, the 3rd gives 4. (My code already finds each row's second majority value and stores them as [2,1,4].)

The third, fourth, fifth, ... majority values are just as easy to find.

What I want to do is set each row's 1st majority value to 0, its 2nd majority value to -1, its 3rd majority value (if it exists) to -2, and so on. How can I do this?

That is, A should become:

A = np.array([[-2,-1,-1,0,0,0],
              [-2,-1,-1,0,0,0],
              [-1,-1,0,0,0,0]])

A is just a simple example. My real matrix has a huge number of rows.

So how can I do this more easily and efficiently?

I don't want to write a for loop that sets the values row by row.

(I could do A[0,A[0,:]==3]=0, A[1,A[1,:]==7]=0, A[2,A[2,:]==1]=0, but this is too cumbersome.)

What I want is a form like this:

A[:,A[:,:]==[3,7,1]]=0

A[:,A[:,:]==[2,1,4]]=-1


A[:,A[:,:]==[1,2]]=-2 

but NumPy doesn't support this syntax.

Can anyone give me an efficient method for this? Thank you very much!
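
For reference, a row-wise comparison close to the wished-for form can be written with broadcasting. This is only a minimal sketch: it assumes the per-row majority values are already available as 1-D arrays (the names `first` and `second` are illustrative, not from the original code), and it covers only the first two levels because the third-majority list is shorter than the number of rows.

```python
import numpy as np

A = np.array([[1, 2, 2, 3, 3, 3],
              [2, 1, 1, 7, 7, 7],
              [4, 4, 1, 1, 1, 1]])

# Assumed inputs: the per-row values the question says are already computed.
first = np.array([3, 7, 1])    # most frequent value of each row
second = np.array([2, 1, 4])   # second most frequent value of each row

# Turning each 1-D array into a column makes the comparison row-wise:
# row i of A is compared against first[i] (or second[i]).
# Both masks are built from the original A before anything is written,
# so a value written in one step cannot be matched by a later step.
mask_first = A == first[:, None]
mask_second = A == second[:, None]

out = A.copy()
out[mask_first] = 0
out[mask_second] = -1
print(out)
```

Values that are neither the first nor the second majority of their row are left unchanged here, so this does not replace a full ranking of every value.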

zeekzhen
  • Sorry, I think I got confused between your `majority` and `biggest` terminologies. Those are two different things. I think you can delete this question and add the details from it to your old question, and let's try to solve that one instead. – Divakar Jul 26 '18 at 10:04
  • Sorry, it was a grammar mistake; I revised my question. – zeekzhen Jul 26 '18 at 10:07
  • There is no biggest, it's majority: second majority, third majority. – zeekzhen Jul 26 '18 at 10:07
  • @Divakar Hi, I revised the details of my question. Do you understand it now? – zeekzhen Jul 26 '18 at 10:15
  • Yup, let's keep this question as the generic one. – Divakar Jul 26 '18 at 10:18

1 Answer

Here's one method -

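import numpy as np

# https://stackoverflow.com/a/46256361/ @Divakar
def bincount2D_vectorized(a):
    # Per-row bincount; np.bincount requires non-negative integers
    N = a.max()+1
    # Shift each row into its own block of N bins
    a_offs = a + np.arange(a.shape[0])[:,None]*N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)

# binsum[i, v] = number of times value v occurs in row i of A
binsum = bincount2D_vectorized(A)
m,n = A.shape[0],binsum.shape[1]

# index[i, v] = 0 for the most frequent value in row i, -1 for the next, ...
index = np.empty((m,n), dtype=int)
sort_idx = binsum.argsort(1)[:,::-1]
index[np.arange(m)[:,None], sort_idx] = np.arange(0,-n,-1)
# Look up the rank of every element of A
out = index[np.arange(m)[:,None],A]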
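
The code above allocates `a.max()+1` bins per row, so it needs non-negative integers and can use a lot of memory when the values are large. A possible variant, sketched here under the assumption that first compacting the values with `np.unique` is acceptable, applies the same ranking idea to arbitrary values. The function name `rank_by_row_frequency` is hypothetical, and ties between equally frequent values are broken in favour of the smaller value, which is a choice made for this sketch only.

```python
import numpy as np

def rank_by_row_frequency(A):
    """Replace each element by 0, -1, -2, ... according to how frequent
    its value is within its own row (0 = most frequent)."""
    # Map arbitrary values to compact ids 0..k-1.
    uniq, ids = np.unique(A, return_inverse=True)
    ids = ids.reshape(A.shape)  # shape of the inverse differs across NumPy versions
    m, k = A.shape[0], uniq.size
    rows = np.arange(m)[:, None]

    # Per-row counts: give each row its own block of k bins.
    counts = np.bincount((ids + rows * k).ravel(), minlength=m * k).reshape(m, k)

    # Sort ids by descending count; the stable sort keeps smaller values first on ties.
    order = np.argsort(-counts, axis=1, kind="stable")

    # rank[i, j] = 0, -1, -2, ... for the id j in row i.
    rank = np.empty((m, k), dtype=int)
    rank[rows, order] = np.arange(0, -k, -1)
    return rank[rows, ids]

A = np.array([[1, 2, 2, 3, 3, 3],
              [2, 1, 1, 7, 7, 7],
              [4, 4, 1, 1, 1, 1]])
result = rank_by_row_frequency(A)
print(result)
```

On the sample `A` from the question this gives the same array the question asks for, and it also accepts negative values such as `A - 10`.
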
Divakar