
For example, I have a matrix of unique elements,

a=[
    [1,2,3,4],
    [7,5,8,6]
]

and another matrix of unique elements, all of which appear in the first matrix.

b=[
    [4,1],
    [5,6]
]

I expect the result

[
    [3,0],
    [1,3]
].

That is, for each element in a row of b, I want the index of the equal element in the same row of a. How can I do that? Thanks.

Divakar

3 Answers


Here's a vectorized approach -

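import numpy as np

# https://stackoverflow.com/a/40588862/ @Divakar
def searchsorted2d(a,b):
    # Row-wise searchsorted; each row of a must already be sorted
    m,n = a.shape
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    # Shift each row by a growing offset so the flattened array stays sorted
    r = max_num*np.arange(a.shape[0])[:,None]
    p = np.searchsorted( (a+r).ravel(), (b+r).ravel() ).reshape(m,-1)
    # Convert flat positions back to per-row positions
    return p - n*(np.arange(m)[:,None])

def search_indices(a, b):
    # Sort each row of a, search there, then map back to the original positions
    sidx = a.argsort(1)
    a_s = np.take_along_axis(a,sidx,axis=1)
    return np.take_along_axis(sidx,searchsorted2d(a_s,b),axis=1)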

Sample run -

In [54]: a
Out[54]: 
array([[1, 2, 3, 4],
       [7, 5, 8, 6]])

In [55]: b
Out[55]: 
array([[4, 1],
       [5, 6]])

In [56]: search_indices(a, b)
Out[56]: 
array([[3, 0],
       [1, 3]])
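
To see why searchsorted2d works, it may help to look at the row-offset trick in isolation. The sketch below is not part of the original answer: the names rowwise_searchsorted and offset are made up for illustration, and it assumes integer inputs where every row of a_sorted is already sorted in ascending order. It compares the single flattened call against a plain per-row loop.

```python
import numpy as np

def rowwise_searchsorted(a_sorted, b):
    """Per-row np.searchsorted done with one 1D call (illustrative helper)."""
    m, n = a_sorted.shape
    # A step larger than the spread of all values keeps shifted rows disjoint,
    # so the flattened, shifted array is globally sorted.
    lo = min(a_sorted.min(), b.min())
    hi = max(a_sorted.max(), b.max())
    offset = (hi - lo + 1) * np.arange(m)[:, None]
    flat_pos = np.searchsorted((a_sorted + offset).ravel(),
                               (b + offset).ravel())
    # flat_pos indexes the flattened array; subtract each row's start (i * n).
    return flat_pos.reshape(m, -1) - n * np.arange(m)[:, None]

a_sorted = np.array([[1, 2, 3, 4],
                     [5, 6, 7, 8]])
b = np.array([[4, 1],
              [5, 6]])

result = rowwise_searchsorted(a_sorted, b)

# Reference: the same search done one row at a time.
expected = np.stack([np.searchsorted(row, q) for row, q in zip(a_sorted, b)])
print(np.array_equal(result, expected))  # True
```

search_indices then only has to translate these positions in the sorted rows back to positions in the unsorted rows, which is what the final take_along_axis on sidx does.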

Another vectorized one leveraging broadcasting -

In [65]: (a[:,None,:]==b[:,:,None]).argmax(2)
Out[65]: 
array([[3, 0],
       [1, 3]])
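
The comparison above builds a boolean array of shape (rows, columns of b, columns of a), and argmax along the last axis picks the position of the first True. One caveat: argmax also returns 0 when a value has no match at all in its row. A minimal sketch that flags such cases, assuming a sentinel of -1 is acceptable (the function name match_indices and the missing parameter are hypothetical, not from the answer):

```python
import numpy as np

def match_indices(a, b, missing=-1):
    """Index of each b[i, j] within row a[i]; `missing` where there is no match."""
    a = np.asarray(a)
    b = np.asarray(b)
    # eq[i, j, k] is True where a[i, k] == b[i, j]; shape (m, b_cols, a_cols)
    eq = a[:, None, :] == b[:, :, None]
    idx = eq.argmax(axis=2)    # first True along the a-column axis (0 if none)
    found = eq.any(axis=2)     # distinguishes "match at 0" from "no match"
    return np.where(found, idx, missing)

a = np.array([[1, 2, 3, 4],
              [7, 5, 8, 6]])
b = np.array([[4, 1],
              [5, 6]])
print(match_indices(a, b).tolist())        # [[3, 0], [1, 3]]

b_with_miss = np.array([[4, 9],
                        [5, 6]])
print(match_indices(a, b_with_miss).tolist())  # [[3, -1], [1, 3]]
```

The intermediate boolean array holds rows × columns of b × columns of a entries, so for very wide matrices the searchsorted-based version is likely to use less memory.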
Divakar
  • @Navaro Well, the usual NumPy way under the hood (and the performant one) is an implementation in C. So yes, it would be nice to see these implemented natively in NumPy. For now, I am just building on existing tools. – Divakar Dec 28 '19 at 09:40

If you don't mind using loops, here's a quick solution using np.where:

import numpy as np

a=[[1,2,3,4],
   [7,5,8,6]]
b=[[4,1],
   [5,6]]

a = np.array(a)
b = np.array(b)
c = np.zeros_like(b)

for i in range(c.shape[0]):
    for j in range(c.shape[1]):
        # Column index in row i of a where the value b[i, j] occurs
        (pos,) = np.where(a[i] == b[i, j])
        c[i, j] = pos[0]

print(c.tolist())  # [[3, 0], [1, 3]]
Mercury
  • Thanks. But on huge data this will be very slow. I want to solve it with matrix operations because I need faster processing. – AgaigetS Dec 28 '19 at 09:07

You can do it this way:

np.split(pd.DataFrame(a).where(pd.DataFrame(np.isin(a,b))).T.sort_values(by=[0,1])[::-1].unstack().dropna().reset_index().iloc[:,1].to_numpy(),len(a))

# [array([3, 0]), array([1, 3])]

oppressionslayer