Fast way to check if elements in sub-dimension of a numpy array is in sub-dimension of another numpy array

Question

Lets say we have two integer numpy array A and B of size (N,M). I would like to check for every i in N, is A[i,:] in B[i,:].

A for-loop implementation is:

for i in range(N):
    C[i] = np.isin(A[i,:],B[i,:])

However this is quite slow for large array. is there any faster way to implement this? (e.g. vectorization?)

Thanks!

Showcasing your problem with a sample input/output always makes everyone's life easier — Sheldore, Sep 18 '18 at 10:11
Any insights on the input arrays - Are they ints? Are they positive ints? — Divakar, Sep 18 '18 at 10:12
Given that `C` is overwritten at any iteration, the fastest equivalent of the code you posted so far is: `C = np.isin(A[N - 1, :], B[N - 1, :]`. But I guess that this is not what you want, and you should be more precise in your question. — norok2, Sep 18 '18 at 10:17
@Divakar In my case they are unique, but I don't think this will affect how it is implemented though — Wang Duo, Sep 18 '18 at 11:03

Divakar · Accepted Answer · 2018-09-18T11:09:32.293

Here's one vectorized approach based on the per-row offsetting as discussed in more details in Vectorized searchsorted numpy's solution -

# https://stackoverflow.com/a/40588862/ @Divakar
def searchsorted2d(a,b):
    m,n = a.shape
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    r = max_num*np.arange(a.shape[0])[:,None]
    p = np.searchsorted( (a+r).ravel(), (b+r).ravel() ).reshape(m,-1)
    return p - n*(np.arange(m)[:,None])

def numpy_isin2D(A,B):
    sB = np.sort(B,axis=1)
    idx = searchsorted2d(sB,A)
    idx[idx==sB.shape[1]] = 0
    return np.take_along_axis(sB, idx, axis=1) == A

Sample run -

In [351]: A
Out[351]: 
array([[5, 0, 3, 3],
       [7, 3, 5, 2],
       [4, 7, 6, 8],
       [8, 1, 6, 7],
       [7, 8, 1, 5]])

In [352]: B
Out[352]: 
array([[8, 4, 3, 0, 3, 5],
       [0, 2, 3, 8, 1, 3],
       [3, 3, 7, 0, 1, 0],
       [4, 7, 3, 2, 7, 2],
       [0, 0, 4, 5, 5, 6]])

In [353]: numpy_isin2D(A,B)
Out[353]: 
array([[ True,  True,  True,  True],
       [False,  True, False,  True],
       [False,  True, False, False],
       [False, False, False,  True],
       [False, False, False,  True]])

Eelco Hoogendoorn · Answer 2 · 2018-09-18T13:02:18.623

The numpy_indexed package (disclaim: I am its author) can be used to obtain a solution similar to Divakars, but with the low-level particulars abstracted away:

import numpy_indexed as npi
Ar = np.indices(A.shape)[0]
Br = np.indices(B.shape)[0]
isin = npi.in_((A.flatten(), Ar.flatten()), (B.flatten(), Br.flatten())).reshape(A.shape)

All functions in the numpy_indexed package operate equally on ndarrays, or in this case, tuples of ndarrays, which in practice equates to 'zipping' the ndarrays in the tuple and acting on that, without incurring the overhead of doing so. So we check for inclusion in a 1d set, of each item zipped with its row index; so matches are only registered when both the row index and the numerical value coincide.

Divakars solution probably has a speed advantage, but both solutions should be of the same time-complexity. And the solution posted here works with arbitrary dtypes; or even if the tokes you are trying to match are ndarrays themselves rather than scalars!

score 0 · Answer 3 · answered Sep 18 '18 at 10:19

0

Maybe something like this:

>A = np.zeros((5, 5))
>B = np.ones((5, 5))
>B[2, :] = 0
>print(A)
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
>print(B)
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
>(A == B).all(axis=1)
array([False, False,  True, False, False])

answered Sep 18 '18 at 10:19

cyberspace

191
1
2
11

How is this supposed to be the same as the code above? If I run OP's code, I get a completely different result... – norok2 Sep 18 '18 at 10:29
1

@cyberspace Do you understand what `np.isin` does? Can you read the docs? – Divakar Sep 18 '18 at 10:30

Fast way to check if elements in sub-dimension of a numpy array is in sub-dimension of another numpy array

3 Answers3