Is there a method in numpy that verifies if an array is contained in another array?

Question

I want to verify if a numpy array is a continuous sequence in another array.

E.g.

a = np.array([1,2,3,4,5,6,7])
b = np.array([3,4,5])
c = np.array([2,3,4,6])

The expected result would be:

is_sequence_of(b, a) # should return True
is_sequence_of(c, a) # should return False

I want to know if there is a numpy method that does this.

related? https://stackoverflow.com/questions/56840763/how-to-check-if-a-numpy-array-is-a-subarray-of-another-bigger-array — FObersteiner, Nov 08 '19 at 12:58

Divakar · Accepted Answer · 2019-11-09T06:07:30.033

4

Approach #1

We can use one with np.searchsorted -

def isin_seq(a,b):
    # Look for the presence of b in a, while keeping the sequence
    sidx = a.argsort()
    idx = np.searchsorted(a,b,sorter=sidx)
    idx[idx==len(a)] = 0
    ssidx = sidx[idx]
    return (np.diff(ssidx)==1).all() & (a[ssidx]==b).all()

Note that this assumes that the input arrays have no duplicates.

Sample runs -

In [42]: isin_seq(a,b) # search for the sequence b in a
Out[42]: True

In [43]: isin_seq(c,b) # search for the sequence b in c
Out[43]: False

Approach #2

Another with skimage.util.view_as_windows -

from skimage.util import view_as_windows

def isin_seq_v2(a,b):
    return (view_as_windows(a,len(b))==b).all(1).any()

Approach #3

This could also be considered as a template-matching problem and hence, for int numbers, we can use OpenCV's built-in function for template-matching : cv2.matchTemplate (inspired by this post), like so -

import cv2 
from cv2 import matchTemplate as cv2m

def isin_seq_v3(arr,seq):
    S = cv2m(arr.astype('uint8'),seq.astype('uint8'),cv2.TM_SQDIFF)
    return np.isclose(S,0).any()

Approach #4

Our methods could benefit with a short-circuiting based one. So, we will use one with numba for performance-efficiency, like so -

from numba import njit

@njit
def isin_seq_numba(a,b):
    m = len(a)
    n = len(b)
    for i in range(m-n+1):
        for j in range(n):
            if a[i+j]!=b[j]:
                break                
        if j==n-1:
            return True
    return False

edited Nov 09 '19 at 06:07

answered Nov 08 '19 at 11:02

Divakar

218,885
19
262
358

This tests something different from what the question is asking for. – user2357112 Nov 08 '19 at 11:03
The new versions don't work either. Among other problems, they ignore the order of `b`. – user2357112 Nov 08 '19 at 11:13
[Here's a demo of `isin_seq_v2` not working.](https://ideone.com/6pHBD2) I would show a demo for `isin_seq`, but Ideone doesn't have skimage. – user2357112 Nov 08 '19 at 11:15
[The new version of `isin_seq` doesn't handle duplicates.](https://ideone.com/b9569w) The new `isin_seq_v2` should work, though. – user2357112 Nov 08 '19 at 12:01
The note isn't quite correct, since problems occur [if the haystack array contains duplicates](https://ideone.com/T1Sulw), not just if the needle contains duplicates. (Also, I didn't realize you'd made the haystack the first argument instead of the second, so my previous examples have the arguments backward.) – user2357112 Nov 09 '19 at 01:18
@user2357112 Ah yeah, that's true. Edited. – Divakar Nov 09 '19 at 06:08

Is there a method in numpy that verifies if an array is contained in another array?

1 Answers1