4

I want to verify if a numpy array is a continuous sequence in another array.

E.g.

a = np.array([1,2,3,4,5,6,7])
b = np.array([3,4,5])
c = np.array([2,3,4,6])

The expected result would be:

is_sequence_of(b, a) # should return True
is_sequence_of(c, a) # should return False

I want to know if there is a numpy method that does this.

cristian hantig
  • 1,110
  • 1
  • 12
  • 16
  • related? https://stackoverflow.com/questions/56840763/how-to-check-if-a-numpy-array-is-a-subarray-of-another-bigger-array – FObersteiner Nov 08 '19 at 12:58

1 Answers1

4

Approach #1

We can use one with np.searchsorted -

def isin_seq(a,b):
    # Look for the presence of b in a, while keeping the sequence
    sidx = a.argsort()
    idx = np.searchsorted(a,b,sorter=sidx)
    idx[idx==len(a)] = 0
    ssidx = sidx[idx]
    return (np.diff(ssidx)==1).all() & (a[ssidx]==b).all()

Note that this assumes that the input arrays have no duplicates.

Sample runs -

In [42]: isin_seq(a,b) # search for the sequence b in a
Out[42]: True

In [43]: isin_seq(c,b) # search for the sequence b in c
Out[43]: False

Approach #2

Another with skimage.util.view_as_windows -

from skimage.util import view_as_windows

def isin_seq_v2(a,b):
    return (view_as_windows(a,len(b))==b).all(1).any()

Approach #3

This could also be considered as a template-matching problem and hence, for int numbers, we can use OpenCV's built-in function for template-matching : cv2.matchTemplate (inspired by this post), like so -

import cv2 
from cv2 import matchTemplate as cv2m

def isin_seq_v3(arr,seq):
    S = cv2m(arr.astype('uint8'),seq.astype('uint8'),cv2.TM_SQDIFF)
    return np.isclose(S,0).any()

Approach #4

Our methods could benefit with a short-circuiting based one. So, we will use one with numba for performance-efficiency, like so -

from numba import njit

@njit
def isin_seq_numba(a,b):
    m = len(a)
    n = len(b)
    for i in range(m-n+1):
        for j in range(n):
            if a[i+j]!=b[j]:
                break                
        if j==n-1:
            return True
    return False
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • This tests something different from what the question is asking for. – user2357112 Nov 08 '19 at 11:03
  • The new versions don't work either. Among other problems, they ignore the order of `b`. – user2357112 Nov 08 '19 at 11:13
  • [Here's a demo of `isin_seq_v2` not working.](https://ideone.com/6pHBD2) I would show a demo for `isin_seq`, but Ideone doesn't have skimage. – user2357112 Nov 08 '19 at 11:15
  • [The new version of `isin_seq` doesn't handle duplicates.](https://ideone.com/b9569w) The new `isin_seq_v2` should work, though. – user2357112 Nov 08 '19 at 12:01
  • The note isn't quite correct, since problems occur [if the haystack array contains duplicates](https://ideone.com/T1Sulw), not just if the needle contains duplicates. (Also, I didn't realize you'd made the haystack the first argument instead of the second, so my previous examples have the arguments backward.) – user2357112 Nov 09 '19 at 01:18
  • @user2357112 Ah yeah, that's true. Edited. – Divakar Nov 09 '19 at 06:08