Can I find out if one numpy vector appears as a slice of another?

Question

I want to find out whether my numpy vector, needle, appears inside another vector, haystack, as a slice, that is, as a contiguous sub-vector.

I want a function find(needle, haystack) that returns True if and only if there exist integer indices p and q such that needle equals haystack[p:q], where "equals" means the elements are equal at every position.

Example:

find([2,3,4], [1,2,3,4,5]) == True
find([2,4], [1,2,3,4,5]) == False  # not contiguous inside haystack
find([2,3,4], [0,1,2,3]) == False  # haystack contains only part of needle

Here I am using lists to simplify the illustration, but really they would be numpy vectors (1-dimensional arrays).

For strings in Python the equivalent operation is trivial; it is just the in operator: "bcd" in "abcde" == True.

An appendix on dimensionality.

Dear reader, you might be tempted by similar-looking questions, such as testing whether a Numpy array contains a given row, or Checking if a NumPy array contains another array. But a consideration of dimensions shows that this similarity is not helpful.

A vector is a one-dimensional array. In numpy terms a vector of length N will have .shape == (N,); its shape has length 1.
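To see the difference concretely, the shapes can be inspected directly. The arrays below are only illustrative: a 1-D vector versus a 2-D matrix holding the same values as a single row, which is the kind of object the "contains a given row" questions deal with.

```python
import numpy as np

needle = np.array([2, 3, 4])
print(needle.ndim, needle.shape)    # 1 (3,)

# The same values stored as one row of a 2-D matrix have a different shape.
matrix = np.array([[2, 3, 4]])
print(matrix.ndim, matrix.shape)    # 2 (1, 3)
```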

The questions referenced above generally seek an exact match for a row in a 2-dimensional matrix.

I am seeking to slide my 1-dimensional needle along the same axis of my 1-dimensional haystack like a window, until the entire needle matches the portion of the haystack that is visible through the window.
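That sliding-window description maps fairly directly onto numpy.lib.stride_tricks.sliding_window_view. It was added in NumPy 1.20, which postdates this question, so the sketch below assumes at least that version; it is one possible approach and not taken from either answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20


def find(needle, haystack):
    """Return True if needle occurs as a contiguous slice of haystack."""
    needle = np.asarray(needle)
    haystack = np.asarray(haystack)
    n = len(needle)
    if n == 0:
        return True   # haystack[p:p] is empty, so an empty needle always matches
    if n > len(haystack):
        return False  # no window of length n fits (sliding_window_view would raise)
    # Row i of `windows` is a read-only view of haystack[i:i+n]; no data is copied.
    windows = sliding_window_view(haystack, n)
    # Compare all windows with needle at once, then ask whether any row matches fully.
    return bool((windows == needle).all(axis=1).any())


print(find([2, 3, 4], [1, 2, 3, 4, 5]))  # True
print(find([2, 4], [1, 2, 3, 4, 5]))     # False
print(find([2, 3, 4], [0, 1, 2, 3]))     # False
```

The windows themselves are views, but windows == needle builds a boolean array of roughly len(haystack) * len(needle) elements, so for very long needles a loop that stops at the first match may use less memory.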

Does this answer your question? [Checking if a NumPy array contains another array](https://stackoverflow.com/questions/33217660/checking-if-a-numpy-array-contains-another-array) — Ignacio Vergara Kausel, Jan 31 '20 at 09:50
If you are testing multiple "needles" against a single "haystack", [Burrows Wheeler transform](https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform) can be helpful. Looking at DNA alignment implementations might also be helpful in this case. — hilberts_drinking_problem, Jan 31 '20 at 09:57
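To make the Burrows-Wheeler suggestion a little more concrete, here is a minimal, unoptimised sketch of the classic FM-index backward search in pure Python. The names build_index, occ and occurs are made up for this illustration. The suffix array is built by naive sorting and occ() counts by scanning, so this is far slower than the rank tables used by real DNA-alignment tools; the point is only that the index is built once and each needle is then checked by walking it from right to left.

```python
def build_index(haystack):
    """Build a tiny, unoptimised FM-index (BWT plus C table) for a 1-D sequence."""
    # Assumption: float("-inf") never occurs in haystack, so it can act as the
    # unique end marker that sorts before every real value.
    text = tuple(haystack) + (float("-inf"),)
    n = len(text)
    # Naive suffix array: sort start positions by the suffix beginning there.
    sa = sorted(range(n), key=lambda i: text[i:])
    # BWT: the symbol just before each sorted suffix (index -1 wraps to the marker).
    bwt = [text[i - 1] for i in sa]
    # C[c]: how many symbols in text are strictly smaller than c.
    C = {}
    total = 0
    for c in sorted(set(text)):
        C[c] = total
        total += text.count(c)
    return bwt, C


def occ(bwt, c, k):
    """Occurrences of c in bwt[:k]; real FM-indexes precompute this as a rank table."""
    return bwt[:k].count(c)


def occurs(needle, index):
    """Backward search: True if needle is a contiguous slice of the indexed haystack."""
    bwt, C = index
    lo, hi = 0, len(bwt)  # half-open range of suffix-array rows still matching
    for c in reversed(list(needle)):
        if c not in C:
            return False  # symbol never appears in haystack
        lo = C[c] + occ(bwt, c, lo)
        hi = C[c] + occ(bwt, c, hi)
        if lo >= hi:
            return False
    return True


index = build_index([0, 1, 2, 3, 4, 5, 2, 3, 4])  # built once...
for needle in ([2, 3, 4], [4, 5, 2], [2, 4]):
    print(needle, occurs(needle, index))         # ...then queried many times
```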

score 1 · Answer 1 · answered Feb 01 '20 at 01:07


If you are fine with creating copies of the two arrays, you could fall back on Python's in operator for bytes objects:

import numpy as np

def find(a, b):
    # a and b should share a dtype, so that equal values have equal bytes.
    return a.tobytes() in b.tobytes()

print(
    find(np.array([2,3,4]), np.array([1,2,3,4,5])),
    find(np.array([2,4]),   np.array([1,2,3,4,5])),
    find(np.array([2,3,4]), np.array([0,1,2,3])),
    find(np.array([2,3,4]), np.array([0,1,2,3,4,5,2,3,4])),
)

# True False False True
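One caveat with the bytes trick, as far as I can tell: the in operator on bytes searches at every byte offset, not only at element boundaries, so it can report a "match" that straddles two elements. On a little-endian machine, np.array([1]) is found this way inside np.array([256, 0]). The two arrays also need a common dtype, since equal values of different dtypes have different byte patterns. A sketch that keeps the idea but only accepts element-aligned hits (find_aligned is just an illustrative name):

```python
import numpy as np


def find_aligned(needle, haystack):
    needle = np.asarray(needle)
    haystack = np.asarray(haystack)
    # Cast both to one common dtype so equal values produce identical bytes.
    dtype = np.result_type(needle, haystack)
    nb = needle.astype(dtype).tobytes()
    hb = haystack.astype(dtype).tobytes()
    step = dtype.itemsize
    start = hb.find(nb)
    while start != -1:
        if start % step == 0:  # the hit begins on an element boundary
            return True
        start = hb.find(nb, start + 1)  # keep looking past a misaligned hit
    return False


a = np.array([256, 0], dtype=np.int64)
b = np.array([1], dtype=np.int64)
print(b.tobytes() in a.tobytes())  # True on little-endian machines: a false positive
print(find_aligned(b, a))          # False
```

Byte equality is still not quite value equality for floats (0.0 and -0.0 differ in bytes, while identical NaNs match), so this variant is best suited to integer data.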


hilberts_drinking_problem


This is extremely witty. – David Jones Feb 01 '20 at 21:54

Grzegorz Skibinski · Answer 2 · 2020-02-01T09:29:49.513


Try with list comprehension:

def find(a, x):
    # Compare every len(a)-long window of x with a; list == list tests the whole slice.
    return any([x[i:i+len(a)] == a for i in range(1 + len(x) - len(a))])

Outputs:

print(find([2,3,4], [1,2,3,4,5]),
      find([2,4], [1,2,3,4,5]),
      find([2,3,4], [0,1,2,3]),
      find([2,3,4], [0,1,2,3,4,5,2,3,4]))
>> True False False True
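This works as written for Python lists. With actual NumPy vectors, x[i:i+len(a)] == a compares element-wise and returns a boolean array, so any() over those arrays raises "The truth value of an array with more than one element is ambiguous" once the needle has two or more elements. A minimal adaptation of the same loop for 1-D arrays, using np.array_equal per window (find_np is just an illustrative name):

```python
import numpy as np


def find_np(needle, haystack):
    needle = np.asarray(needle)
    haystack = np.asarray(haystack)
    n = len(needle)
    # Same windows as the list version; np.array_equal reduces each comparison
    # to a single bool, and the generator lets any() stop at the first match.
    return any(
        np.array_equal(haystack[i:i + n], needle)
        for i in range(len(haystack) - n + 1)
    )


print(find_np(np.array([2, 3, 4]), np.array([1, 2, 3, 4, 5])))  # True
print(find_np(np.array([2, 4]), np.array([1, 2, 3, 4, 5])))     # False
```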

edited Feb 01 '20 at 09:29

answered Jan 31 '20 at 13:03

Grzegorz Skibinski


Yes, this is the approach I went with. I was rather hoping that there would be an existing function that I had not found. – David Jones Feb 17 '20 at 11:20
