
I want to find out if my numpy vector, needle, appears inside another vector, haystack, as a slice, or contiguous sub-vector.

I want a function find(needle, haystack) that returns True if and only if there exist integer indexes p and q such that needle equals haystack[p:q], where "equals" means the elements are equal at every position.

Example:

find([2,3,4], [1,2,3,4,5]) == True
find([2,4], [1,2,3,4,5]) == False  # not contiguous inside haystack
find([2,3,4], [0,1,2,3]) == False  # incomplete

Here I am using lists to simplify the illustration, but really they would be numpy vectors (1-dimensional arrays).

For strings in Python, the equivalent operation is trivial: it is the in operator, and "bcd" in "abcde" evaluates to True.


An appendix on dimensionality.

Dear reader, you might be tempted by similar-looking questions, such as testing whether a Numpy array contains a given row, or Checking if a NumPy array contains another array. But considering the dimensions shows that the similarity is not helpful.

A vector is a one-dimensional array. In numpy terms a vector of length N will have .shape == (N,); its shape has length 1.

The other referenced questions generally seek an exact match for a row in a 2-dimensional matrix.

I am seeking to slide my 1-dimensional needle along the same axis of my 1-dimensional haystack like a window, until the entire needle matches the portion of the haystack that is visible through the window.
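
The sliding-window idea described above can be sketched with NumPy's sliding_window_view, which is available from NumPy 1.20 onward. This is only one possible reading of the requirement, not an established built-in for the task. The name find_windowed and the handling of an empty needle (treated as always found, since haystack[p:p] is empty) are assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def find_windowed(needle, haystack):
    """Return True if needle occurs as a contiguous slice of haystack."""
    needle = np.asarray(needle)
    haystack = np.asarray(haystack)
    if needle.size == 0:
        # Assumption: an empty needle matches, because haystack[p:p] is empty.
        return True
    if needle.size > haystack.size:
        # No window of that length fits inside the haystack.
        return False
    # Each row of `windows` is one view of haystack[i:i + len(needle)].
    windows = sliding_window_view(haystack, needle.size)
    # A match is a row whose elements all equal the needle.
    return bool((windows == needle).all(axis=1).any())
```

The windows are views rather than copies, but the comparison still builds a boolean array of shape (number of windows, len(needle)), so memory use grows with both lengths.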

David Jones

2 Answers


If you are fine with creating copies of the two arrays, you could fall back on Python's in operator for bytes objects:

import numpy as np

def find(a, b):
    # Compares raw bytes, so a and b should have the same dtype.
    return a.tobytes() in b.tobytes()

print(
    find(np.array([2,3,4]), np.array([1,2,3,4,5])),
    find(np.array([2,4]),   np.array([1,2,3,4,5])),
    find(np.array([2,3,4]), np.array([0,1,2,3])),
    find(np.array([2,3,4]), np.array([0,1,2,3,4,5,2,3,4])),
)

# True False False True
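
One caveat with a plain bytes search is that a match may start in the middle of an element, which can give a false positive for multi-byte dtypes. A hedged variant that only accepts matches on element boundaries might look like the sketch below. The name find_bytes_aligned is hypothetical, and casting the needle to the haystack's dtype is an assumption that may not suit every use (for example, a float needle against an integer haystack).

```python
import numpy as np


def find_bytes_aligned(needle, haystack):
    """Bytes search that only accepts matches starting on an element boundary."""
    haystack = np.ascontiguousarray(haystack)
    # Assumption: the needle is cast to the haystack's dtype before comparing.
    needle = np.ascontiguousarray(needle, dtype=haystack.dtype)
    itemsize = haystack.dtype.itemsize
    n_bytes = needle.tobytes()
    h_bytes = haystack.tobytes()
    start = h_bytes.find(n_bytes)
    while start != -1:
        if start % itemsize == 0:
            # The match begins exactly where an element begins.
            return True
        # Unaligned hit: keep searching from the next byte.
        start = h_bytes.find(n_bytes, start + 1)
    return False
```

Because the comparison is bytewise, floating-point values are matched by bit pattern rather than by ==, so NaN matches an identical NaN and 0.0 does not match -0.0.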
hilberts_drinking_problem

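Try a comprehension over every candidate start index:

def find(a, x):
    # Works for plain lists; with NumPy arrays, == is elementwise and any() would raise.
    return any(x[i:i+len(a)] == a for i in range(1 + len(x) - len(a)))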

Outputs:

print(find([2,3,4], [1,2,3,4,5]),
find([2,4], [1,2,3,4,5]),
find([2,3,4], [0,1,2,3]), find([2,3,4], [0,1,2,3,4,5,2,3,4]))
>> True False False True
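
The function above compares list slices with ==, which yields a single bool. For NumPy vectors, == is elementwise, so the result of each comparison is an array and any() raises a ValueError for needles longer than one element. A minimal sketch of the same loop for arrays, using np.array_equal to compare whole slices, could look like this. The name find_loop is hypothetical.

```python
import numpy as np


def find_loop(needle, haystack):
    """Same sliding comparison as above, but safe for NumPy arrays."""
    needle = np.asarray(needle)
    haystack = np.asarray(haystack)
    n = len(needle)
    # The generator lets any() stop at the first matching window.
    return any(
        np.array_equal(haystack[i:i + n], needle)
        for i in range(len(haystack) - n + 1)
    )
```

If the needle is longer than the haystack, the range is empty and the function returns False.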
Grzegorz Skibinski
Yes, this is the approach I went with. I was rather hoping that there would be an existing function that I had not found. – David Jones Feb 17 '20 at 11:20