I have a large array of zeros and ones, e.g. array = [1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1]. How can I find matching patterns like [0, 0], [0, 1], [1, 0], [1, 1] in the array?

Mykola Zotko
  • Have you tried converting it as string and use string methods? – nikn8 Apr 08 '20 at 04:57
  • What is your desired output when array = [1, 0, 1, 0, 0] and pattern is [1, 0]? – Gilseung Ahn Apr 08 '20 at 05:28
  • @GilseungAhn I am trying to count how many times these patterns repeat in the array by processing the array two elements at a time. So in this case, I have four patterns, so I would have for each pattern a counter, like counter1, counter2, and so on. So every time the pattern appears, make the counter go up for one. – Latif Fetahaj Apr 08 '20 at 20:20

2 Answers

You can use a convolution for that, e.g. numpy.convolve:

import numpy as np

data = np.array([1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1])

# With plain 0/1 values, some patterns get identical scores because
# multiplying by 0 contributes nothing, e.g. [1, 0, 1] and [1, 1, 1].
# Replacing every 0 with -1 fixes this.
data[data == 0] = -1

kernel = np.array([0, 0, 0, 1, 1, 0, 1, 1])

# same fix for the kernel
kernel[kernel == 0] = -1

res = np.convolve(data, kernel, 'full')
print(res)
# >>> [-1  0 -1  2  1  2  5 -2 -2 -2 -2  0 -5 -2  5  0 -1  2  1]

res = np.convolve(data, kernel, 'same')
print(res)
# >>> [ 2  1  2  5 -2 -2 -2 -2  0 -5 -2  5]

res = np.convolve(data, kernel, 'valid')
print(res)
# >>> [-2 -2 -2 -2  0]

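The higher the result, the better the match. With the 0 to -1 replacement above, an exact match scores the length of the pattern (with plain 0/1 values it would be the number of ones in the pattern), and the index of the best match can be found using np.argmax(). Note that np.convolve reverses the kernel, so pass the pattern reversed (or use np.correlate) when the pattern is not symmetric.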

Look at the keyword mode (full, same, valid) and choose what is best for your case.

There is also scipy.signal.convolve, which might be faster if you are processing lots of data.
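As a minimal sketch of the scoring idea for a single pattern: the helper below, match_positions, is a hypothetical name and not part of NumPy. It assumes 0/1 input, maps 0 to -1, and uses np.correlate (which does not reverse the kernel) so that an exact match scores exactly the pattern length.

```python
import numpy as np

def match_positions(data, pattern):
    """Return the start indices where `pattern` occurs in `data` (both 0/1)."""
    # Map 0 -> -1 so that zeros contribute to the score as well
    d = 2 * np.asarray(data) - 1
    p = 2 * np.asarray(pattern) - 1
    # Unlike np.convolve, np.correlate slides the pattern without reversing it
    scores = np.correlate(d, p, mode="valid")
    # Only an exact match reaches the maximum possible score, len(pattern)
    return np.flatnonzero(scores == len(p))

data = [1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1]
hits = match_positions(data, [1, 0])
print(hits, len(hits))  # start indices and number of occurrences

# [1, 0, 1] is no longer confused with [1, 1, 1]
other = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1]
print(match_positions(other, [1, 0, 1]))
```

Comparing against len(pattern) returns every exact match, whereas np.argmax would only report the first best-scoring position.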

Joe
  • This solution has the same issue as @MykoloZotko had with his first. If you look for e.g. `[1, 0, 1]` you will have an equal "match" for `[1, 0, 1]` and `[1, 1, 1]`. – JohanL Apr 08 '20 at 07:45
  • I added a hacky fix for this to the answer. Replacing the zeros in the kernel and the data with -1 should fix it. – Joe Oct 26 '20 at 20:13

You can use this function to create a rolling window array:

import numpy as np

def rolling_window(a, window):
    # Build a (n - window + 1, window) view of `a` without copying data
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)


arr = np.array([1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1])
pattern = np.array([1, 0, 1])

arr = rolling_window(arr, pattern.shape[0])
print(arr)

Output:

[[1 1 1]
 [1 1 0]
 [1 0 0]
 [0 0 0]
 [0 0 0]
 [0 0 1]
 [0 1 1]
 [1 1 0]
 [1 0 1]
 [0 1 1]]

Then you can look for matches:

(arr == pattern).all(axis=1)
# [False False False False False False False False  True False]
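On NumPy 1.20 or newer, the same windowed comparison can be sketched with the built-in sliding_window_view, which avoids computing strides by hand. The variable names windows, matches and starts are only illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

arr = np.array([1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1])
pattern = np.array([1, 0, 1])

# Read-only view with one row per window position
windows = sliding_window_view(arr, pattern.shape[0])

# A window matches only if every element equals the pattern
matches = (windows == pattern).all(axis=1)

# Start indices of the matching windows
starts = np.flatnonzero(matches)
print(starts)
```

The view is read-only by default, which guards against accidentally writing through overlapping windows.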

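Alternatively, you can use the rolling method in pandas on the original 1-D data:

import pandas as pd
# arr was overwritten above, so start again from the original 1-D data
s = pd.Series([1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1])
(s.rolling(pattern.shape[0])
    .apply(lambda x: (x == pattern).all())
    .fillna(0).astype('bool'))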
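For the counting described in the comments on the question (one counter per two-element pattern), a possible sketch is to encode each pair as the number 2*a + b and count the codes with np.bincount. The function count_pairs and its step parameter are hypothetical; step=1 assumes overlapping pairs and step=2 assumes the array is consumed in disjoint pairs, since the question does not say which is intended.

```python
import numpy as np

def count_pairs(a, step=1):
    """Count [0,0], [0,1], [1,0], [1,1] in a 0/1 array.

    step=1 counts overlapping pairs, step=2 counts disjoint pairs.
    """
    a = np.asarray(a)
    first = a[:-1:step]   # first element of each pair
    second = a[1::step]   # second element of each pair
    # Encode each pair as a number in 0..3: [0,0]->0, [0,1]->1, [1,0]->2, [1,1]->3
    codes = 2 * first + second
    counts = np.bincount(codes, minlength=4)
    return {
        (0, 0): int(counts[0]),
        (0, 1): int(counts[1]),
        (1, 0): int(counts[2]),
        (1, 1): int(counts[3]),
    }

data = np.array([1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1])
overlapping = count_pairs(data, step=1)
disjoint = count_pairs(data, step=2)
print(overlapping)
print(disjoint)
```

This runs in a single vectorized pass, so it should scale to large arrays without a Python-level loop.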
Mykola Zotko
  • But here you would find a match in `[1,1,1]` as well. Thus, it is not trying to "match" zeros, but it just "ignores" them. – JohanL Apr 08 '20 at 06:33