Count and find pattern in 2D array in python

Question

I have this data below:

data = np.array([[1, 0,-1, 0, 0, 1, 0,-1, 0, 0, 1],
                 [1, 1, 0, 0,-1, 0, 1, 0, 0,-1, 0],
                 [1, 0, 0, 1, 0, 0,-1, 0, 1, 0, 0],
                 [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]])

I want to count the zeros in each row, where each count is the length of a run of consecutive zeros. The result I was hoping for is a new array like this:

[[1 2 1 2]
 [2 1 2 1]
 [2 2 1 2]
 [2 5 2]]
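A minimal NumPy sketch of that counting step follows. It assumes each row is handled independently. The rows produce different numbers of runs, so the result is a list of lists rather than a 2-D array. `zero_runs` is a hypothetical helper name. It also returns the start column of each run, which is what reporting coordinates later would need.

```python
import numpy as np

data = np.array([[1, 0,-1, 0, 0, 1, 0,-1, 0, 0, 1],
                 [1, 1, 0, 0,-1, 0, 1, 0, 0,-1, 0],
                 [1, 0, 0, 1, 0, 0,-1, 0, 1, 0, 0],
                 [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]])

def zero_runs(row):
    """Return (starts, lengths) of each run of consecutive zeros in a 1-D row."""
    # Pad with False on both ends so every run has a rising and a falling edge.
    is_zero = np.concatenate(([False], row == 0, [False]))
    # +1 where a run starts, -1 one position past where it ends.
    edges = np.diff(is_zero.astype(np.int8))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return starts, ends - starts

lengths = [zero_runs(row)[1].tolist() for row in data]
print(lengths)   # [[1, 2, 1, 2], [2, 1, 2, 1], [2, 2, 1, 2], [2, 5, 2]]
```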

I also want to search every row for the 2 1 2 pattern as a ratio, with some tolerance (if the numbers deviate slightly), and save the coordinates of the 1 in the pattern.

So I would match 2 1 2, or 4 2 4, or 6 3 6, or 9 5 10 (within tolerance), etc.

Expected result:

[[0,6],[1,5],[2,7]]

These are the [row, column] positions in `data` of the 1 in every 2 1 2 pattern.
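Here is a sketch of a tolerant ratio search over the zero-run lengths that returns `[row, col]` pairs. The tolerance rule is an assumption, since the question does not pin it down. Each run length is divided by the middle run's length and must lie within `tol` of the target ratio `(2, 1, 2)`. A second assumption covers a middle run longer than one element (e.g. 4 2 4): the reported column is the centre element of that run. `zero_runs`, `find_ratio` and `tol` are hypothetical names.

```python
import numpy as np

def zero_runs(row):
    """(starts, lengths) of each run of consecutive zeros in a 1-D row."""
    is_zero = np.concatenate(([False], row == 0, [False]))
    edges = np.diff(is_zero.astype(np.int8))   # +1 at run start, -1 one past its end
    starts = np.flatnonzero(edges == 1)
    return starts, np.flatnonzero(edges == -1) - starts

def find_ratio(data, ratio=(2, 1, 2), tol=0.25):
    """[row, col] of the middle run of every three consecutive zero runs whose
    lengths match `ratio` within the (assumed) relative tolerance `tol`."""
    target = np.asarray(ratio, dtype=float) / ratio[1]
    hits = []
    for r, row in enumerate(data):
        starts, lengths = zero_runs(row)
        for k in range(len(lengths) - 2):
            scaled = lengths[k:k + 3] / lengths[k + 1]   # normalise by the middle run
            if np.all(np.abs(scaled - target) <= tol):
                # assumption: report the centre element of the middle run
                hits.append([r, int(starts[k + 1] + (lengths[k + 1] - 1) // 2)])
    return hits

data = np.array([[1, 0,-1, 0, 0, 1, 0,-1, 0, 0, 1],
                 [1, 1, 0, 0,-1, 0, 1, 0, 0,-1, 0],
                 [1, 0, 0, 1, 0, 0,-1, 0, 1, 0, 0],
                 [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]])
print(find_ratio(data))   # [[0, 6], [1, 5], [2, 7]]

# 9 5 10 is accepted: 9/5 = 1.8 and 10/5 = 2.0 are both within 0.25 of 2.
row = np.array([1] + [0] * 9 + [1] + [0] * 5 + [1] + [0] * 10 + [1])
print(find_ratio(row[np.newaxis, :]))   # [[0, 13]]
```

This only looks at the zero runs, as the question's counts do. The non-zero values between runs are ignored.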

I've tried with this code below:

np.unique(data, return_counts=True, axis=1)

I fiddled with that, but the result was not what I expected. This is for image processing, and the real data is huge.
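This small illustration shows what `np.unique` actually counts, which suggests why it does not give run lengths. It totals each distinct value, or with `axis=1` each distinct column, without regard to whether the occurrences are adjacent.

```python
import numpy as np

data = np.array([[1, 0,-1, 0, 0, 1, 0,-1, 0, 0, 1],
                 [1, 1, 0, 0,-1, 0, 1, 0, 0,-1, 0],
                 [1, 0, 0, 1, 0, 0,-1, 0, 1, 0, 0],
                 [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]])

# Counts every 0 in the first row, adjacent or not.
values, counts = np.unique(data[0], return_counts=True)
print(values.tolist(), counts.tolist())   # [-1, 0, 1] [2, 6, 3]

# With axis=1 whole columns are compared, so the counts are per distinct column.
cols, col_counts = np.unique(data, return_counts=True, axis=1)
print(cols.shape, col_counts.sum())       # (4, 9) 11
```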

If there is more than one `2 1 2` pattern in a row, how do you want those indices *captured*? If there are overlapping `2 1 2` patterns in a row (e.g. `2 1 2 1 2 1 2 1 2`), how do you want those indices *captured*? — wwii, Apr 09 '20 at 18:07
Your example data has eleven columns - does the *real* data have only eleven columns? — wwii, Apr 09 '20 at 22:47

wwii · Accepted Answer · 2020-04-09T19:04:17.623

import numpy as np

data = np.array([[1, 0,-1, 0, 0, 1, 0,-1, 0, 0, 1],
                 [1, 1, 0, 0,-1, 0, 1, 0, 0,-1, 0],
                 [1, 0, 0, 1, 0, 0,-1, 0, 1, 0, 0],
                 [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]])
a = data

Counting consecutive zeros in each row:
NumPy and Python loop(s).
Iterate over the rows; find the indices of the zeros; split those indices wherever consecutive indices differ by more than one; the length of each piece is the length of one run of zeros.

for row in a:
    zeros = np.where(row==0)[0]
    neighbors = (np.argwhere(np.diff(zeros)>1)+1).ravel()
    w = np.split(zeros,neighbors)
    counts = [thing.shape[0] for thing in w]
    print(counts)
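To make the loop concrete, here are the intermediate values for the first row. The calls are the same as in the loop body, with prints added. The first element of each chunk in `w` is the start column of that run, which is what mapping a `2 1 2` match back to a column of `data` would need.

```python
import numpy as np

row = np.array([1, 0, -1, 0, 0, 1, 0, -1, 0, 0, 1])   # first row of data

zeros = np.where(row == 0)[0]               # column indices of the zeros
print(zeros)                                # [1 3 4 6 8 9]

# A gap larger than 1 between consecutive zero indices starts a new run;
# +1 turns the position in the diff into a split position in `zeros`.
neighbors = (np.argwhere(np.diff(zeros) > 1) + 1).ravel()
print(neighbors)                            # [1 3 4]

w = np.split(zeros, neighbors)              # one array of column indices per run
print([chunk.tolist() for chunk in w])      # [[1], [3, 4], [6], [8, 9]]
print([chunk.shape[0] for chunk in w])      # [1, 2, 1, 2]
```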

Pattern indices:
This uses broadcasting: it operates on all rows at once while iterating over the column offsets.

# pattern to search for:
# notzero,zero,zero,notzero,zero,notzero,zero,zero,notzero
pattern = np.array([False,True,True,False,True,False,True,True,False])    

# find zeros in data and pad
padded = np.pad(a==0,1)
dif = padded.shape[1] - pattern.shape[0]
for i in range(dif+1):
    stop = i+pattern.shape[0]
    test = padded[:,i:stop]
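    equal = test == pattern          # broadcast the pattern against every row
    equal = np.all(equal, 1)         # rows whose whole window matches
    # report every matching row at this offset, not only the first one
    for row in np.flatnonzero(equal):
        # undo the padding: -1 for the row; the middle zero is 4 - 1 = 3 columns past i
        print(f'[{row-1},{i+3}]')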

This should find multiple (separated and overlapping) patterns in a row; it seems to work with:

data = np.array([[1, 0,-1, 0, 0, 1, 0,-1, 0, 0, 1, 0,-1, 0, 0, 1,-1, 0, 0, 1, 0,-1, 0, 0],
                 [1, 1, 0, 0,-1, 0, 1, 0, 0,-1, 0, 1, 0, 0,-1, 0, 0, 0,-1, 0, 1, 0, 0,-1],
                 [1, 0, 0, 1, 0, 0,-1, 0, 1, 0, 0,-1, 0, 1, 0, 0, 0, 1, 0, 0,-1, 0, 1, 0],
                 [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0]])
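The question mentions very large data, so one possible further step is to drop the loop over column offsets with `numpy.lib.stride_tricks.sliding_window_view`, which requires NumPy 1.20 or newer. This is a sketch that has not been measured on real image sizes. The windows are a view, but the `==` comparison still materialises a boolean array roughly `len(pattern)` times the size of `data`, so memory rather than speed may become the limit. Like the loop, it matches the exact pattern only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view   # NumPy >= 1.20

data = np.array([[1, 0,-1, 0, 0, 1, 0,-1, 0, 0, 1],
                 [1, 1, 0, 0,-1, 0, 1, 0, 0,-1, 0],
                 [1, 0, 0, 1, 0, 0,-1, 0, 1, 0, 0],
                 [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]])
# notzero,zero,zero,notzero,zero,notzero,zero,zero,notzero
pattern = np.array([False, True, True, False, True, False, True, True, False])

padded = np.pad(data == 0, 1)                             # False border on all sides
# windows[r, i] is padded[r, i:i + len(pattern)]
windows = sliding_window_view(padded, pattern.shape[0], axis=1)
match = np.all(windows == pattern, axis=2)                # (rows, offsets) bool
rows, starts = np.nonzero(match)
# Undo the padding: -1 row; the middle zero sits 3 data columns past the offset.
coords = np.column_stack((rows - 1, starts + 3))
print(coords.tolist())   # [[0, 6], [1, 5], [2, 7]]
```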

Thanks, but I think it does not work well for a huge amount of data. And silly me, I forgot about the ratio: I want to find the `212` pattern as a ratio (with some tolerance), not as an exact pattern. Thanks anyway. I just found another way to achieve the real purpose of this — ircham, Apr 09 '20 at 20:30
@ircham - please post your solution as an answer. [Can I answer my own question?](https://stackoverflow.com/help/self-answer) — wwii, Apr 09 '20 at 22:20
I don't think it's a solution to this basic question. The real purpose is to detect QR code position detection patterns in images, and I posted this question because I started from the basics. Then I found another way that doesn't process the array data from an image — ircham, Apr 10 '20 at 01:51

FBruzzesi · Answer 2 · 2020-04-09T15:24:26.083

Adapting @jezrael's answer on cumsum with reset, and assuming you can add the pandas dependency:

import pandas as pd
import numpy as np

data = np.array([[1, 0,-1, 0, 0, 1, 0,-1, 0, 0, 1],
                 [1, 1, 0, 0,-1, 0, 1, 0, 0,-1, 0],
                 [1, 0, 0, 1, 0, 0,-1, 0, 1, 0, 0],
                 [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]])

df = pd.DataFrame(data.T, columns=range(data.shape[0]))
a = (df == 0)
df = a.cumsum()-a.cumsum().where(~a).ffill().fillna(0).astype(int)

# Add a final row of zeros so that a run ending in the last column is still followed by a 0
df.loc[len(df)] = 0

# Define custom function to apply column-wise
def find_pattern(col):
    c = col.to_numpy()
    ids = np.argwhere(c==0) - 1 
    ids = ids[ids>=0]
    return [x for x in c[ids] if x!=0]

r = df.apply(lambda col: find_pattern(col), axis=0)

r
0    [1, 2, 1, 2]
1    [2, 1, 2, 1]
2    [2, 2, 1, 2]
3       [2, 5, 2]
dtype: object

The result r is a pandas Series indexed by the row number of `data`, with the expected run lengths as values.
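The line `df = a.cumsum()-a.cumsum().where(~a).ffill().fillna(0).astype(int)` is the cumsum-with-reset trick. Here it is applied to the first row alone, as a Series, so each step is visible. Calling `astype(int)` before `cumsum` is a small deviation from the answer, added only to make the running total explicitly integer.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 0, -1, 0, 0, 1, 0, -1, 0, 0, 1])   # first row of data
a = s == 0
running = a.astype(int).cumsum()      # zeros seen so far: 0 1 1 2 3 3 4 4 5 6 6
# Freeze the total at every non-zero position, carry it forward and subtract it,
# so the count restarts after each non-zero value.
offset = running.where(~a).ffill().fillna(0).astype(int)
c = (running - offset).to_numpy()
print(c.tolist())   # [0, 1, 0, 1, 2, 0, 1, 0, 1, 2, 0]

# The value just before each 0 is the length of the run that just ended;
# find_pattern reads these positions (hence the appended row of zeros).
before_zero = c[np.flatnonzero(c[1:] == 0)]
print([int(x) for x in before_zero if x != 0])   # [1, 2, 1, 2]
```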

Finally, to find the [2,1,2] pattern, you can again use pandas functionality:

r = pd.DataFrame(r, columns=['zeros'])
r['string_col'] = r['zeros'].apply(lambda row: ''.join([str(x) for x in row]))

pattern_as_string = '212'
r['pattern_index'] = r['string_col'].str.find(pattern_as_string)

         zeros  string_col  pattern_index
0  [1, 2, 1, 2]       1212              1
1  [2, 1, 2, 1]       2121              0
2  [2, 2, 1, 2]       2212              1
3     [2, 5, 2]        252             -1

pattern_index is the position in string_col at which the pattern starts, or -1 if it is not found.
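Joining the counts into a string works for this example. Once a count has two digits, though, the digits merge: `[21, 2]` becomes `"212"` and would be reported as a match. A sketch that searches the list of counts directly avoids this. `find_subsequence` is a hypothetical helper, not a pandas function.

```python
def find_subsequence(counts, pattern):
    """Index of the first occurrence of `pattern` in `counts`, or -1 if absent."""
    n = len(pattern)
    for i in range(len(counts) - n + 1):
        if list(counts[i:i + n]) == list(pattern):
            return i
    return -1

print(find_subsequence([1, 2, 1, 2], [2, 1, 2]))        # 1
# String matching misfires once a count has two digits:
print(''.join(str(x) for x in [21, 2]).find('212'))     # 0
print(find_subsequence([21, 2], [2, 1, 2]))             # -1
```

With the DataFrame above, it could fill the same column via `r['zeros'].apply(lambda z: find_subsequence(z, [2, 1, 2]))`. The result is still an index into the list of counts, not a column of `data`.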

It works for counting the `0`s, but I'm still confused about how to look for `2 1 2` patterns and their positions in the first array. And I just found out about pandas — ircham, Apr 09 '20 at 15:16
@ircham edited my answer to add the pattern search. Hope it helps! — FBruzzesi, Apr 09 '20 at 15:25
But what I need is the pattern index in the first array (`data`), not in the new array @FBruzzesi — ircham, Apr 09 '20 at 15:36
Then I am not sure what your expected output is since there are other values in between. Please provide what you expect as result. — FBruzzesi, Apr 09 '20 at 15:45
