1

I have matrix 32x32 that is my sample and matrix of variable length that is input.

I had split sample to chunks of 4x4 numbers and need to search whole chunks in input matrix. Problem is that i don't want to compare every number in array with one position like numpy.isin does. What i need to do is mark all 4x4 occurrences in input matrix as True if all 4x4 values match with current chunk from sample. It doesn't matter how chunk is placed, only important thing is that every number in chunk will be equal in the input matrix area that we are comparing to.

Sample is 32x32 numbers every time ,but input matrix changes in size and can vary in size from 800x600 to 1920x1080

Example:

We have

Input Matrix:

1 1 2 3 7 9 4 2 9 7 9 7 
2 1 4 5 1 4 7 5 4 1 4 1 
0 1 0 0 3 0 1 1 0 3 0 3 
8 4 2 1 9 1 2 3 1 9 1 9 
4 2 4 2 7 9 7 9 7 9 4 2 
7 5 7 5 1 4 1 4 1 4 7 5
1 1 1 1 3 0 3 0 3 0 1 1
2 3 2 3 9 1 9 1 9 1 2 3

Sample Matrix Chunk 1:

1 1 2 3 
2 1 4 5 
0 1 0 0 
8 4 2 1

Sample Matrix Chunk 2:

2 3 7 9
4 5 1 4
0 0 3 0
2 1 9 1

Algorithm needs to mark all equal sub arrays to sample chunk in input matrix to True

Expected Result After comparing first chunk:

1 1 1 1 0 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0

Expected Result After comparing second chunk:

0 0 1 1 1 1 0 0 0 0 0 0
0 0 1 1 1 1 0 0 0 0 0 0
0 0 1 1 1 1 0 0 0 0 0 0
0 0 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0

Found chunks need to be merged into one result such as

Expected Result:

1 1 1 1 1 1 0 0 0 0 0 0
1 1 1 1 1 1 0 0 0 0 0 0
1 1 1 1 1 1 0 0 0 0 0 0
1 1 1 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0

I would like to solve this in most effective way and with using as little for loops as possible.

  • 1
    This question is not well defined. What is the exact mathematical definition of index (row, col) in the result matrix? How should the "sample" be placed on the "input matrix"? are there strides? are there limitations on size? Please make a rigorous definition and add more use cases, preferably on inputs as small and simple ass possible. – Gulzar Apr 25 '21 at 12:25
  • Does this answer your question? [Python/NumPy first occurrence of subarray](https://stackoverflow.com/questions/7100242/python-numpy-first-occurrence-of-subarray) – mcsoini Apr 25 '21 at 12:28

1 Answers1

1

I hope that I understand your task right: "To found mask of any match of chunks in input data".

import numpy as np

qd = lambda xd, rem: (xd - rem)
xvds = lambda qxd, sd, nfull: [qxd]*nfull + [qxd - sd]*(sd - nfull)
xvdr = lambda xvd0s, xvd1s: [((i, i + s0), (j, j + s1)) 
    for i, s0 in enumerate(xvd0s)
    for j, s1 in enumerate(xvd1s)]

view = lambda x, d0r, d1r, d0s, d1s, sd0, sd1: np.swapaxes(
    x[d0r[0]:d0r[1], d1r[0]:d1r[1]].reshape(
        1, d0s//sd0, sd0, d1s//sd1, sd1),
    axis1=-3, axis2=-2)

def match_chunks(
        x: np.ndarray,       # input data
        chunks: np.ndarray, # sequence of chunks
        ) -> np.ndarray:     # return mask of x
    xd0, xd1 = x.shape
    n_chunks, sd0, sd1 = chunks.shape
    #
    rem0, rem1 = xd0 % sd0, xd1 % sd1
    qxd0, qxd1 = qd(xd0, rem0), qd(xd1, rem1)
    xvd0s = xvds(qxd0, sd0, rem0 + 1)
    xvd1s = xvds(qxd1, sd1, rem1 + 1)
    # "sizes" and "ranges" of x_views:
    xvd01s = [(a, b) for a in xvd0s for b in xvd1s]
    xvd01r = xvdr(xvd0s, xvd1s)
    #
    chunks = chunks[:, None, None, ...]
    mask = np.zeros_like(x, dtype=np.bool)
    for (d0r, d1r), (d0s, d1s) in zip(xvd01r, xvd01s):
        xview = view(x, d0r, d1r, d0s, d1s, sd0, sd1)
        mview = view(mask, d0r, d1r, d0s, d1s, sd0, sd1)
        matched = np.all((xview == chunks), (-2, -1), keepdims=True)
        mview[...] |= np.any(matched, 0, keepdims=True)
    return mask.astype(x.dtype)

def test():
    x = np.random.randint(0, 10, size=(7, 13))
    print(x)
    prng = [(slice(1, 1+3), slice(1, 1+4)),
            (slice(3, 3+3), slice(4, 4+4)),
            (slice(2, 2+3), slice(9, 9+4))]

    chunks = np.array([x[pr] for pr in prng])

    mtgt = np.zeros_like(x)
    for pr in prng:
        mtgt[pr] = 1

    m = match_chunks(x, chunks)

    mdf = m - mtgt

    with open("test_match_output.txt", 'w') as fl:
        print(*[np.array_str(a, max_line_width=200) for a in [x, chunks, mtgt, m, mdf]], sep="\n\n", file=fl)

if __name__ == '__main__':
    test()

Output:

# x
[[5 9 0 3 6 5 6 4 2 8 6 3 2]
 [7 3 1 3 1 6 5 2 8 5 9 3 7]
 [2 5 5 0 2 8 8 0 0 7 8 8 6]
 [0 9 1 5 6 0 8 0 9 1 8 8 5]
 [5 7 4 8 4 0 9 8 3 9 4 7 0]
 [4 1 4 1 4 6 9 9 7 3 9 6 5]
 [5 6 3 4 2 5 0 2 8 1 3 6 6]]

# chunks
[[[3 1 3 1]
  [5 5 0 2]
  [9 1 5 6]]

 [[6 0 8 0]
  [4 0 9 8]
  [4 6 9 9]]

 [[7 8 8 6]
  [1 8 8 5]
  [9 4 7 0]]]

# mask_target
[[0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 1 1 1 0 0 0 0 0 0 0 0]
 [0 1 1 1 1 0 0 0 0 1 1 1 1]
 [0 1 1 1 1 1 1 1 0 1 1 1 1]
 [0 0 0 0 1 1 1 1 0 1 1 1 1]
 [0 0 0 0 1 1 1 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0]]

# mask_computed
[[0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 1 1 1 0 0 0 0 0 0 0 0]
 [0 1 1 1 1 0 0 0 0 1 1 1 1]
 [0 1 1 1 1 1 1 1 0 1 1 1 1]
 [0 0 0 0 1 1 1 1 0 1 1 1 1]
 [0 0 0 0 1 1 1 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0]]


# mask_target - mask_computed

[[0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0]]
Dharman
  • 30,962
  • 25
  • 85
  • 135