
I have two binary images of the same size; both contain black blobs.

Essentially, I need to find the positions of all black pixels in each image that are within a specific distance of any black pixel in the other image. The most obvious approach is slow. Ideally there is an existing numpy function for this.

The result should be two lists or arrays, one for each image, containing the positions of black pixels that are in proximity of black pixels from the other image.
So when I pick one of those black pixels and crop the image with the defined distance around that pixel, it is guaranteed that the same crop, applied to the other image, also contains black pixels.

Example:

import numpy as np

distance = 2

img1 = np.array([[255, 255, 255, 255, 255, 255, 255, 255],
                 [255, 0, 0, 0, 255, 255, 255, 255],
                 [255, 0, 0, 0, 255, 255, 255, 255],
                 [255, 0, 0, 0, 255, 255, 255, 255],
                 [255, 255, 255, 255, 255, 255, 255, 255],
                 [255, 255, 255, 255, 255, 255, 255, 255],
                 [255, 255, 255, 255, 255, 255, 255, 255],
                 [255, 255, 255, 255, 255, 255, 255, 255]]).astype('uint8')

img2 = np.array([[255, 255, 255, 255, 255, 255, 255, 255],
                 [255, 255, 255, 255, 255, 255, 255, 255],
                 [255, 255, 255, 255, 255, 255, 255, 255],
                 [255, 255, 255, 255, 255, 255, 255, 255],
                 [255, 255, 255, 255, 0, 0, 0, 255],
                 [255, 255, 255, 255, 0, 0, 0, 255],
                 [255, 255, 255, 255, 0, 0, 0, 255],
                 [255, 255, 255, 255, 255, 255, 255, 255]]).astype('uint8')

img1_row, img1_col = np.where(img1 == 0)
img2_row, img2_col = np.where(img2 == 0)

img1_positions = np.column_stack((img1_row, img1_col))
img2_positions = np.column_stack((img2_row, img2_col))

img1_pos_list = []
img2_pos_list = []

# Brute force: compare every black pixel of img1 with every black pixel of img2.
for img1_pos in img1_positions:
    for img2_pos in img2_positions:
        if abs(img1_pos[0] - img2_pos[0]) <= distance:
            if abs(img1_pos[1] - img2_pos[1]) <= distance:
                if not any((img1_pos == x).all() for x in img1_pos_list):
                    img1_pos_list.append(img1_pos)
                if not any((img2_pos == x).all() for x in img2_pos_list):
                    img2_pos_list.append(img2_pos)
print(img1_pos_list)
# [[2, 2],[2, 3],[3, 2],[3, 3]]
print(img2_pos_list)
# [[4, 4],[4, 5],[5, 4],[5, 5]]

So img1_pos_list contains the indices of the 4 black pixels in the bottom-right corner of the blob in img1, and img2_pos_list contains the indices of the black pixels in the top-left corner of the blob in img2. These are the only black pixels in each image that are within distance=2 of any black pixel in the other image.
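
To make the crop guarantee above concrete, here is a small sanity check (a sketch only; it reuses distance, img1, img2 and the two result lists from the example above, and the helper name crop_contains_black is just for illustration):

def crop_contains_black(img, row, col, d):
    # Crop a (2*d+1) x (2*d+1) window around (row, col), clipped at the image border.
    crop = img[max(0, row - d):row + d + 1, max(0, col - d):col + d + 1]
    return (crop == 0).any()

# Every returned position in one image must see at least one black pixel
# of the other image inside its crop.
assert all(crop_contains_black(img2, r, c, distance) for r, c in img1_pos_list)
assert all(crop_contains_black(img1, r, c, distance) for r, c in img2_pos_list)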

Applying this to a dataset of 50,000+ images with sizes of 100x100 and larger is too slow. Is there a faster way?

  • You should provide a fully reproducible example of input/expected output. – mozway Aug 29 '22 at 09:29
  • this is a great question. Your technique is to send a scanning square across the pixels... The major gain would be in the `for loop`, so the first obvious improvement might be a `list comprehension` (possibly faster in some cases). Or possibly a matrix operation. – D.L Aug 29 '22 at 10:07
  • EDIT: fully reproducible example provided. My knowledge of matrix operations is not advanced enough for this kind of problem, but I think that would be the starting point for a faster solution too. – TheTelefone Aug 29 '22 at 10:24

1 Answer


SciPy provides useful functions that make the solution easy.

  1. You can use one of the answers to this question to construct a proximity_mask. I have chosen scipy.ndimage.binary_dilation, but you could also use scipy.signal.convolve2d, as described there (a rough convolve2d variant is sketched after the result below).

  2. Use this proximity_mask to invalidate all black pixels of the first image that lie outside of the mask.

  3. And finally, get the indices of the remaining elements (that have NOT been invalidated) using np.where:

import numpy as np
from scipy import ndimage

def are_in_proximity(img1, img2, n):
    """
    Return the index positions of the black pixels of `img1` that are
    within `n` pixels (Chebyshev distance) of a black pixel of `img2`.
    """
    blacks_1 = img1 == 0
    blacks_2 = img2 == 0

    # Dilating the black pixels of img2 with a (2n+1) x (2n+1) square marks
    # every position whose row and column offsets to a black pixel are both <= n.
    expansion_kernel = np.ones((n*2 + 1, n*2 + 1))
    proximity_mask_2 = ndimage.binary_dilation(blacks_2, structure=expansion_kernel)

    # Keep only the black pixels of img1 that fall inside that mask.
    blacks_1[~proximity_mask_2] = False

    return np.where(blacks_1)

inds_1 = np.column_stack(are_in_proximity(img1, img2, distance))

inds_2 = np.column_stack(are_in_proximity(img2, img1, distance))

Result:

print(inds_1)

[[2 2]
 [2 3]
 [3 2]
 [3 3]]

print(inds_2)

[[4 4]
 [4 5]
 [5 4]
 [5 5]]
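
For completeness, the scipy.signal.convolve2d variant mentioned in step 1 would look roughly like this (a sketch only, not benchmarked; the name are_in_proximity_conv is just for illustration):

from scipy import signal

def are_in_proximity_conv(img1, img2, n):
    """
    Same result as `are_in_proximity`, but the proximity mask is built by
    convolving the black-pixel mask of `img2` with a square kernel.
    """
    blacks_1 = img1 == 0
    blacks_2 = img2 == 0

    kernel = np.ones((n*2 + 1, n*2 + 1))
    # Any position whose (2n+1) x (2n+1) neighbourhood contains a black pixel
    # of img2 gets a positive convolution response.
    proximity_mask_2 = signal.convolve2d(blacks_2.astype(int), kernel, mode='same') > 0

    return np.where(blacks_1 & proximity_mask_2)

inds_1_alt = np.column_stack(are_in_proximity_conv(img1, img2, distance))
inds_2_alt = np.column_stack(are_in_proximity_conv(img2, img1, distance))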

– Vladimir Fokow