
I'm trying to find a fast way to look up white space (of a certain size, let's say 10x10 pixels) in an RGB image loaded into a 3D numpy array, and replace it with red. The array is 6300 x 3600 x 3. I used the code below, but it takes about 3-3.5 minutes in total. Of course there are libraries for image manipulation, but I am just frustrated that numpy cannot be faster. I'm not a numpy expert, so I must be doing something wrong. All suggestions are welcome! :-)

import numpy as np
from PIL import Image

def find_white_spaces(w, h):
    x = 0
    img = <loaded from local location, shape 6300, 3600, 3 [RGB]>
    w_img = img.shape[1]    # Number of columns - horizontal pixels
    h_img = img.shape[0]    # Number of rows - vertical pixels
    white_space = np.full((h, w, 3), 255)
    for y in range(h_img-h):
        x = 0
        while x < (w_img-w):
            img_slice = img[y:y+h, x:x+w]
            if np.array_equal(img[y:y+h, x:x+w], white_space):
                img[y:y+h, x:x+w] = [255, 0, 0] # replace by red
                x += w
            else:
                x += 1
    show_img = Image.fromarray(img)
    display(show_img)
Daan
  • Notice that your algorithm does something weird for an 11 by 11 white square, leaving a white border – azelcer Jan 26 '22 at 00:41

2 Answers


The main issue is that Numpy functions are not optimized for working on small arrays. They introduce a pretty big overhead in such cases, mainly due to argument parsing, (quite inefficient) checks, internal function calls, etc. For example, np.array_equal on a 10x10 array takes about 5 µs on my machine, while the actual computation should take less than 50 ns (100 times faster). Generally, the solution is to vectorize your code, but it is not easy to do that efficiently here. You can solve this problem using Numba.
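For illustration, a minimal timing sketch of that per-call overhead (the figures are machine-dependent):

import timeit
import numpy as np

a = np.full((10, 10, 3), 255, dtype=np.uint8)
b = np.full((10, 10, 3), 255, dtype=np.uint8)

# Per-call time of np.array_equal on a tiny block; most of it is call
# overhead rather than the actual comparison of 300 bytes.
n = 100_000
t = timeit.timeit(lambda: np.array_equal(a, b), number=n)
print(f"{t / n * 1e6:.2f} us per call")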

Another important issue comes from the algorithm itself: it is not efficient. Indeed, when the analysed slice is not white, you increment x by 1, so the next iteration re-analyses most of the pixels that were just analysed. You can adjust the increment based on the content of the slice. For example, if the last column of the slice is not white, then you can skip the whole block!

Moreover, the white_space array is of type np.int_ by default, which is inefficient: it is 4 to 8 times bigger in memory than np.uint8 and causes unnecessary internal conversions. In fact, you do not even need this array.
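A small sketch of that dtype difference (the exact default integer size depends on the platform):

import numpy as np

default_ws = np.full((10, 10, 3), 255)                # dtype np.int_, 4 or 8 bytes per element
uint8_ws = np.full((10, 10, 3), 255, dtype=np.uint8)  # 1 byte per element, same as the image
print(default_ws.dtype, default_ws.nbytes)            # e.g. int64 2400
print(uint8_ws.dtype, uint8_ws.nbytes)                # uint8 300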

Here is untested example code using Numba that solves your problem much more efficiently:

import numpy as np
import numba as nb

@nb.njit('void(uint8[:,:,::1], int_, int_)')
def compute(img, h, w):
    w_img = img.shape[1]    # Number of columns - horizontal pixels
    h_img = img.shape[0]    # Number of rows - vertical pixels
    red = np.array([255, 0, 0], dtype=np.uint8)

    for y in range(h_img-h):
        x = 0
        while x < w_img-w:
            fail = False

            # Find the location of the last pixel on the right that is not white
            for y2 in range(y, y+h):
                # Start from the right so to be faster
                for x2 in range(x+w-1, x-1, -1):
                    r = img[y2, x2, 0]
                    g = img[y2, x2, 1]
                    b = img[y2, x2, 2]
                    if r != 255 or g != 255 or b != 255:
                        # Skip many pixels
                        x = x2+1
                        fail = True
                        break
                if fail:
                    break

            if not fail:
                # Replace by red
                img[y:y+h, x:x+w] = red
                x += w

def find_white_spaces(w, h):
    x = 0
    img = <loaded from local location, shape 6300, 3600, 3 [RGB]>
    compute(img, h, w)
    show_img = Image.fromarray(img)
    display(show_img)
Jérôme Richard

I agree with the points presented by @Jérôme Richard that the algorithm is inefficient and that np.uint8 is enough for the suggested algorithm. Nevertheless, I think the problem is not that the array you look for is small, but that you are looping in plain Python. The joy of numpy is how it takes care of looping through the arrays in an efficient way. You are using C-style looping, which is slow in Python. You perform many calls to a function and, yes, it performs checks every time. But the problem is not the size of the array; it is the algorithm and the fact that you loop in plain Python.

I ran some tests using a 900 by 600 image. Using np.uint8 results in around 10% less running time. A slightly improved algorithm first stores all the places where the image should be changed and then changes them in a second step:

def find_white_spaces_two_steps(img, w, h):
    w_img = img.shape[1]    # Number of columns - horizontal pixels
    h_img = img.shape[0]    # Number of rows - vertical pixels
    white_space = np.full((h, w, 3), 255, dtype=np.uint8)
    spots = np.zeros((h_img, w_img,), dtype=bool)
    for y in range(h_img-h):
        x = 0
        while x < (w_img-w):
            if np.array_equal(img[y:y+h, x:x+w], white_space):
                spots[y, x] = True
                x += w-1
            x += 1
    red = np.array([255, 0, 0], dtype=np.uint8)
    for y, x in zip(*np.nonzero(spots)):
        img[y: y+h, x: x+w] = red  # replace by red

This leads to a slightly better improvement of 20%, and it correctly fills larger areas: note that if the original algorithm finds an 11 by 11 white area, it will only fill the first 10 by 10 block, leaving a white line around it.

You can check the maximum speed of your algorithm using Numba: it will make things go much faster, even with boundary checking. You can decorate your function with @njit(boundscheck=False) and with @njit(boundscheck=True), and you will see that the speed in both cases is almost the same:

from numba import njit

@njit(boundscheck=True)
def find_white_spaces_check(img, w, h):
    x = 0
    w_img = img.shape[1]    # Number of columns - horizontal pixels
    h_img = img.shape[0]    # Number of rows - vertical pixels
    white_space = np.full((h, w, 3), 255, dtype=np.uint8)
    red = np.array([255, 0, 0], dtype=np.uint8)
    for y in range(h_img-h):
        x = 0
        while x < (w_img-w):
            if np.array_equal(img[y:y+h, x:x+w], white_space):
                img[y:y+h, x:x+w] = red # replace by red
                x += w
            else:
                x += 1

In my tests this leads to an 80% reduction of the running time.

You can try better searching algorithms, like the ones suggested in the answers to this question. See at least this and this one, but there are others that are interesting.

The best algorithm I could think of using only numpy tools involves masking the desired color with 0:

img_int = img - np.array([255, 255, 255], dtype=float) 
pixels = np.square(img_int).sum(axis=2)

Then I find the coordinates of the regions completely filled with 0. I used a rolling sum as implemented in this answer. Notice that a convolution should be faster for small regions, and that I use two consecutive calls instead of implementing a 2D cumsum:

spots = rolling_sum(pixels, w)
spots = (rolling_sum(spots.T, h)).T

and finally replacing the values in the image:

red = np.array([255, 0, 0], dtype=np.uint8)
for y, x in zip(*np.nonzero(spots == 0)):
    img[y: y+h, x: x+w] = red  # replace by red

I guess this last part is the one that should be improved, but I can't think of a better way now. Maybe it is possible to use stride_tricks, but I have never gone that way (yet).
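For what it is worth, here is a rough sketch of the stride_tricks idea, using numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20+). It finds the coordinates of the all-white windows rather than speeding up the replacement loop itself, and the helper name is just illustrative:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_white_window_coords(img, w, h):
    white = np.all(img == 255, axis=2)            # True where a pixel is pure white
    # A read-only view of every h x w window (no copy), shape (H-h+1, W-w+1, h, w);
    # the reduction below still has to touch every window.
    windows = sliding_window_view(white, (h, w))
    return np.nonzero(windows.all(axis=(2, 3)))   # (ys, xs) of all-white windows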

The full function is:

def rolling_sum(a, n):
    ret = np.cumsum(a, axis=1, dtype=float)
    ret[:, n:] = ret[:, n:] - ret[:, :-n]
    return ret[:, n - 1:]

def find_white_spaces_fast(img, w, h):
    # Select the pixels of the right color. Make a mask of 0
    img_int = img - np.array([255, 255, 255], dtype=float)
    pixels = np.square(img_int).sum(axis=2)  # filled with zeros
    # With a rolling sum, we select the coords where the x by h squares are 0
    spots = rolling_sum(pixels, w)
    spots = (rolling_sum(spots.T, h)).T
    red = np.array([255, 0, 0], dtype=np.uint8)
    for y, x in zip(*np.nonzero(spots == 0)):
        img[y: y+h, x: x+w] = red  # replace by red

With this algorithm the reduction in running time is 97.7%.
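As an aside, the convolution mentioned above could be sketched with SciPy (an assumption, since the answer only uses NumPy tools): a box sum over every h by w window with scipy.signal.convolve2d in 'valid' mode gives the same spots array as the two rolling sums.

import numpy as np
from scipy.signal import convolve2d  # assumption: SciPy is available

def find_white_spaces_conv(img, w, h):
    # 0 only where a pixel is pure white, positive otherwise
    pixels = np.square(img.astype(float) - 255.0).sum(axis=2)
    # Box sum over every h x w window; 'valid' keeps only windows fully inside the image
    spots = convolve2d(pixels, np.ones((h, w)), mode='valid')
    red = np.array([255, 0, 0], dtype=np.uint8)
    for y, x in zip(*np.nonzero(spots == 0)):
        img[y:y+h, x:x+w] = red  # replace by red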

azelcer