Faster way to iterate through pixels using NumPy with conditions?

Question

def colorize(im, h, s, l_adjust):
    result = Image.new('RGBA', im.size)
    pixin = np.copy(im)
    pixout = np.array(result)

    # ---- loop start ----

    for y in range(pixout.shape[1]):
        for x in range(pixout.shape[0]):
            lum = currentRGB(pixin[x, y][0], pixin[x, y][1], pixin[x, y][2])
            r, g, b = colorsys.hls_to_rgb(h, lum, s)
            r, g, b = int(r * 255.99), int(g * 255.99), int(b * 255.99)
            pixout[x, y] = (r, g, b, 255)

    # ---- loop end ----

    return pixout  # the filled array; `result` is still the blank image

I'm trying to compute a per-pixel HSL value for each frame of an input video, but it takes about 1.5 s per frame, and I want to bring that down to 0.3 s or less. Is there a faster way to do this without the two nested loops? I'm looking for something like a LUT (lookup table), vectorization, or some NumPy shortcut that avoids them. Thanks.

OR

Part 2

If I inline the custom currentRGB() function into the for loops, it looks like this:

def colorize(im, h, s, l_adjust):
    result = Image.new('RGBA', im.size)
    pixin = np.copy(im)
    pixout = np.array(result)
    for y in range(pixout.shape[1]):
        for x in range(pixout.shape[0]):
            currentR, currentG, currentB = pixin[x, y][0]/255 , pixin[x, y][1]/255, pixin[x, y][2]/255
            #luminance
            lum = (currentR * 0.2126) + (currentG * 0.7152) + (currentB * 0.0722)
            if l_adjust > 0:
                lum = lum * (1 - l_adjust)
                lum = lum + (1.0 - (1.0 - l_adjust))
            else:
                lum = lum * (l_adjust + 1)
            l = lum
            r, g, b = colorsys.hls_to_rgb(h, l, s)
            r, g, b = int(r * 255.99), int(g * 255.99), int(b * 255.99)
            pixout[x, y] = (r, g, b, 255)
    return pixout

There are many libraries that implement this conversion. OpenCV, DIPlib, Scikit-Image, … Don’t reinvent the wheel, especially in Python where it is so easy to write really slow wheels. :) — Cris Luengo, Jul 14 '21 at 13:12
@CrisLuengo Thanks for your reply, but since I have a custom function named currentRGB() inside the loop, I can't apply a conversion like cv2.cvtColor(im, cv2.COLOR_RGB2HLS). I would appreciate any further suggestions. — MSI, Jul 14 '21 at 13:24
I don’t know what that function does, nor the other functions you call in this loop. So there is no way for me to help in speeding up this loop. See [mre]. — Cris Luengo, Jul 14 '21 at 13:28
This is one of the biggest issues: if the function comes from an external module, you cannot vectorize it or JIT-compile it, and it will be slow because function calls are expensive in CPython, especially when executed in two nested loops. I think you need to find another function to do that, write it yourself, or find a way to pass NumPy arrays to this function. — Jérôme Richard, Jul 14 '21 at 13:28
I have added part 2 to my question, where I inlined the whole process inside the loop. Any ideas or suggestions now? It takes about 1.5 s, which is too much for processing my video frame by frame. — MSI, Jul 14 '21 at 13:39
All of that is easy to vectorize, but `colorsys.hls_to_rgb` is not, it accepts only scalars I think? If it doesn’t work with matrices, you will need to either write your own, or use one of the libraries I indicated above for this step. — Cris Luengo, Jul 14 '21 at 14:20
When I say “easy to vectorize”, I mean just replace the operations on individual pixels with identical operations on the full image. Just remove the loops and the indexing, convert the image to a NumPy array, and you’re left with nearly identical-looking code that is 100x as fast. — Cris Luengo, Jul 14 '21 at 14:23
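
For illustration, here is a minimal sketch of what the vectorization described in the comments above might look like. It is not code from the thread: the names `colorize_vectorized` and `_v` are hypothetical, and the sketch assumes the input is a uint8 array with at least three channels, that `h` and `s` are scalars in [0, 1], and that `l_adjust` is in [-1, 1]. Because `h` and `s` are the same for every pixel, the scalar-only `colorsys.hls_to_rgb` can be replaced by the same formula applied to a whole luminance array, with the hue branch chosen once per channel instead of once per pixel.

```python
import numpy as np


def _v(m1, m2, hue):
    # Same piecewise formula as colorsys uses internally. `hue` is a scalar,
    # so a plain if/else selects one array expression per channel.
    hue = hue % 1.0
    if hue < 1 / 6:
        return m1 + (m2 - m1) * hue * 6.0
    if hue < 0.5:
        return m2
    if hue < 2 / 3:
        return m1 + (m2 - m1) * (2 / 3 - hue) * 6.0
    return m1


def colorize_vectorized(pixin, h, s, l_adjust):
    """Whole-image version of the per-pixel loop (hypothetical helper).

    pixin: uint8 array shaped (rows, cols, 3 or 4). Returns uint8 RGBA.
    """
    rgb = pixin[..., :3] / 255  # float64 values in [0, 1]
    # Same luminance weights as the loop version
    lum = rgb[..., 0] * 0.2126 + rgb[..., 1] * 0.7152 + rgb[..., 2] * 0.0722
    if l_adjust > 0:
        lum = lum * (1 - l_adjust)
        lum = lum + (1.0 - (1.0 - l_adjust))
    else:
        lum = lum * (l_adjust + 1)

    out = np.empty(pixin.shape[:2] + (4,), dtype=np.uint8)
    out[..., 3] = 255  # opaque alpha

    if s == 0:
        channels = (lum, lum, lum)  # achromatic
    else:
        m2 = np.where(lum <= 0.5, lum * (1.0 + s), lum + s - lum * s)
        m1 = 2.0 * lum - m2
        channels = tuple(_v(m1, m2, hue) for hue in (h + 1 / 3, h, h - 1 / 3))

    for i, c in enumerate(channels):
        # Truncate like int(x * 255.99); the clip only guards the cast
        out[..., i] = np.clip(c * 255.99, 0, 255).astype(np.uint8)
    return out
```

With a PIL image `im`, a call might look like `colorize_vectorized(np.asarray(im), h, s, l_adjust)`. Results may differ from the loop version by one 8-bit level at rounding boundaries, since the floating-point operations are not ordered identically.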

score 1 · Accepted Answer · answered Jul 14 '21 at 16:13

You can use Numba to drastically speed the computation up. Here is the implementation:

import numba as nb
import numpy as np
from PIL import Image

@nb.njit('float32(float32,float32,float32)')
def hue_to_rgb(p, q, t):
    if t < 0: t += 1
    if t > 1: t -= 1
    if t < 1./6: return p + (q - p) * 6 * t
    if t < 1./2: return q
    if t < 2./3: return p + (q - p) * (2./3 - t) * 6
    return p

@nb.njit('UniTuple(uint8,3)(float32,float32,float32)')
def hls_to_rgb(h, l, s):
    if s == 0:
        # achromatic
        r = g = b = l
    else:
        q = l * (1 + s) if l < 0.5 else l + s - l * s
        p = 2 * l - q
        r = hue_to_rgb(p, q, h + 1./3)
        g = hue_to_rgb(p, q, h)
        b = hue_to_rgb(p, q, h - 1./3)

    return (int(r * 255.99), int(g * 255.99), int(b * 255.99))

@nb.njit('void(uint8[:,:,::1],uint8[:,:,::1],float32,float32,float32)', parallel=True)
def colorize_numba(pixin, pixout, h, s, l_adjust):
    for x in nb.prange(pixout.shape[0]):
        for y in range(pixout.shape[1]):
            currentR, currentG, currentB = pixin[x, y, 0]/255 , pixin[x, y, 1]/255, pixin[x, y, 2]/255
            #luminance
            lum = (currentR * 0.2126) + (currentG * 0.7152) + (currentB * 0.0722)
            if l_adjust > 0:
                lum = lum * (1 - l_adjust)
                lum = lum + (1.0 - (1.0 - l_adjust))
            else:
                lum = lum * (l_adjust + 1)
            l = lum
            r, g, b = hls_to_rgb(h, l, s)
            pixout[x, y, 0] = r
            pixout[x, y, 1] = g
            pixout[x, y, 2] = b
            pixout[x, y, 3] = 255

def colorize(im, h, s, l_adjust):
    result = Image.new('RGBA', im.size)
    pixin = np.copy(im)
    pixout = np.array(result)
    colorize_numba(pixin, pixout, h, s, l_adjust)
    return pixout

This optimized parallel implementation is about 2000 times faster than the original code on my 6-core machine (on 800x600 images). The hls_to_rgb implementation comes from this post. Note that the signature strings in the @nb.njit decorators are not mandatory, but they let Numba compile the functions eagerly, when they are defined, instead of at the first call. For more information about the types, please read the Numba documentation.
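
Since the hls_to_rgb formula above is a port rather than the standard-library routine, it may be worth checking that the two agree. The following is a rough sanity check, not part of the original answer: it restates the same formula in plain Python (without Numba and without the final 8-bit conversion) and compares it with `colorsys.hls_to_rgb` on a coarse grid. The names `hls_to_rgb_port` and `max_port_error` are hypothetical.

```python
import colorsys


def hue_to_rgb(p, q, t):
    # Plain-Python copy of the helper used in the answer
    if t < 0: t += 1
    if t > 1: t -= 1
    if t < 1./6: return p + (q - p) * 6 * t
    if t < 1./2: return q
    if t < 2./3: return p + (q - p) * (2./3 - t) * 6
    return p


def hls_to_rgb_port(h, l, s):
    # Same formula as the answer's hls_to_rgb, but returning floats in [0, 1]
    if s == 0:
        return l, l, l  # achromatic
    q = l * (1 + s) if l < 0.5 else l + s - l * s
    p = 2 * l - q
    return (hue_to_rgb(p, q, h + 1./3),
            hue_to_rgb(p, q, h),
            hue_to_rgb(p, q, h - 1./3))


def max_port_error(steps=20):
    # Largest absolute difference from colorsys over a grid of (h, l, s)
    worst = 0.0
    for i in range(steps + 1):
        for j in range(steps + 1):
            for k in range(steps + 1):
                h, l, s = i / steps, j / steps, k / steps
                got = hls_to_rgb_port(h, l, s)
                want = colorsys.hls_to_rgb(h, l, s)
                worst = max(worst, max(abs(a - b) for a, b in zip(got, want)))
    return worst
```

Calling `max_port_error()` should return a value near floating-point rounding error if the port matches. This checks only the formula in double precision; the Numba version works in float32, so its 8-bit output could conceivably differ by one level near rounding boundaries.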

This answer is a gem! Thanks. I tried Numba with CUDA before, but the return value was somehow an issue there. One small question: does `@cuda.jit` perform better than `@nb.njit`? I was following [this tutorial](https://github.com/noahgift/cloud-data-analysis-at-scale/blob/master/GPU_Programming.ipynb). — MSI, Jul 14 '21 at 18:11
Using CUDA may help, but it is hard to tell. The image needs to be sent to GPU memory, processed there, and then transferred back to CPU memory. Data transfers are often rather slow, which limits the speed-up. The biggest problem with GPUs is that they are very different from CPUs. So while the current code could run on a GPU, it will likely not be very efficient because of *warp divergence* and poor memory *coalescing*. Still, it may be faster in the end, so the best approach is to try. Note that if you want to target GPUs, I think `hls_to_rgb` is the function to optimize first. — Jérôme Richard, Jul 14 '21 at 19:48
