I'm using OpenCV with Python and I'm trying to use multiprocessing to process an image. The image is 100x100 and I have started 4 processes, each of which receives one part of the image. This is how I start one process:

processes = []
_ = mp.Process(target=kill, args=[blue[0:40, 0:47],0, 2])
_.start()
processes.append(_)

After this, I just join all of the processes.

This is my function:

def kill(sliced, idx, perc):
    for i in range(sliced.shape[0]):
        for j in range(sliced.shape[1]):
            if perc * sliced[i][j][0] - sliced[i][j][1] - sliced[i][j][2] < 0:
                for k in range(3):
                    sliced[i][j][k] = 0  # I am expecting this to alter my "blue" image

So if I then call cv2.imshow("blue", blue), I expect to see an image with blacked-out pixels. The problem is that the original blue image does not seem to be modified.

I am passing a slice of the image to each process. Once the processes had finished, I expected my original image to be modified, but it wasn't altered. Shouldn't modifying a slice also modify the original image? Is there some copy or buffer involved?

Cris Luengo
Cătălina Sîrbu
  • I don't know what type `blue` is, but slicing probably makes a copy. Another, bigger problem is that you're using multi-processing, and different processes have completely separate memory spaces. You cannot modify the original memory like that. See for example https://stackoverflow.com/q/10415028/1983772, this might help. – Norrius Jul 12 '20 at 14:05
  • `blue` is a NumPy array, a BGR image. The problem is that I don't know how to share a NumPy array, because `multiprocess.Value` only accepts built-in data types – Cătălina Sîrbu Jul 12 '20 at 14:09
  • Why would you use multiprocess for such a trivial operation? If you vectorize it, it will run in microseconds. – Cris Luengo Jul 12 '20 at 14:18
  • I'm not sure I know what you are talking about. Could you please share? – Cătălina Sîrbu Jul 12 '20 at 14:19
  • @CrisLuengo could you please give more details? – Cătălina Sîrbu Jul 12 '20 at 15:01

1 Answer

Multiprocessing is IMO really not suited for things like this; it is best used to process multiple independent images at once. It spawns separate processes, meaning there is no shared memory by default and data needs to be "sent" from one process to another, which incurs overhead. Additionally, spawning processes takes time, which is not justified by the very simple operation you're implementing.
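To illustrate the point about separate memory, here is a minimal sketch that needs no child process. It assumes `blue` is a `uint8` BGR image like the one in the question, and it uses a `pickle` round trip as a stand-in for "sending" the slice to another process. That stand-in is an assumption about the transfer mechanism: with the `spawn` start method the arguments are pickled, while with `fork` the child works on its own copy of the parent's memory. In both cases the child's writes do not reach the parent.

```python
import pickle

import numpy as np

# Stand-in for the image from the question: 100x100 BGR, all pixels white.
blue = np.full((100, 100, 3), 255, dtype=np.uint8)

# Basic NumPy slicing returns a view, so the slice itself is not a copy.
sliced = blue[0:40, 0:47]
print(np.shares_memory(blue, sliced))  # True

# Serializing and deserializing the slice yields an independent array,
# which is roughly what the target function gets to work on.
received = pickle.loads(pickle.dumps(sliced))
received[:] = 0  # what kill() would do inside the child

print(blue.min())  # 255: the original image is untouched
```

So the slice is a view within a single process, but the copy happens at the process boundary.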

You can implement your operation without loops, obtaining a very significant speed improvement (at least 2 orders of magnitude). Hopefully this makes it unnecessary to use multiple cores.

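Assuming image is a NumPy array read in through OpenCV (and NumPy is imported as np):

mask = perc * image[:,:,0].astype(np.int32) - image[:,:,1] - image[:,:,2] < 0  # cast first: uint8 arithmetic would wrap around instead of going negative
image[np.broadcast_to(mask[:,:,np.newaxis], image.shape)] = 0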

The second line is rather complicated because mask is a 2D matrix, but image is a 3D matrix. So we need to extend mask to be 3D and of the same size as image by replicating it along a new, third dimension. mask[:,:,np.newaxis] is a 3D version of the 2D matrix, adding a 3rd dimension of size 1. np.broadcast_to() then replicates the dimensions of size 1 to the requested shape, image.shape. This extended mask can now be used to index into image. By indexing using a mask (a Boolean matrix), we select only those matrix elements of image where mask is True. The assignment thus only changes the selected pixels.
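The following sketch checks the vectorized version against the loop from the question on a random test image. The function names `kill_loop` and `kill_vectorized` and the random image are made up for illustration. The conversion to `int32` is an assumption about the input: OpenCV usually returns `uint8` arrays, and arithmetic on `uint8` wraps around rather than producing negative values. The sketch also uses `out[mask] = 0`, a shorter alternative to the explicit broadcast: a 2D Boolean mask indexes the first two dimensions, so each selected pixel is zeroed across all channels.

```python
import numpy as np


def kill_loop(image, perc):
    """Per-pixel loop, as in the question, working on a copy."""
    out = image.copy()
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Convert to Python ints so the arithmetic cannot wrap around.
            b, g, r = (int(v) for v in out[i, j])
            if perc * b - g - r < 0:
                out[i, j] = 0
    return out


def kill_vectorized(image, perc):
    """Same operation without loops, working on a copy."""
    out = image.copy()
    wide = out.astype(np.int32)  # assumption: input is uint8
    mask = perc * wide[:, :, 0] - wide[:, :, 1] - wide[:, :, 2] < 0
    out[mask] = 0  # 2D mask selects whole pixels (all channels)
    return out


def kill_broadcast(image, perc):
    """Variant using the explicit np.broadcast_to() step from the answer."""
    out = image.copy()
    wide = out.astype(np.int32)
    mask = perc * wide[:, :, 0] - wide[:, :, 1] - wide[:, :, 2] < 0
    out[np.broadcast_to(mask[:, :, np.newaxis], out.shape)] = 0
    return out


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)

expected = kill_loop(image, 2)
print(np.array_equal(expected, kill_vectorized(image, 2)))
print(np.array_equal(expected, kill_broadcast(image, 2)))
```

All three functions return a modified copy so they can be compared; assigning through the mask on `image` itself modifies it in place, as in the answer.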


Another important tool to speed up loops in Python with NumPy is Numba. If an operation cannot be easily vectorized, that is the approach to try.

Cris Luengo
  • it doesn't work. `image = image [mask[:,:,np.newaxis]] = 0 TypeError: 'int' object does not support item assignment` – Cătălina Sîrbu Jul 12 '20 at 15:55
  • @CătălinaSîrbu: sorry, typo. Fixed. – Cris Luengo Jul 12 '20 at 16:13
  • It's still not working: `image[mask[:,:,np.newaxis]] = 0 IndexError: boolean index did not match indexed array along dimension 2; dimension is 3 but corresponding boolean dimension is 1` – Cătălina Sîrbu Jul 12 '20 at 18:13
  • @CătălinaSîrbu: You are right! I thought that should work, but apparently for indexing you must explicitly broadcast... I have added the necessary step. – Cris Luengo Jul 12 '20 at 23:19
  • Whoa! This works amazingly! It's unbelievable! I would really like to understand exactly what you have done here. I more or less understand the first line, even though I don't know how it is possible to pass an entire matrix and have the formula applied to each pixel. The second line, on the other hand, is a mystery to me. Could you please tell me more about this or share a tutorial for beginners? Thanks a lot! You rock! – Cătălina Sîrbu Jul 13 '20 at 06:44
  • @CătălinaSîrbu: NumPy can apply operations to whole matrices at once. This is what is referred to as "vectorized operations". It is not only faster, but the code is also more readable. That second line is rather obscure, though, I agree. I added an explanation; I hope it helps in understanding the code. – Cris Luengo Jul 13 '20 at 06:52
  • @CătălinaSîrbu: I found these tutorials you might want to read. I looked over them quickly and they seem reasonable, but I can't vouch for their correctness. https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html and https://realpython.com/numpy-array-programming/ . Good luck in your learning journey! – Cris Luengo Jul 13 '20 at 06:55
  • Thank you very much. They are exactly what I was searching for! Good luck to you too! – Cătălina Sîrbu Jul 15 '20 at 11:54