After reading some SO posts, I've come up with a way to use OpenCV in Python 3 with multiprocessing. I recommend doing this on Linux, because according to this post, processes forked on Linux share memory with their parent as long as the content is not changed (copy-on-write).
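If you want to be explicit that the children are forked rather than spawned (a minimal sketch; fork is already the default start method on Linux):

import multiprocessing as mp

# fork is the default on Linux; with the spawn method the children re-import
# the module and do not share the parent's memory, so img would be re-read
mp.set_start_method('fork')

Here's a minimal example: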
import cv2
import multiprocessing as mp
import numpy as np
import psutil

img = cv2.imread('test.tiff', cv2.IMREAD_ANYDEPTH)  # here I'm using an indexed 16-bit tiff as an example
num_processes = 4
kernel_size = 11
tile_size = img.shape[0] // num_processes  # assuming img.shape[0] is divisible by num_processes
output = mp.Queue()

def mp_filter(x, output):
    print(psutil.virtual_memory())  # monitor memory usage
    # note that you actually have to process a slightly larger block and leave out the border
    output.put((x, cv2.GaussianBlur(img[tile_size * x:tile_size * (x + 1), :],
                                    (kernel_size, kernel_size), kernel_size / 5)))
if __name__ == '__main__':
    processes = [mp.Process(target=mp_filter, args=(x, output)) for x in range(num_processes)]
    for p in processes:
        p.start()
    result = []
    for ii in range(num_processes):
        result.append(output.get(True))
    for p in processes:
        p.join()
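The tiles come off the queue in whatever order the workers finish, which is why each item carries its tile index. A minimal sketch of stitching the filtered image back together (assuming the same names as above):

result.sort(key=lambda item: item[0])                # order the tiles by index
filtered = np.vstack([tile for _, tile in result])   # reassemble the full image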
Instead of using a Queue, another way to collect the results from the processes is to create a shared array through the multiprocessing module (you have to import ctypes):
result = mp.Array(ctypes.c_uint16, img.shape[0] * img.shape[1], lock=False)
Then each process can write to a different portion of the array, assuming there is no overlap. Creating a large mp.Array is surprisingly slow, however, which actually defeats the purpose of speeding up the operation. So use it only when the added time is small compared with the total computation time. This array can be turned into a numpy array by:
result_np = np.frombuffer(result, dtype=ctypes.c_uint16)
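For completeness, here is a sketch of the shared-array variant, reusing img, num_processes, tile_size and kernel_size from above (the worker name mp_filter_shared is mine; this again relies on the children being forked so they inherit the shared array):

import ctypes

shared = mp.Array(ctypes.c_uint16, img.shape[0] * img.shape[1], lock=False)

def mp_filter_shared(x):
    # view the shared buffer as a 2D numpy array; no data is copied
    out = np.frombuffer(shared, dtype=np.uint16).reshape(img.shape)
    out[tile_size * x:tile_size * (x + 1), :] = cv2.GaussianBlur(
        img[tile_size * x:tile_size * (x + 1), :],
        (kernel_size, kernel_size), kernel_size / 5)

if __name__ == '__main__':
    processes = [mp.Process(target=mp_filter_shared, args=(x,)) for x in range(num_processes)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    result_np = np.frombuffer(shared, dtype=np.uint16).reshape(img.shape)

Since every worker writes only to its own slice of rows, no synchronization is needed, which is why lock=False is safe here.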