As in the title, I'm struggling with a memory leak when using multiprocessing. I know questions like this have been asked before, but I still cannot find the right solution for my problem.
I have a list of RGB images (30,000 in total). I need to read each image, process all three RGB channels, and keep the result in memory (to be saved to one big file later).
I'm trying to use something like this:
import multiprocessing as mp
import random
import numpy as np

# Define an output queue to store results
output = mp.Queue()

# Define an example function
def read_and_process_image(id, output):
    result = np.random.randint(256, size=(100, 100, 3))  # fake an image
    output.put(result)

# Set up a list of processes that we want to run
processes = [mp.Process(target=read_and_process_image, args=(id, output)) for id in range(30000)]

# Run processes
for p in processes:
    p.start()

# # Exit the completed processes
# for p in processes:
#     p.join()

# Get process results from the output queue
results = [output.get() for p in processes]

print(results)
This code uses a lot of memory. This answer explains the problem, but I cannot find a way to apply it to my code. Any suggestions? Thanks!
Edit: I also tried joblib and the Pool class, but the code doesn't use all the cores as I expected (I see no difference between these two and a normal for loop).
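For reference, my Pool attempt looks roughly like this (a simplified sketch: the real read_and_process_image reads an image from disk instead of generating random data, and the worker count and chunksize here are just guesses):

import multiprocessing as mp
import numpy as np

def read_and_process_image(image_id):
    # placeholder: read image `image_id` from disk and process its RGB channels
    return np.random.randint(256, size=(100, 100, 3))

if __name__ == '__main__':
    # one worker per core; tasks are handed out in chunks and results are
    # collected as the workers finish them
    with mp.Pool(processes=mp.cpu_count()) as pool:
        results = list(pool.imap_unordered(read_and_process_image,
                                           range(30000), chunksize=64))
    print(len(results))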