I am using multiprocessing.Pool for parallel processing. My code halts when the input data is big (over 1 GB). The pool size is exactly the number of cores (12), and each subprocess deals with only a portion of the data. The code has the following structure.
import multiprocessing

pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=2)
args = []
for j, module in enumerate(list_modules):
    args.append((module, ...))  # each tuple also carries that module's portion of the data
list_answer = pool.map(delegate_buildup, args)
pool.close()
pool.join()
I wonder whether each process keeps a whole copy of the data. My intention is to distribute only a portion of the data to each subprocess, roughly as in the sketch below. If it works as I intend, memory usage should not be critical, because my computer has 128 GB of RAM.
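For context, here is a minimal, self-contained sketch of what I mean by distributing a portion. The names delegate_buildup and list_modules match my real code, but the data, the chunking, and the worker body are just stand-ins. My understanding is that pool.map pickles each args tuple separately and sends it to one worker, so each worker should only receive its own slice.

import multiprocessing

def delegate_buildup(task):
    # hypothetical worker: unpack (module, chunk) and touch only this task's chunk
    module, chunk = task
    return module, sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))                     # stand-in for the large input
    list_modules = ["mod%d" % i for i in range(12)]
    step = -(-len(data) // len(list_modules))         # ceiling division so nothing is dropped
    args = [(m, data[i * step:(i + 1) * step]) for i, m in enumerate(list_modules)]
    with multiprocessing.Pool(processes=12, maxtasksperchild=2) as pool:
        print(pool.map(delegate_buildup, args))

On Linux the workers are forked, so they also inherit (copy-on-write) whatever the parent had allocated before the Pool was created, which is part of what I am unsure about.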
Is it possible to monitor the memory usage of each subprocess?
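Would something like the following be a reasonable way to do that? psutil is a third-party package (pip install psutil), and poll_worker_memory plus the switch to map_async are just my sketch, not what my code currently does.

import time
import psutil  # third-party: pip install psutil

def poll_worker_memory(async_result, interval=2.0):
    # while the async map is still running, print each worker's resident set size
    parent = psutil.Process()
    while not async_result.ready():
        for child in parent.children(recursive=True):
            try:
                rss_mb = child.memory_info().rss / 2**20
                print("pid=%d rss=%.1f MB" % (child.pid, rss_mb))
            except psutil.NoSuchProcess:
                pass  # the worker exited between listing and sampling
        time.sleep(interval)

# usage: replace the blocking pool.map(...) with map_async, poll, then fetch the results
# result = pool.map_async(delegate_buildup, args)
# poll_worker_memory(result)
# list_answer = result.get()

Or is watching the workers from outside with top/htop (filtered on the parent PID) the more usual approach?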