
I am using multiprocessing.Pool for parallel processing. My code halts when the input data is large (over 1 GB). The pool size is exactly the number of cores (12), and each subprocess only deals with a portion of the data. The code has the following structure.

import multiprocessing

pool = multiprocessing.Pool(processes=pool_size, maxtasksperchild=2)
args = []
for j, module in enumerate(list_modules):
    args.append((module, ..))          # each tuple carries one module plus its share of the data
list_answer = pool.map(delegate_buildup, args)
pool.close()
pool.join()

I wonder whether each process keeps a whole copy of the data. My intention is to distribute a portion of the data to each subprocess. If it works as I intend, memory usage should not be critical, because my computer has 128 GB of RAM.
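
For reference, this is roughly the chunking pattern I have in mind (a simplified sketch with made-up names, not my real delegate_buildup): each task tuple is pickled by pool.map and sent to one worker, so each worker should only receive its own slice.

import multiprocessing

def process_chunk(task):
    # Hypothetical worker: receives only its own (index, chunk) tuple,
    # which pool.map pickles and ships to the subprocess.
    idx, chunk = task
    return idx, sum(len(item) for item in chunk)

if __name__ == "__main__":
    data = [b"x" * 1000] * 120000            # stand-in for the real input data
    n_workers = 12
    chunk_size = len(data) // n_workers
    tasks = [(i, data[i * chunk_size:(i + 1) * chunk_size])
             for i in range(n_workers)]
    with multiprocessing.Pool(processes=n_workers) as pool:
        results = pool.map(process_chunk, tasks)
    print(results[:3])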

Is it possible to monitor the memory usage of each subprocess?
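
For example, would polling from the parent with psutil be a reasonable approach? (A sketch, assuming psutil is installed; multiprocessing.active_children() returns the live worker processes.)

import multiprocessing
import psutil   # third-party package, assumed installed (pip install psutil)

def report_worker_memory():
    # Print the resident set size (RSS) of every live child process,
    # which includes the pool workers.
    for child in multiprocessing.active_children():
        rss_mb = psutil.Process(child.pid).memory_info().rss / 1024 ** 2
        print("worker pid=%d rss=%.1f MB" % (child.pid, rss_mb))

Since pool.map blocks the parent, this would presumably have to run from a separate thread, or alongside map_async.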

Yongsoo
  • Depends on the OS. On Linux the memory is shared with the child processes and is only copied if a page is written to. – jordanm Jul 21 '17 at 06:25
  • I've been looking for a decent live visual profiler for Python for quite some time. The best I've found is http://www.pyvmmonitor.com/, which however is only free for open-source projects; otherwise they charge $100 for a single license. (You do get a 15-day trial period.) – orangeInk Jul 21 '17 at 06:44
  • I had quite a similar problem in this question https://stackoverflow.com/questions/45059703/simple-multitasking (no shared memory, but possibly huge amounts of data). There are a few ways around it, such as loading the data only inside the worker processes rather than beforehand (see the sketch after these comments), or using a tool like https://redis.io/ for shared memory. – Lucy The Brazen Jul 21 '17 at 06:52
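
A sketch of the "load the data only inside the worker" idea from the last comment (the per-worker partition files are hypothetical, not from the question):

def delegate_buildup(task):
    # Hypothetical variant of the worker: the parent passes only a small
    # descriptor (module, path to that worker's partition) and the worker
    # loads its own slice of the data itself.
    module, partition_path = task
    with open(partition_path, "rb") as f:
        chunk = f.read()
    return module, len(chunk)

args = [(module, "partition_%d.bin" % j)     # hypothetical per-worker files
        for j, module in enumerate(list_modules)]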

0 Answers