
I haven't found a good way to monitor the memory usage of a Python script that uses multiprocessing. More specifically, say I do this:

import time

# materialize the list so the memory is actually allocated
# (on Python 3, a bare range() is lazy and uses almost no memory)
biglist = list(range(pow(10, 7)))
time.sleep(5)

The memory usage is 1.3 GB, as measured by both /usr/bin/time -v and top.
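For a single process I can also read the same number from inside the script via the resource module. A minimal sketch, assuming Linux, where ru_maxrss is reported in kilobytes (this should be the same counter /usr/bin/time -v prints as "Maximum resident set size"):

import resource
import time

biglist = list(range(pow(10, 7)))
time.sleep(5)

# Peak resident set size of this process, in kilobytes on Linux.
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

But now, say I do this: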

import time
from multiprocessing import Pool

def worker(x):
    # each worker builds its own copy of the list
    biglist = list(range(pow(10, 7)))
    time.sleep(5)

if __name__ == '__main__':
    Pool(5).map(worker, range(5))

Now top reports 5 x 1.3 GB, which is correct. But /usr/bin/time -v still reports 1.3 GB, which doesn't make sense. If it is measuring the consumption of the parent process alone, then it should report close to 0. If it is measuring the parent and the children together, then it should report 5 x 1.3 GB. Why does it say 1.3 GB?
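To poke at this, I can query getrusage() from the parent for itself and for its waited-for children separately. A minimal sketch, assuming Linux (I close and join the pool first so the workers have actually been waited for):

import resource
import time
from multiprocessing import Pool

def worker(x):
    biglist = list(range(pow(10, 7)))
    time.sleep(5)

if __name__ == '__main__':
    pool = Pool(5)
    pool.map(worker, range(5))
    pool.close()
    pool.join()  # make sure the workers have been waited for
    # Peak RSS of the parent process itself, in kilobytes.
    print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    # Peak RSS accounted to the waited-for children.
    print(resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss)

Now let's try copy-on-write: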

import time
from multiprocessing import Pool

# allocated once in the parent, before the workers are forked
biglist = list(range(pow(10, 7)))

def worker(x):
    # the workers never touch biglist, so with the fork start method
    # its pages should stay shared (copy-on-write)
    time.sleep(5)

if __name__ == '__main__':
    Pool(5).map(worker, range(5))

Now /usr/bin/time -v reports 1.3 GB again, which is correct. But top reports 6 x 1.3 GB, which is incorrect: with copy-on-write, the data exists only once, so it should report 1.3 GB in total.
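One way to check whether the pages really stay shared is to look at each worker's own accounting: /proc/<pid>/smaps_rollup (available on reasonably recent kernels, 4.14+) reports Rss, which counts shared pages in full in every process, and Pss, which divides each shared page among the processes mapping it. A minimal sketch, assuming Linux and the default fork start method:

import time
from multiprocessing import Pool

biglist = list(range(pow(10, 7)))

def worker(x):
    # Rss counts every resident page in full; Pss charges each shared
    # page 1/N to each of the N processes mapping it.
    with open('/proc/self/smaps_rollup') as f:
        for line in f:
            if line.startswith(('Rss:', 'Pss:')):
                print(line.strip())
    time.sleep(5)

if __name__ == '__main__':
    Pool(5).map(worker, range(5))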

How can I reliably monitor the memory usage of a Python script that uses multiprocessing?


1 Answer


It really depends on what is meant by "reliable". You may want to use the pmap <pid> command to get statistics on the memory usage of a process (I guess you are interested in the total field). You need to track all processes that are created during the execution of your program (I guess ps --forest may help you here).
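A minimal sketch of that idea in Python, assuming Linux: build the parent-child tree from the PPid field of /proc/<pid>/status and sum VmRSS over a process and all of its descendants (keep in mind that summing RSS double-counts pages shared via copy-on-write):

import os

def descendants(root_pid):
    # Map each parent PID to the PIDs of its children, by scanning
    # the PPid field of every /proc/<pid>/status.
    children = {}
    for entry in os.listdir('/proc'):
        if not entry.isdigit():
            continue
        try:
            with open('/proc/%s/status' % entry) as f:
                for line in f:
                    if line.startswith('PPid:'):
                        ppid = int(line.split()[1])
                        children.setdefault(ppid, []).append(int(entry))
                        break
        except IOError:
            pass  # the process exited while we were scanning
    # Walk the tree starting from root_pid.
    pids, stack = [], [root_pid]
    while stack:
        pid = stack.pop()
        pids.append(pid)
        stack.extend(children.get(pid, []))
    return pids

def total_rss_kb(root_pid):
    # Sum VmRSS (resident set size, in kB) over root_pid and all of
    # its descendants.
    total = 0
    for pid in descendants(root_pid):
        try:
            with open('/proc/%d/status' % pid) as f:
                for line in f:
                    if line.startswith('VmRSS:'):
                        total += int(line.split()[1])
                        break
        except IOError:
            pass
    return total

print(total_rss_kb(os.getpid()))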

If you want more detail, then you may want to look at /proc/[pid]/{smaps,status,maps} (see the proc(5) man page).
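For example, summing the Pss (proportional set size) lines of smaps charges each shared page fractionally to the processes mapping it, which avoids the over-counting you saw in top with copy-on-write. A sketch, assuming Linux and processes you have permission to inspect:

import os

def pss_kb(pid):
    # Sum the Pss lines of /proc/<pid>/smaps: each shared page is
    # charged 1/N to each of the N processes that map it.
    total = 0
    with open('/proc/%d/smaps' % pid) as f:
        for line in f:
            if line.startswith('Pss:'):
                total += int(line.split()[1])
    return total

print(pss_kb(os.getpid()))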

Also, please keep in mind the difference between RSS and VSZ.
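Both values can be read for the current process from /proc/self/status (a sketch, assuming Linux; VSZ corresponds to the VmSize field and RSS to VmRSS):

# VmSize (VSZ) is the size of the whole virtual address space;
# VmRSS (RSS) is the part that is actually resident in RAM.
with open('/proc/self/status') as f:
    for line in f:
        if line.startswith(('VmSize:', 'VmRSS:')):
            print(line.strip())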
