TL;DR
According to Valgrind's memcheck tool, if I allocate a large local variable in a function and launch that function using multiprocessing.Pool().apply_async(), the heap size for both the subprocess and the main process increases. Why does main's heap size increase?
Background
I am working with a multiprocessing pool of workers, each of which will be dealing with a large amount of data from an input file. I want to see how my memory footprint scales based on the size of the input file. To do this, I ran my script under Valgrind using memcheck with the technique described in this SO answer. (I have since learned that Valgrind's Massif tool is better suited for this, so I will use it instead going forward.)
Something in the memcheck output seemed odd, and I would like help understanding it.
I am using CPython 2.7.6 on Red Hat Linux, and running memcheck like this:
valgrind --tool=memcheck --suppressions=./valgrind-python.supp python test.py
Code and Output
import multiprocessing

def mem_user():
    # Allocate a local string; its size is what changes between the two runs.
    tmp = 'a'*1
    return

pool = multiprocessing.Pool(processes=1)
pool.apply_async(mem_user)
pool.close()
pool.join()
Heap Summaries (one per process):
total heap usage: 45,193 allocs, 32,392 frees, 7,221,910 bytes allocated
total heap usage: 44,832 allocs, 22,006 frees, 7,181,635 bytes allocated
If I change the line tmp = 'a'*1 to tmp = 'a'*10000000, I get these summaries:
total heap usage: 44,835 allocs, 22,009 frees, 27,181,763 bytes allocated
total heap usage: 45,195 allocs, 32,394 frees, 17,221,998 bytes allocated
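The two runs print their summaries in a different order, so comparing them takes a little care. Below is a small sketch that computes how many more bytes each process reported. It assumes that a process's alloc count barely changes between runs, so the closest alloc count identifies the same process. The helper names parse and byte_deltas are illustrative only.

```python
import re

# Heap summaries copied verbatim from the two memcheck runs above.
RUN_SMALL = """\
total heap usage: 45,193 allocs, 32,392 frees, 7,221,910 bytes allocated
total heap usage: 44,832 allocs, 22,006 frees, 7,181,635 bytes allocated
"""
RUN_LARGE = """\
total heap usage: 44,835 allocs, 22,009 frees, 27,181,763 bytes allocated
total heap usage: 45,195 allocs, 32,394 frees, 17,221,998 bytes allocated
"""

SUMMARY = re.compile(
    r"total heap usage: ([\d,]+) allocs, ([\d,]+) frees, ([\d,]+) bytes allocated")

def parse(text):
    """Return one (allocs, frees, bytes_allocated) tuple per summary line."""
    return [tuple(int(g.replace(",", "")) for g in m.groups())
            for m in SUMMARY.finditer(text)]

def byte_deltas(small, large):
    """Pair processes across runs and return (allocs, extra_bytes) per process.

    Assumption: the summary with the nearest alloc count in the second run
    belongs to the same process as in the first run.
    """
    deltas = []
    for allocs, _, nbytes in small:
        match = min(large, key=lambda s: abs(s[0] - allocs))
        deltas.append((allocs, match[2] - nbytes))
    return deltas

if __name__ == "__main__":
    for allocs, delta in byte_deltas(parse(RUN_SMALL), parse(RUN_LARGE)):
        print(allocs, delta)
```

With these numbers, the process with about 45,19x allocs grows by 10,000,088 bytes and the one with about 44,83x allocs grows by 20,000,128 bytes, i.e. roughly one and two copies of the 10,000,000-byte string respectively.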
The Question
Why do the heap sizes of both processes increase? I understand that space for objects is allocated on the heap, so the larger heap certainly makes sense for one of the processes. But I expected a subprocess to be given its own heap, stack, and instance of the interpreter, so I don't understand why a local variable allocated in the subprocess increased main's heap size as well. If they share the same heap, then does CPython implement its own version of fork() that doesn't allocate unique heap space to the subprocess?
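For comparison with the expectation above, here is a minimal POSIX-only sketch of how forked processes behave at the Python level. It calls os.fork() directly rather than going through multiprocessing.Pool (which, on Linux in Python 2.7, also starts its workers with os.fork()). The function name child_allocates_in_own_heap is hypothetical. The child appends a 10,000,000-character string to a module-level list and reports the list's length through a pipe. The parent's copy of the list stays empty, because after fork() each process works in its own copy-on-write address space.

```python
import os

# Module-level list; after fork() the parent and the child each have their own copy.
data = []

def child_allocates_in_own_heap():
    """Fork a child that allocates a large string; return (child_len, parent_len).

    child_len is len(data) as seen by the child after appending;
    parent_len is len(data) as seen by the parent after the child exits.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            # Child: this allocation lands in the child's own address space.
            os.close(read_fd)
            data.append('a' * 10000000)
            os.write(write_fd, str(len(data)).encode('ascii'))
            os.close(write_fd)
        finally:
            # os._exit skips exit handlers and does not flush stdio buffers
            # inherited from the parent, so nothing is printed twice.
            os._exit(0)
    # Parent: read the child's report, then reap the child.
    os.close(write_fd)
    child_len = int(os.read(read_fd, 32).decode('ascii'))
    os.close(read_fd)
    os.waitpid(pid, 0)
    return child_len, len(data)

if __name__ == '__main__':
    print(child_allocates_in_own_heap())  # prints (1, 0) on a POSIX system
```

This only shows what each process can see at the Python level; it does not by itself explain how Valgrind accounts for allocations in the per-process heap summaries.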