
I am trying to create a child process in Python 3.8.0 using the multiprocessing module without inheriting the parent's memory. I am using the spawn start method, mp.set_start_method('spawn'), for this. But the memory usage of the child process is almost the same as that of the parent process. Code snippets are below.

For testing, I am using the code shared in "How can I restrict the scope of a multiprocessing process?"

memtest.py

import multiprocessing as mp
import numpy as np

def foo(x):
    import time
    time.sleep(60)

if __name__ == "__main__":
    mp.set_start_method('spawn')

    dont_inherit = np.ones((500, 100))
    for x in range(3):
        mp.Process(target=foo, args=(x,)).start()

Run using python3 memtest.py

Memory usage from top:

 449m  28m  14m S  0.0  0.2   0:00.44 python3 memtest.py
 34904  10m 5816 S  0.0  0.1   0:00.03 /srv/env/bin/python3 -c from multiprocessing.resource_tracker import main;main(5)
 252m  26m  13m S  0.0  0.2   0:00.26 /srv/env/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=6, pipe_handle=20) --multiprocessing-fork
 252m  27m  13m S  0.0  0.2   0:00.21 /srv/env/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=6, pipe_handle=22) --multiprocessing-fork
 252m  26m  13m S  0.0  0.2   0:00.23 /srv/env/bin/python3 -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=6, pipe_handle=24) --multiprocessing-fork

I am using virtualenv with python3.8.0 on ubuntu18.04

$ python3 --version
Python 3.8.0

What is wrong with this approach to creating a child process? I need to create a lot of child processes that are lightweight. I initially figured that mp's spawn start method would do this, but it doesn't seem to be working.

s3kv

2 Answers


Short answer: spawn also copies global variables, so you should either:

  1. create the processes first and only then create dont_inherit. I think this is more elegant, but it is probably not always possible; or
  2. run del dont_inherit in each subprocess. Right after a subprocess is created there is still only one copy of dont_inherit in memory (at least on Linux, where copy-on-write works well), so dropping the subprocess's reference to dont_inherit is cheap and fast.
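Option 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the fork start method on a POSIX system (where the ordering actually matters), and the names worker, start_then_allocate and big_data are made up for the example. Each child reports through its exit code whether it can see the large object.

```python
import multiprocessing as mp
import sys


def worker():
    # Exit code 0 if this process has no big_data global, 1 otherwise.
    sys.exit(1 if "big_data" in globals() else 0)


def start_then_allocate(n_workers=3):
    global big_data
    # Assumption: a POSIX system where the 'fork' start method is available.
    ctx = mp.get_context("fork")
    procs = [ctx.Process(target=worker) for _ in range(n_workers)]
    for p in procs:
        p.start()  # children are forked here, before big_data exists
    big_data = list(range(1_000_000))  # allocated in the parent only
    for p in procs:
        p.join()
    del big_data  # release the parent's copy once the workers are done
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    print(start_then_allocate())  # [0, 0, 0]
```

Because the children are forked before the allocation, none of them ever holds a reference to big_data, so nothing has to be deleted afterwards in the workers.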

Here is the longer story: I am not sure exactly what ps measures, so I think it is better to look at total memory usage (e.g. using htop).

import multiprocessing as mp

# 'fork' is used here because copy-on-write sharing only applies to fork.
# Under 'spawn' the child re-runs this unguarded module-level code instead.
ctx = mp.get_context('fork')
q = ctx.Queue()

def proc(q):
    while True:
        msg = q.get()
        print("Q", msg)

longlist = [x for x in range(60_000_000)]
# additional 2.3 GB in RAM
p = ctx.Process(target=proc, args=(q,))
p.start()
# no change in memory usage
for i in range(len(longlist)):
    longlist[i] = longlist[i] + 1  # memory usage grows
# when the loop ends, an additional 2.3 GB of RAM is in use (now ~4.6 GB),
# because the subprocess holds the original longlist
# and the main process holds the modified one

Below is the same code, but with the global variable deleted in the subprocess:

import multiprocessing as mp

# 'fork' is used here because copy-on-write sharing only applies to fork.
# Under 'spawn' the child re-runs this unguarded module-level code instead.
ctx = mp.get_context('fork')
q = ctx.Queue()

def proc(q):
    # drop the subprocess's reference to the inherited list
    global longlist
    del longlist
    while True:
        msg = q.get()
        print("Q", msg)

longlist = [x for x in range(60_000_000)]
# additional 2.3 GB in RAM
p = ctx.Process(target=proc, args=(q,))
p.start()
# no change in memory usage
for i in range(len(longlist)):
    longlist[i] = longlist[i] + 1  # no change in memory usage
# at this point total memory usage is still ~2.3 GB
rmrmg

@rmrmg's answer is misleading.

Spawn will carry over module-level global variables, yes, but it won't carry over anything created inside the if __name__ == '__main__' block. Spawn essentially re-imports your main module in the child, and during that import the __name__ == '__main__' block does not run. That is the whole point of the guard: to keep execution code from running at import time.
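To see this without starting any processes, here is a small sketch that imitates the re-import. CPython's spawn machinery runs the main script in the child under the module name __mp_main__ instead of __main__; the sketch does the same with runpy on a throwaway script. The script contents and the helper name names_defined are made up for the illustration.

```python
import os
import runpy
import tempfile

# A tiny stand-in for memtest.py (hypothetical contents, for illustration only).
SCRIPT = '''
module_level = "defined whenever the module is executed"

if __name__ == "__main__":
    dont_inherit = "defined only when run as a script"
'''


def names_defined(run_name):
    """Execute SCRIPT under the given module name and list which globals exist."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "memtest_demo.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        namespace = runpy.run_path(path, run_name=run_name)
    return sorted(n for n in ("module_level", "dont_inherit") if n in namespace)


as_parent = names_defined("__main__")    # how the parent runs the script
as_child = names_defined("__mp_main__")  # how a spawned child re-runs it
print(as_parent)  # ['dont_inherit', 'module_level']
print(as_child)   # ['module_level']
```

The guarded name only exists in the run that plays the parent's role, which matches the behaviour described above.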

Now, as for why memory usage is similar across your processes: dont_inherit is made up of 500*100 float64 values (the default dtype of np.ones), which amounts to 8*500*100 = 400000 bytes, or about 400 kilobytes. Your subprocesses indeed don't have your dont_inherit object; the memory saved is just so small that you can't detect it in top.
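As a quick check of the size estimate, assuming the array keeps NumPy's default float64 dtype (8 bytes per element, the size of a C double), the arithmetic can be done with the standard library alone:

```python
import struct

# Assumption: float64 elements, as produced by np.ones with its default dtype.
rows, cols = 500, 100
bytes_per_element = struct.calcsize("d")  # size of a C double: 8 bytes
total_bytes = rows * cols * bytes_per_element

print(total_bytes)         # 400000
print(total_bytes / 1024)  # 390.625 (KiB)
```

That is well under 1 MB, while top reports resident sizes of roughly 26m to 28m for each process, so the difference disappears in the rounding.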

In the future, you should try to access these kinds of objects directly so that you can confirm whether they're present or not. For example:

import multiprocessing as mp
import numpy as np

def foo(x):
    global dont_inherit
    print(dont_inherit)

if __name__ == "__main__":
    mp.set_start_method('spawn')

    dont_inherit = np.ones((500, 100))
    for x in range(3):
        mp.Process(target=foo, args=(x,)).start()

If you run this, you'll see that the print statements raise a NameError, because dont_inherit does not exist in the child processes.

You can also make your dont_inherit variable larger by a couple of orders of magnitude so that the difference in memory usage actually shows up.

Jason Kang