
I gather that using multiprocessing causes new instances of the program to be initialized. For example, the following code would cause 7 instances to be initialized:

import multiprocessing as mp

pool = mp.Pool(processes=7)
[pool.apply_async(process, args=(properties,)) for properties in properties_list]

My question, though, is what actually gets loaded in the new instance? For example, if I have file_a.py, which calls a function from file_b.py, and the function in file_b.py uses multiprocessing, is it just file_b.py that gets reloaded, or both file_a.py and file_b.py?

kyrenia

1 Answer


I believe all of your imports will be imported.

The details vary depending on your start method.

  • fork method (the default on Unix) - this actually forks your program. In this case, all memory and resources of the parent are cloned to the child. Whatever was loaded in the parent will be loaded in the child. (And all resources, like file descriptors, will be shared between the two processes.)

  • spawn or forkserver method (spawn is the default on Windows) - both of these start at least one new instance of your python interpreter, and pickle whatever arguments and other resources are needed for the worker to run. As far as I'm aware, all of file_a.py is re-imported in the child, meaning its module-level code (including the import of file_b.py) runs again. This is why these start methods require the main module to be importable without side effects, typically by putting startup code behind an `if __name__ == "__main__":` guard.
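A minimal sketch of this difference (the `read_flag` worker and the `FLAG` global are hypothetical names for illustration): under `fork`, the child is a clone of the parent, so it would see the mutated value; under `spawn`, forced here via `get_context`, the child re-imports the module, the `__main__` guard keeps the mutation from re-running, and the child sees only the import-time value:

```python
import multiprocessing as mp

FLAG = "set at import time"

def read_flag(_):
    # Returns whatever value of FLAG this worker process sees.
    return FLAG

if __name__ == "__main__":
    FLAG = "mutated in the parent"  # runs only in the parent, inside the guard

    # Under "spawn", the child re-imports this file; the guard above is
    # skipped there, so the child's FLAG is still the import-time value.
    # Under "fork" (Unix only), the child would inherit the mutated value.
    ctx = mp.get_context("spawn")
    with ctx.Pool(1) as pool:
        print(pool.apply(read_flag, (None,)))  # "set at import time"
```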

In both of the latter cases, the documentation says that "no unnecessary resources are inherited", but this isn't referring to loading imported code; it's talking about operating-system resources like shared memory segments or file descriptors.

Scott Mermelstein
    very well explained, I also [found this](http://stackoverflow.com/questions/659865/python-multiprocessing-sharing-a-large-read-only-object-between-processes) which explains the memory model used between `multiprocessing` processes – hansaplast Jan 13 '17 at 21:22