
I was taught that to make a variable available in a child process of a multiprocessing pool, you needed to use an initializer.

Strangely, I can access variables defined in the main block from the child process, without using an initializer:

import multiprocessing
import numpy as np

def ChildFun(i):
    print(myValue)
    print(f'Processing the index {i}')

if __name__ == "__main__":
    myValue = 'This should not appear'
    myList = np.arange(5)
    with multiprocessing.Pool() as pool:
        pool.map(ChildFun, myList)

Normally, I would expect to only see

Processing the index 2
Processing the index 0
Processing the index 3
Processing the index 1
Processing the index 4

But I get

This should not appear
This should not appear
This should not appear
This should not appear
This should not appear
Processing the index 2
Processing the index 0
Processing the index 3
Processing the index 1
Processing the index 4

How come? Does multiprocessing import all the variables from the main process, even those protected by if __name__ == "__main__":? Or does it fall back to the main process for names it cannot find in the child process?

VicN

1 Answer


It seems this is due to how Unix handles forks, as explained here: Python multiprocessing--global variables in separate processes sharing id? On Unix the default start method is fork, so each child process starts with a copy of the parent's memory at the moment the pool is created, including myValue.
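To see that the behaviour depends on the start method, you can force the Windows-style "spawn" method on Unix. A minimal sketch (the NameError handling is mine, added for illustration): with spawn, the child re-imports the main module, the if __name__ == "__main__": block does not run there, and myValue is undefined in the worker.

```python
import multiprocessing

def ChildFun(i):
    try:
        print(myValue)  # only exists if the child was forked from the parent
    except NameError:
        print('myValue is not defined in this child')
    print(f'Processing the index {i}')

if __name__ == "__main__":
    myValue = 'This should not appear'
    # Force the start method Windows uses; the child re-imports this module
    # instead of inheriting the parent's memory.
    ctx = multiprocessing.get_context('spawn')
    with ctx.Pool() as pool:
        pool.map(ChildFun, range(5))
```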

My guess is that using an initializer is cleaner (you know exactly what you pass) and more explicit for sharing read-only variables. But mostly, it is probably the way to make your code work on other platforms (typically Windows), which don't have the same forking mechanism.

VicN