I understand that multiprocessing starts processes differently on Linux vs. Windows: Linux uses fork() while Windows uses spawn(). From the docs:
spawn
The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver. [Available on Unix and Windows. The default on Windows and macOS.]
fork
The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic. [Available on Unix only. The default on Unix.]
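For context, the start method can be selected explicitly at startup; a minimal sketch:

import multiprocessing as mp

if __name__ == '__main__':
    # Explicitly pick a start method; on Windows only 'spawn' exists,
    # so asking for 'fork' there raises ValueError
    mp.set_start_method('spawn')
    print(mp.get_start_method())  # -> 'spawn'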
I am using Windows, and I have a script main.py that imports a personal module test.py:
import test  # personal module, very heavy to import
import multiprocessing as mp


def f(x, y):
    z = test.add(x, y)
    print(z)


if __name__ == '__main__':
    with mp.Pool(2) as pool:
        # starmap unpacks each (x, x + 1) tuple into f's two arguments;
        # pool.map would pass the whole tuple as a single argument
        pool.starmap(f, [(x, x + 1) for x in [1, 2]])
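(For reference, a minimal stand-in for test.py; the real module is assumed to do heavy work at import time:)

# test.py -- hypothetical stand-in; the real one is heavy to import
def add(x, y):
    return x + y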
Ideally, main.py will spawn 2 worker processes. What I want is to make sure each of those processes does not have to re-import test.py, since importing test is a super heavy loading process.
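One way to see the re-import happening (a sketch, assuming the heavy work runs at import time): put a module-level print at the top of test.py.

# at the top of test.py (hypothetical)
import os
print(f"pid {os.getpid()}: importing test")

Running main.py on Windows prints that line three times: once for the parent and once for each of the two Pool workers, because every spawned child re-executes main.py's top-level imports.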
I believe that if I can find a way to switch Windows to use fork instead of spawn, it might resolve the issue, but I am not 100% sure.
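For what it's worth, the start methods a platform supports can be checked at runtime; a quick sketch:

import multiprocessing as mp

if __name__ == '__main__':
    # On Windows this prints ['spawn']: 'fork' is not offered at all
    print(mp.get_all_start_methods())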