Multiprocessing on ms-windows works differently from UNIX-like systems.
UNIX-like systems have the fork system call, which makes a copy of the current process. In modern systems with copy-on-write virtual memory management, this is not even a very expensive operation.
This means that global data in the parent process is shared with the child process until one of them writes to a page, at which point that page is copied.
The thing is that ms-windows doesn't have fork. It has CreateProcess instead. So on ms-windows, this happens:
The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.
So since your global data is referenced in your function, it will be loaded in the child. But every child process loads its own separate copy.
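If you want to check which start method applies on a given machine, or reproduce the ms-windows behaviour on a UNIX-like system, something like the following should work. It is only a sketch; the helper name start_method_summary is my own and not part of the multiprocessing module.

```python
import multiprocessing


def start_method_summary():
    """Return the supported start methods and the platform default."""
    # get_all_start_methods() lists the supported methods; the first
    # entry is the default for the current platform.
    methods = multiprocessing.get_all_start_methods()
    return {"available": methods, "default": methods[0]}


# A context pinned to "spawn" behaves like ms-windows on any platform:
# children start from a fresh interpreter and do not share the parent's
# memory pages.
spawn_ctx = multiprocessing.get_context("spawn")

if __name__ == "__main__":
    print(start_method_summary())
    print(spawn_ctx.get_start_method())
```

On ms-windows the list of available methods should contain only spawn, which is why the copy-on-write sharing described above never happens there.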
What you could try is having your processes load the data using mmap with ACCESS_READ. I would expect the ms-windows memory subsystem to be smart enough to load the data only once when the same file is mapped by multiple processes.
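A minimal sketch of the mmap approach, assuming your data can live in a plain file on disk. The names open_readonly_map and demo and the sample contents are made up for illustration; in your case each worker process would call something like open_readonly_map on the same data file.

```python
import mmap
import os
import tempfile


def open_readonly_map(path):
    """Map the whole file at `path` read-only and return the mmap object."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; ACCESS_READ makes the map read-only.
        # The map stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def demo():
    """Write a small sample file, map it read-only and read a slice back."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"shared lookup table")
        m = open_readonly_map(path)
        try:
            # Slicing an mmap returns bytes, like slicing a bytes object.
            return m[:6]
        finally:
            # Close the map before removing the file; ms-windows will not
            # delete a file that is still mapped.
            m.close()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Whether the pages are actually shared between processes is up to the operating system, so I would measure memory use before relying on it.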