
I'm using multiprocessing in a larger code base where some of the import statements have side effects. How can I run a function in a background process without having it inherit global imports?

# helper.py:

print('This message should only print once!')
# main.py:

import multiprocessing as mp
import helper  # This prints the message.

def worker():
  pass  # Unfortunately this also prints the message again.

if __name__ == '__main__':
  mp.set_start_method('spawn')
  process = mp.Process(target=worker)
  process.start()
  process.join()

Background: Importing TensorFlow initializes CUDA, which reserves some amount of GPU memory. As a result, spawning too many processes leads to a CUDA OOM error, even though the processes don't use TensorFlow.

Similar question without an answer:

  • you need to gate the imports behind an `if` statement or use a platform that supports `fork` as the `start_method` – vinzenz Dec 07 '21 at 16:52
  • i.e. you can only import the problematic modules if `multiprocessing.parent_process()` returns `None` https://docs.python.org/3/library/multiprocessing.html#multiprocessing.parent_process – vinzenz Dec 07 '21 at 16:55
  • @vinzBad Thanks. I explicitly set `spawn` to resolve issues with some imports that are not fork safe because they launch threads, so switching back to `fork` won't work, unfortunately. Would I gate the imports behind `if __name__ == '__main__'`? Is there a resource that explains exactly what the `multiprocessing` module does when starting an `mp.Process`? It's a bit too much magic for my taste :) – danijar Dec 07 '21 at 16:56
  • If you define worker in a separate file the imports from the parent will still exist in `sys.modules` although they are not defined. – kpie Dec 07 '21 at 16:56
  • @kpie Interesting, so `process.start()` only executes the current file? Or what portion of the code base does it execute? Shouldn't the process start with a clean `sys.modules` when using `mp.set_start_method('spawn')`? – danijar Dec 07 '21 at 16:59
  • @kpie I tried to verify what you said but it seems that modules imported from other files (as is the case in my application) are still loaded in the subprocess. I've updated the question to this more detailed code example. – danijar Dec 07 '21 at 17:08
  • @danijar As vinzBad suggested, you can put the imports inside the `if __name__ == '__main__':` "guard". When starting the process, a new Python interpreter is created and the relevant module (the main module) is imported, then the `target` function is called (see "Safe importing of main module" right above the [Examples section](https://docs.python.org/3/library/multiprocessing.html#examples)). Hence the `if` guard will prevent the imports when the module itself is imported. – a_guest Dec 07 '21 at 17:19

2 Answers


Is there a resource that explains exactly what the multiprocessing module does when starting an mp.Process?

Super quick version (using the spawn context, not fork):

Some stuff (a pair of pipes for communication, cleanup callbacks, etc.) is prepared, then a new process is created with fork() + exec() (on Windows it's CreateProcessW()). The new Python interpreter is started with the bootstrap function spawn_main() and passed the communication pipe file descriptors via a crafted command string and the -c switch. The startup script cleans up the environment a little bit, then unpickles the Process object from its communication pipe. Finally, it calls the run() method of the Process object.
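You can actually peek at that crafted command string. The helper below is an undocumented implementation detail of CPython's `multiprocessing.spawn` module, so this is just an illustration (the `pipe_handle=5` value is a placeholder, not a real file descriptor):

```python
import multiprocessing.spawn as spawn

# Build the command line that would be used to launch the child
# interpreter. Note the -c switch and the spawn_main() bootstrap call
# that receives the pipe handle.
cmd = spawn.get_command_line(pipe_handle=5)
print(cmd)
```

On a typical POSIX system this prints something like `[sys.executable, '-c', 'from multiprocessing.spawn import spawn_main; spawn_main(...)', '--multiprocessing-fork']`, matching the description above.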

So what about importing of modules?

Pickle semantics handle some of it, but __main__ and sys.modules need some special handling, which happens in the prepare step of multiprocessing.spawn (during the "cleans up the environment" bit).

  • That's super helpful, thanks! I'm wondering why I couldn't find much on this in the package documentation, despite many of the steps being non-obvious. – danijar Dec 08 '21 at 15:51
  • @danijar It's all implementation details, which are not supposed to matter to the user. In theory, as long as it works exactly as described in the docs, it shouldn't matter how it works under the hood. In practice, we all know there are always edge cases (and bugs) in implementations. – Aaron Dec 08 '21 at 22:42
# helper.py:

print('This message should only print once!')
# main.py:

import multiprocessing as mp

def worker():
  pass

def main():

  # Importing the module only locally so that the background
  # worker won't import it again.
  import helper

  mp.set_start_method('spawn')
  process = mp.Process(target=worker)
  process.start()
  process.join()

if __name__ == '__main__':
  main()
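As an alternative to the local import, the gating suggested in the comments can be made explicit with `multiprocessing.parent_process()` (available since Python 3.8): it returns `None` only in the original process, so a module-level import can be skipped in spawned children. A sketch, using `json` as a stand-in for the side-effectful `helper` module:

```python
import multiprocessing as mp

# parent_process() is None in the original process and a Process
# object in any spawned child, so this import only runs in the parent.
if mp.parent_process() is None:
  import json as helper  # stand-in for the real side-effectful module
```

Unlike the `if __name__ == '__main__':` guard, this works even when the import lives in a module other than the main one.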