
If I need to share a multiprocessing.Queue or a multiprocessing.Manager (or any of the other synchronization primitives), is there any difference in doing it by defining them at the global (module) level, versus passing them as an argument to the function executed in a different process?

For example, here are three possible ways I can imagine a queue could be shared:

# works fine on both Windows and Linux
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

def main():
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()

if __name__ == '__main__':
    main()

vs.

# works fine on Linux, hangs on Windows
from multiprocessing import Process, Queue
q = Queue()

def f():
    q.put([42, None, 'hello'])

def main():
    p = Process(target=f)
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()

if __name__ == '__main__':
    main()

vs.

# works fine on Linux, NameError on Windows
from multiprocessing import Process, Queue

def f():
    q.put([42, None, 'hello'])

def main():
    p = Process(target=f)
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()

if __name__ == '__main__':
    q = Queue()
    main()

Which is the correct approach? I'm guessing from my experimentation that it's only the first one, but I wanted to confirm that this is officially the case (and not only for Queue but also for Manager and other similar objects).

max
  • Should the second one have `global q` at the start of f()? The first is best in my opinion merely because the objects have the correct scope, but it is just a matter of style *IF* they truly are singletons and nobody ever changes the code. Another reason to pass them as arguments is to prove you have them, and are therefore calling the function properly, although that works better in C++ and other statically typed languages, where the compiler can catch bookkeeping errors for you if you use good style. – Kenny Ostrom Mar 11 '17 at 15:24
  • No need for `global q` when you only modify a global object through a method call rather than reassigning the variable to refer to a new object. As I said, 2 out of 3 approaches don't seem to work on Windows, so I guess it's not a question of style. I just couldn't find a clear explanation in the docs, but it seems passing a parameter is the only reliable technique. – max Mar 11 '17 at 19:15
  • This may help: http://stackoverflow.com/questions/37244168/multiprocessing-queue-get-hangs – Murray Lee Mar 11 '17 at 19:46
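To illustrate the point about `global q` in the comments, here is a minimal sketch in plain Python (no multiprocessing involved). It uses a hypothetical module-level list `results` in place of the queue: calling a method on the global needs no `global` statement, but rebinding the name does.

```python
# Hypothetical module-level object standing in for a shared Queue.
results = []


def append_item(item):
    # A method call only *reads* the name `results`, so no `global` is needed.
    results.append(item)


def rebind_without_global(item):
    # Assigning to `results` anywhere in this function makes it a local name,
    # so reading it on the right-hand side raises UnboundLocalError.
    results = results + [item]
    return results


def rebind_with_global(item):
    # Rebinding the module-level name to a new object requires `global`.
    global results
    results = results + [item]


append_item("a")
try:
    rebind_without_global("b")
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
rebind_with_global("c")
print(results)  # ['a', 'c']
```

This is ordinary Python scoping and is unrelated to whether the object is shared correctly between processes; that depends on the start method, as the answers explain.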

2 Answers


As mentioned in the multiprocessing programming guidelines:

Explicitly pass resources to child processes

On Unix using the fork start method, a child process can make use of a shared resource created in a parent process using a global resource. However, it is better to pass the object as an argument to the constructor for the child process.

Apart from making the code (potentially) compatible with Windows and the other start methods this also ensures that as long as the child process is still alive the object will not be garbage collected in the parent process. This might be important if some resource is freed when the object is garbage collected in the parent process.
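A minimal sketch of that guideline, assuming Python 3 and using hypothetical names `worker` and `run`: the queue reaches the child only through `Process(args=...)`, and `multiprocessing.get_context("spawn")` selects the spawn start method explicitly, so the same code can be checked on Linux under the conditions Windows always uses. The `timeout` on `q.get()` is an addition of this sketch so that a failed child raises `queue.Empty` instead of hanging.

```python
import multiprocessing as mp


def worker(q):
    # The queue arrives as an argument, so it is the parent's queue
    # regardless of the start method.
    q.put([42, None, 'hello'])


def run(start_method="spawn"):
    # get_context() picks a start method locally instead of changing the
    # global default with set_start_method(); "spawn" mimics Windows.
    ctx = mp.get_context(start_method)
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    # Read before join() so the child can flush its data into the pipe;
    # the timeout turns a silent hang into a queue.Empty exception.
    result = q.get(timeout=30)
    p.join()
    return result


if __name__ == '__main__':
    print(run())  # [42, None, 'hello']
```

The `if __name__ == '__main__':` guard is required here: under spawn the child re-imports this module, and without the guard it would try to start processes of its own.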

The issue is the way the spawn and forkserver start methods work under the hood (Windows only supports spawn). Instead of cloning the parent process with its memory and file descriptors, spawn creates a new process from scratch: a fresh Python interpreter is started, told which modules to import (including the parent's main module), and only then runs the target function. This means your global variable will be a brand new Queue instead of the parent's one.
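The re-import can be observed directly. In this sketch (the names `CREATED_IN_PID`, `report`, and `compare` are hypothetical), a module-level global records the pid of the process that executed it. Under spawn the child re-runs the module, so its copy of the global holds the child's own pid rather than the parent's. The queue used to report back is passed explicitly, so that one is genuinely shared.

```python
import multiprocessing as mp
import os

# Module-level global: records which process executed this line.
# Under "spawn" the child re-imports the module, so this runs again there.
CREATED_IN_PID = os.getpid()


def report(q):
    # Send back the child's view of the global together with the child's pid.
    q.put((CREATED_IN_PID, os.getpid()))


def compare(start_method):
    ctx = mp.get_context(start_method)
    q = ctx.Queue()  # passed as an argument, so this queue is shared
    p = ctx.Process(target=report, args=(q,))
    p.start()
    global_in_child, child_pid = q.get(timeout=30)
    p.join()
    return {
        "parent_pid": os.getpid(),
        "global_in_child": global_in_child,
        "child_pid": child_pid,
    }


if __name__ == '__main__':
    info = compare("spawn")
    # The global was re-created in the child, not copied from the parent.
    print(info["global_in_child"] == info["child_pid"])   # True
    print(info["global_in_child"] == info["parent_pid"])  # False
```

On Linux, `compare("fork")` should instead report the parent's pid, because a forked child inherits the parent's memory, which is why the global-variable versions in the question appear to work there.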

Another implication is that the objects you want to pass to the new process must be pickleable as they will be passed through a pipe.
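A small sketch of the pickling constraint, using a hypothetical helper `try_pickle`. Plain data pickles fine, a lambda does not (which is why it cannot be a `target` under spawn), and a `multiprocessing.Queue` deliberately refuses to be pickled except while a child process is being started, which is exactly the moment when it is passed through `Process(args=...)`.

```python
import multiprocessing as mp
import pickle


def try_pickle(obj):
    """Return None if obj pickles cleanly, else the exception's class name."""
    try:
        pickle.dumps(obj)
    except Exception as exc:
        return type(exc).__name__
    return None


if __name__ == '__main__':
    # Ordinary data can travel as Process args or through a Queue.
    print(try_pickle([42, None, 'hello']))  # None

    # Lambdas cannot be pickled by name, so this is not None.
    print(try_pickle(lambda: None))

    # Queue raises "Queue objects should only be shared between processes
    # through inheritance" outside of process start-up.
    print(try_pickle(mp.Queue()))  # RuntimeError
```

In practice this means a Queue can be handed to a child at creation time, but not, for example, sent through another Queue or passed to `Pool.map`.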

noxdafox
  • Hmm I am really not sure what those guidelines are trying to say. Can you take a look at the message I posted [here](http://bugs.python.org/msg289505)? – max Mar 12 '17 at 19:25
  • Sorry I quoted the wrong guideline. My bad. I updated the answer. My apologies again. – noxdafox Mar 12 '17 at 19:53

Just summarizing the answer from Davin Potts:

The only portable solution is to share Queue() and Manager().* objects by passing them as arguments, never as global variables. The reason is that on Windows all global variables are re-created (rather than copied) by literally running the module code from the beginning (very little information is actually copied from the parent process to the child process), so a brand new Queue() is created and, without some undesirable and confusing magic, it can't possibly be connected to the Queue() in the parent process.
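The same rule applies to Manager objects. A minimal sketch, assuming the hypothetical names `record` and `run`: a `manager.dict()` proxy is created in the parent and passed to each child as an argument under the spawn start method. Proxies are picklable, and their method calls are forwarded to the manager's server process, so the parent sees every child's update.

```python
import multiprocessing as mp


def record(shared, key, value):
    # `shared` is a proxy; the assignment is forwarded to the manager process.
    shared[key] = value


def run(start_method="spawn"):
    ctx = mp.get_context(start_method)
    with ctx.Manager() as manager:
        shared = manager.dict()
        workers = [
            ctx.Process(target=record, args=(shared, i, i * i))
            for i in range(3)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        # Copy into a plain dict before the manager shuts down on exit.
        return dict(shared)


if __name__ == '__main__':
    # Sorted because the children may finish in any order.
    print(sorted(run().items()))  # [(0, 0), (1, 1), (2, 4)]
```

Creating the manager as a module-level global instead would, under spawn, start a separate manager in every child, which runs into the same problem as the global Queue.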

My understanding is that there is no disadvantage to passing Queue(), etc. as parameters; I can't find any reason why anyone would want to use a non-portable solution with global variables, but of course I may be wrong.

max