
Consider the following example:

from multiprocessing import Queue, Pool

def work(*args):
    print('work')
    return 0

if __name__ == '__main__':
    queue = Queue()
    pool = Pool(1)
    result = pool.apply_async(work, args=(queue,))
    print(result.get())

This raises the following RuntimeError:

Traceback (most recent call last):
  File "/tmp/test.py", line 11, in <module>
    print(result.get())
  [...]
RuntimeError: Queue objects should only be shared between processes through inheritance

But interestingly the exception is only raised when I try to get the result, not when the "sharing" happens. Commenting out the corresponding line silences the error, even though I actually did share the queue (and work is never executed!).

So here is my question: why is this exception only raised when the result is requested, and not when apply_async is invoked, even though the error is apparently detected earlier, given that the target function work is never called?

It looks like the exception occurs in a different process and can only be made available to the main process when inter-process communication happens in the form of requesting the result. In that case, though, I'd like to know why such checks are not performed before dispatching to the other process.
(If I used the queue for communication between work and the main process, this would (silently) introduce a deadlock.)


Python version is 3.5.2.


I have read the following questions:

a_guest

1 Answer


This behavior results from the design of the multiprocessing.Pool.

Internally, when you call apply_async, your job is put into the Pool's call queue and you get back an AsyncResult object, which allows you to retrieve your computation result with get. A separate thread is then in charge of pickling your work (the function and its arguments). In that thread the RuntimeError happens, but you have already returned from the call to apply_async. The exception is therefore stored as the result in the AsyncResult, and it is raised when you call get.
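
You can see that the exception is stored rather than raised immediately by catching it around get; this is a minimal sketch built directly on the question's example (the timeout is only a safety net, not required):

from multiprocessing import Queue, Pool

def work(*args):
    print('work')
    return 0

if __name__ == '__main__':
    queue = Queue()
    pool = Pool(1)
    # apply_async returns immediately; the pickling of the arguments
    # (and thus the RuntimeError) happens later in a handler thread.
    result = pool.apply_async(work, args=(queue,))
    try:
        print(result.get(timeout=5))
    except RuntimeError as exc:
        # The stored exception is re-raised here, in the main process.
        print('raised only on get():', exc)
    pool.close()
    pool.join()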

This behavior based on future-like result objects is easier to understand when you try concurrent.futures, which has explicit Future objects and, IMO, a better design for handling failures, as you can query the future object for a failure without calling the result-retrieving function.
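
For illustration, here is a sketch of that pattern with concurrent.futures; the deliberately failing task (the ValueError) is just a stand-in for any error raised while dispatching or running the job:

from concurrent.futures import ProcessPoolExecutor

def work(x):
    # Stand-in task that always fails.
    raise ValueError('boom')

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=1) as executor:
        future = executor.submit(work, 1)
        # exception() waits for the task to finish and returns the stored
        # exception (or None on success) without re-raising it.
        err = future.exception()
        if err is not None:
            print('task failed:', err)
        else:
            print('result:', future.result())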

Thomas Moreau
  • Thank you for your answer! I have a question though. If `Queue` objects cannot be shared among processes via arguments why are we allowed to do exactly this with a `multiprocessing.Process` object (see [this documentation example -> Queues](https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes))? This doesn't raise a `RuntimeError` and works fine, so where is the difference? The arguments must be transferred to the other process as well, as is the case for the `Pool`. What about pickling here? – a_guest Jul 19 '17 at 08:13
  • In the `Pool`, the processes are spawned when you instantiate it. Thus, when you call `apply_async`, the worker `Process` objects are already started. You can pass a `Queue` to the `Pool` using the `initializer` and `initargs` arguments to declare a global `Queue` in the child processes I think (see the sketch after these comments). – Thomas Moreau Jul 19 '17 at 08:17
  • You can also declare the `Queue` as a global attribute of your module, to avoid messing with `initializer`. `pickle` should be able to handle this case I think. – Thomas Moreau Jul 19 '17 at 16:50
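
A minimal sketch of the `initializer`/`initargs` approach suggested in the comments above; the worker stores the queue in a module-level global, and names like init_worker are purely illustrative:

from multiprocessing import Pool, Queue

queue = None  # filled in by the initializer inside each worker process

def init_worker(q):
    # The queue is passed to the workers when they are started,
    # so no RuntimeError is raised.
    global queue
    queue = q

def work(item):
    queue.put(item)
    return 0

if __name__ == '__main__':
    q = Queue()
    pool = Pool(1, initializer=init_worker, initargs=(q,))
    result = pool.apply_async(work, args=('hello',))
    print(result.get())  # 0
    print(q.get())       # 'hello'
    pool.close()
    pool.join()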