I use ZMQ in Python to distribute calculations between a master application and worker sub-processes, via a PUSH-PULL pattern.
At times, the master might crash and the sub-processes remain hanging, listening to their respective ports.
I tried to use atexit to close the workers in the event that the master crashes, as suggested in this SO question. However, atexit does not cover the case where I forcefully close the master.
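To illustrate the limitation, here is a minimal sketch using only the standard library. It starts a child Python process that registers an atexit handler, then compares a normal exit with a forced kill. The helper name `atexit_ran` and the marker-file mechanism are made up for this example and are not part of my actual application.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Child program: registers an atexit handler that creates a marker file.
# argv[1] is the marker path, argv[2] is "hang" or "exit" (both hypothetical).
CHILD = textwrap.dedent("""
    import atexit, sys, time
    marker = sys.argv[1]
    atexit.register(lambda: open(marker, "w").close())
    print("ready", flush=True)
    if sys.argv[2] == "hang":
        time.sleep(60)
""")


def atexit_ran(force_kill):
    """Return True if the child's atexit handler ran before it terminated."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "atexit_ran")
        mode = "hang" if force_kill else "exit"
        with subprocess.Popen(
            [sys.executable, "-c", CHILD, marker, mode],
            stdout=subprocess.PIPE,
        ) as proc:
            proc.stdout.readline()  # wait until the handler is registered
            if force_kill:
                proc.kill()  # SIGKILL on POSIX, TerminateProcess on Windows
            proc.wait()
        return os.path.exists(marker)


if __name__ == "__main__":
    print("normal exit, handler ran:", atexit_ran(force_kill=False))
    print("forced kill, handler ran:", atexit_ran(force_kill=True))
```

The handler runs on a normal interpreter exit, but a process killed by a signal that Python does not handle never gets the chance to run it, which matches what I see when the master is closed forcefully.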
Is there a way for the PULL-side worker sub-processes to notice, via the zmq socket, that the PUSH-side master has closed (as possibly hinted here)?
Practical Solution (edit)
A practical solution I implemented is to have the master PUSH a message that closes all pending workers when it restarts:
Before spawning its own helpers, the new instance of the master broadcasts an exit message to all sockets.
Upon receiving the exit command, the hanging sub-processes (launched by the previous instance of the master) call sys.exit().
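The two steps can be sketched roughly as follows. This is a simplified illustration rather than my exact code: the sentinel value `b"exit"`, the function names `worker_loop` and `dismiss_stale_workers`, and the assumption of one PUSH socket per worker port are all placeholders. The functions only rely on the `recv()` and `send()` methods that pyzmq sockets provide.

```python
import sys

# Hypothetical sentinel; any payload that both sides agree on would do.
EXIT_COMMAND = b"exit"


def worker_loop(pull_socket, handle_task):
    """Process messages from a PULL socket until the exit sentinel arrives.

    With pyzmq, pull_socket could be created along these lines (the address
    and the choice of which side binds are assumptions):
        sock = zmq.Context.instance().socket(zmq.PULL)
        sock.bind("tcp://127.0.0.1:5557")
    """
    while True:
        message = pull_socket.recv()
        if message == EXIT_COMMAND:
            # Sent by a restarted master to clear out stale workers.
            sys.exit(0)
        handle_task(message)


def dismiss_stale_workers(push_sockets):
    """Send the exit sentinel once on every PUSH socket.

    Assumes one PUSH socket per worker port, so each stale worker
    receives exactly one exit message.
    """
    for sock in push_sockets:
        sock.send(EXIT_COMMAND)
```

The restarted master would call `dismiss_stale_workers` before spawning its new helpers. Note that this only cleans up at the next start of the master; the stale workers keep hanging until then.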