Im writing a simple job scheduler and I need to maintain some basic level of communication from the scheduler ( parent ) to spawned processes ( children ). Basically I want to be able to send signals from scheduler -> child and also inform the the scheduler of a child's exit status.
Generally this is all fine and dandy using regular subprocess Popen with signaling / waitpid. My issues is that I want to be able to restart the scheduler without impacting running jobs, and re-establish comm once the scheduler restarts.
Ive gone through a few ideas, nothing feels "right".
Right now I have a simple wrapper script that maintains communication with the scheduler using named pipes, and the wrapper will actually run the scheduled command. This seems to work well in most cases but I have seen it subject to some races, mainly due to how namedpipes need the connection to be alive in order to read/write ( no buffering ).
Im considering trying linux message queues instead, but not sure if that's the right path. Another option is setup a server socket accept on the scheduler and have sub-processes connect to the scheduler when they start, and re-establish the connection if it breaks. This might be overkill, but maybe the most robust. Its a shame I have to go through all this trouble since I cant just "re-attached" the child process on restart ( I realize there are reasons for this ).
Any other thoughts? This seems like a common problem any scheduler would hit, am I missing some obvious solution here?