I am using the open4 gem to wrap system calls to a potentially long-running third-party command line tool. The tool may sometimes fail, keeping two processes busy and partially blocking a pipeline, as the parent process is part of a pool of worker scripts (serving a Beanstalk queue). From outside of the system, I can identify a stuck worker script and its process id programmatically, based on the data model of what is being processed. Inside the Open4.open4 block, I can identify the child process id.
I'd like to set up the Open4 block so that when I send a SIGTERM to the parent worker process, it forwards the SIGTERM on to the child. In addition, if the child process has still failed to exit after a short wait, I want to send a SIGKILL to the child process. In both cases, I'd then like the parent process to respond as normal to the SIGTERM it was sent.
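To make the intended TERM-then-KILL escalation concrete, here is a minimal sketch of it as a standalone helper. The name `stop_child`, its `grace` and `poll` parameters, and its return values are all made up for illustration and are not part of Open4. It polls with `Process::WNOHANG` rather than using `Timeout`, and I have not checked how reaping the child here interacts with Open4's own wait on the child.

```ruby
# Hypothetical helper (not part of Open4): send TERM to a child process and
# escalate to KILL if it has not exited within `grace` seconds.
# Returns :term if the child exited during the grace period, :kill if KILL
# was sent, and :gone if the child no longer existed.
def stop_child(pid, grace = 3.0, poll = 0.1)
  begin
    Process.kill('TERM', pid)
  rescue Errno::ESRCH
    return :gone                 # no such process: already exited and reaped
  end

  deadline = Time.now + grace
  while Time.now < deadline
    # With WNOHANG, waitpid returns nil while the child is still running
    # and the pid once the child has exited (reaping it as a side effect).
    return :term if Process.waitpid(pid, Process::WNOHANG)
    sleep poll
  end

  Process.kill('KILL', pid)      # KILL cannot be trapped or ignored
  Process.waitpid(pid)           # reap the child so no zombie is left behind
  :kill
rescue Errno::ECHILD, Errno::ESRCH
  :gone                          # something else reaped the child first
end
```

Inside the rescue SignalException branch of a worker, this would be called as `stop_child(pid)` before re-raising the exception, so the parent still reacts to its own SIGTERM as usual.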
This is all being done so I can expose a "stop" button in a customer services app, so non-technical team members have a tool to manage their way out of a situation with a blocked queue.
I have found some related questions on SO, e.g. "How to make child process die after parent exits?", but the answers are not really usable for me from Ruby application code.
Here is a current implementation in Ruby that I have tested on my Mac:
Test stand-in for "bad" process that won't always respond to SIGTERM:
# Writing to a log file shows whether or not a detached process continues
# once the parent has closed IO to it.
$f = open( 'log.txt', 'w' )
def say m
  begin
    $f.puts m
    $f.flush
    $stderr.puts m
  rescue Exception => e
    # When the parent process closes, we get
    # #<Errno::EPIPE: Broken pipe - <STDERR>> in this
    # test, but with a stuck child process, this is not
    # guaranteed to happen or cause the child to exit.
    $f.puts e.inspect
    $f.flush
  end
end
Signal.trap( "TERM" ) { say "Received and ignored TERM" }
# Messages get logged, and sleep allows test of manual interrupts
say "Hello"
sleep 3
say "Foo Bar Baz"
sleep 3
say "Doo Be Doo"
sleep 3
say "Goodbye"
$f.close
Test Open4 block (part of a "worker" test script):
# Needs require 'open4' and require 'timeout' at the top of the worker script
Open4.open4(@command) do | pid, stdin, stdout, stderr |
  begin
    stderr.each { |l|
      puts "[#{pid}] STDERR: #{l}" if l
    }
  rescue SignalException => e
    puts "[#{$$}] Received signal (#{e.signo} #{e.signm}) in Open4 block"
    # Forward a SIGTERM to child, upgrade to SIGKILL if it doesn't work
    if e.signo == 15
      begin
        puts "[#{$$}] Sending TERM to child process"
        Process.kill( 'TERM', pid )
        # Timeout.timeout replaces the bare Kernel#timeout, which newer Rubies removed
        Timeout.timeout(3.0) { Process.waitpid( pid ) }
      rescue Timeout::Error
        puts "[#{$$}] Sending KILL to child process"
        Process.kill( 'KILL', pid )
      end
    end
    raise e
  end
end
Typical output if I start this up, and run e.g. kill -15 16854:
[16855] STDERR: Hello
[16854] Received signal (15 SIGTERM) in Open4 block
[16854] Sending TERM to child process
[16854] Sending KILL to child process
Contents of log file for same test:
Hello
Received and ignored TERM
Foo Bar Baz
The code is IMO a bit unwieldy, although it appears to work as I want. My questions:
- Is the above attempt ok, or fatally flawed in the use case I need it for?
- Have I missed a cleaner way of doing the same thing using existing Open4 and core Ruby methods?