
I am using the open4 gem to wrap system calls to a potentially long-running third-party command-line tool. The tool may sometimes fail in a way that keeps two processes busy and partially blocks a pipeline, because the parent process is part of a pool of worker scripts serving a Beanstalk queue. From outside the system, I can identify a stuck worker script and its process ID programmatically, based on the data model of what is being processed. Inside the Open4.open4 block, I can identify the child process ID.

I'd like to set up the Open4 block so that when I send a SIGTERM to the parent worker process, it forwards on the SIGTERM to the child. In addition, if the child process has still failed to exit after a short wait, I want to send a SIGKILL to the child process. In both cases, I'd then like the parent process to respond as normal to the SIGTERM it was sent.

This is all being done so I can expose a "stop" button in a customer services app, so non-technical team members have a tool to manage their way out of a situation with a blocked queue.

I have found some related questions on SO, e.g. "How to make child process die after parent exits?", but the answers are not really usable for me from Ruby application code.

Here is a current implementation in Ruby that I have tested on my Mac:

Test stand-in for "bad" process that won't always respond to SIGTERM:

# Writing to a log file shows whether or not a detached process continues
# once the parent has closed IO to it.
$f = File.open( 'log.txt', 'w' )

def say m
  begin
    $f.puts m
    $f.flush
    $stderr.puts m
  rescue Exception => e
    # When the parent process closes, we get
    # #<Errno::EPIPE: Broken pipe - <STDERR>> in this
    # test, but with a stuck child process, this is not 
    # guaranteed to happen or cause the child to exit.
    $f.puts e.inspect
    $f.flush
  end
end

Signal.trap( "TERM" ) { say "Received and ignored TERM" }

# Messages get logged, and sleep allows test of manual interrupts
say "Hello"
sleep 3
say "Foo Bar Baz"
sleep 3
say "Doo Be Doo"
sleep 3
say "Goodbye" 
$f.close

Test Open4 block (part of a "worker" test script):

Open4.open4(@command) do | pid, stdin, stdout, stderr |
  begin
    stderr.each { |l|
      puts "[#{pid}] STDERR: #{l}" if l
    }
  rescue SignalException => e
    puts "[#{$$}] Received signal (#{e.signo} #{e.signm}) in Open4 block"

    # Forward a SIGTERM to child, upgrade to SIGKILL if it doesn't work
    if e.signo == 15
      begin
        puts "[#{$$}] Sending TERM to child process"
        Process.kill( 'TERM', pid )
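        # Timeout.timeout (from require 'timeout') replaces the bare Kernel#timeout,
        # which newer Ruby versions no longer provide.
        Timeout.timeout(3.0) { Process.waitpid( pid ) }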
      rescue Timeout::Error
        puts "[#{$$}] Sending KILL to child process"
        Process.kill( 'KILL', pid )
      end
    end

    raise e
  end
end
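
For comparison, the TERM-then-KILL escalation could be pulled out of the rescue clause into a small helper. This is only a minimal sketch using core Ruby: the name stop_child and its grace and poll parameters are made up for illustration and are not part of Open4. It polls Process.waitpid with WNOHANG instead of using Timeout, and it reaps the child itself. I have not checked how that interacts with any wait that Open4 performs after the block.

```ruby
# Hypothetical helper, not part of Open4: ask the child to stop with TERM,
# wait up to `grace` seconds, then fall back to KILL.
# Returns :term, :kill or :gone depending on what ended the child.
def stop_child(pid, grace = 3.0, poll = 0.1)
  Process.kill('TERM', pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    # With WNOHANG, waitpid returns nil straight away while the child is running
    return :term if Process.waitpid(pid, Process::WNOHANG)
    sleep poll
  end
  Process.kill('KILL', pid)
  Process.waitpid(pid) # reap the child so it does not linger as a zombie
  :kill
rescue Errno::ESRCH, Errno::ECHILD
  # The child had already exited or had already been reaped
  :gone
end
```

Inside the rescue SignalException clause, the inner begin/rescue could then shrink to a single call such as stop_child( pid ) if e.signo == 15, followed by raise e as before.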

Typical output if I start this up, and run e.g. kill -15 16854:

[16855] STDERR: Hello
[16854] Received signal (15 SIGTERM) in Open4 block
[16854] Sending TERM to child process
[16854] Sending KILL to child process

Contents of log file for same test:

Hello
Received and ignored TERM
Foo Bar Baz

The code is IMO a bit unwieldy, although it appears to work as I want. My questions:

  1. Is the above attempt ok, or fatally flawed in the use case I need it for?
  2. Have I missed a cleaner way of doing the same thing using existing Open4 and core Ruby methods?
Neil Slater