
I have a program that, when it receives a SIGUSR1, writes some output and quits. I'm trying to get sbatch to notify this program before timing out.

I enqueue the program using:

sbatch -t 06:00:00 --signal=USR1 ... --wrap my_program

but my_program never receives the signal. I've also tried sending the signal manually while the program is running, with scancel -s USR1 <JOBID>, but without success. I tried scancel --full as well, but that kills the wrapper and my_program is still not notified.

One option is to write a bash file that wraps my_program, traps the signal, and forwards it to my_program (similar to this example), but I don't need this cumbersome bash file for anything else. Also, the sbatch --signal documentation says that the B: prefix (signal=B:) is what you specify when you want to notify the enclosing batch shell itself, so I believe the bash wrapper should not be necessary.

So, is there a way to send a SIGUSR1 signal to a program enqueued using sbatch --wrap?

Pika Supports Ukraine
    Is `my_program` launching job steps? Why aren't you using `signal=B:USR1`? `--wrap` is shorthand for creating a one-time bash script, so there is still a shell process. – Telgar Mar 16 '19 at 10:16

1 Answer


Your command is sending the USR1 to the shell created by the --wrap. However, if you want the signal to be caught and processed, you're going to need to write the shell functions to handle the signal and that's probably too much for a --wrap command.

The NERSC example linked here does this, but it doesn't show the contents of its setup.sh script, so you can't see what they are defining: https://docs.nersc.gov/jobs/examples/#annotated-example-automated-variable-time-jobs

Note that they use "." to run the code in setup.sh in the same process instead of spawning a sub-shell. You need that, because a trap set in a sub-shell disappears when the sub-shell exits.
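To illustrate why sourcing matters, here is a minimal sketch that can be run outside Slurm. The temporary setup.sh it creates is a hypothetical stand-in (the real NERSC script is not visible); all it does is install a USR1 trap, once by being executed and once by being sourced.

```shell
#!/bin/bash
# Hypothetical stand-in for setup.sh: all it does is install a USR1 trap.
tmpdir=$(mktemp -d)
setup="$tmpdir/setup.sh"
echo "trap 'echo caught USR1' USR1" > "$setup"

# Executed as a separate process: the trap is set in the child and is gone
# as soon as the child exits.
bash "$setup"
trap -p USR1 > "$tmpdir/after_exec"      # nothing is listed

# Sourced with ".": the trap is installed in the current shell.
. "$setup"
trap -p USR1 > "$tmpdir/after_source"    # the USR1 trap is listed

echo "after executing: $(cat "$tmpdir/after_exec")"
echo "after sourcing:  $(cat "$tmpdir/after_source")"
```

Only the sourced version leaves a handler behind in the shell that will actually receive the signal.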

These folks describe a nice method of creating the functions you need: Is it possible to detect *which* trap signal in bash?

The only thing they don't show there is the function that actually takes action when the signal is received. Here's what I wrote to do that. Put it in a file that can be sourced from any user's sbatch submit script, and show users how to use it together with the --signal option:

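# Install the same handler for several signals, passing the signal name to it.
trap_with_arg() {
    func="$1" ; shift
    for sig ; do
        echo "setting trap for $sig"
        trap "$func $sig" "$sig"
    done
}

# Handler: act according to which signal was received.
func_trap () {
    echo "called with sig $1"
    case $1 in
        USR1)
            echo "caught SIGUSR1, making ABORT file"
            date
            # WORKDIR is expected to be set elsewhere (e.g. in the submit script);
            # bail out rather than create ABORT in the wrong directory.
            cd "$WORKDIR" || return
            touch ABORT
            ls -l ABORT
        ;;
        *) echo "something else" ;;
    esac
}

trap_with_arg func_trap USR1 USR2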
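The handler above acts inside the batch shell. If the goal is instead to pass the signal on to the program, as the question asks, one possible sketch is below. It assumes the job is submitted with the B: prefix so that the batch shell receives USR1. The function name run_and_forward_usr1 is hypothetical. The program is started in the background because bash only runs a trap once the current foreground command has finished, whereas the wait builtin returns as soon as a trapped signal arrives.

```shell
#!/bin/bash
# Hypothetical helper: run a command and forward USR1 from this shell to it.
run_and_forward_usr1() {
    got_signal=0
    "$@" &                       # background, so the trap can fire while we wait
    child=$!
    trap 'got_signal=1; kill -USR1 "$child" 2>/dev/null' USR1

    status=0
    wait "$child" || status=$?   # returns early if USR1 arrives
    if [ "$got_signal" -eq 1 ]; then
        # The first wait was interrupted by the trap; wait again so the
        # program has time to write its output and exit.
        status=0
        wait "$child" || status=$?
    fi
    trap - USR1
    return "$status"
}

# In a submit script this would be followed by, for example:
#   run_and_forward_usr1 my_program
```

A submit script containing this (here called job.sh, a placeholder name) could then be queued with something like sbatch -t 06:00:00 --signal=B:USR1@60 job.sh, where @60 asks Slurm to send the signal roughly 60 seconds before the time limit.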
Mike Diehn