
I have a streaming backup script which I'm running as follows:

./backup_script.sh | aws s3 cp - s3://bucket/path/to/backup

The aws command streams stdin to cloud storage atomically: if the process is interrupted before it has seen EOF, the upload is aborted.

I want the aws process to be killed if ./backup_script.sh exits with a non-zero exit code.

Any bash trick for doing this?

EDIT: You can test your solution with this script:

#!/usr/bin/env python3
import signal
import sys
import functools

def signal_handler(signame, signum, frame):
    print("Got {}".format(signame))
    sys.exit(0)

signal.signal(signal.SIGTERM, functools.partial(signal_handler, 'TERM'))
signal.signal(signal.SIGINT, functools.partial(signal_handler, 'INT'))

# Drain stdin; if we reach the end without being signalled, report EOF.
for line in sys.stdin:
    pass

print("Got EOF")

Example:

$ grep --bla | ./sigoreof.py
grep: unrecognized option `--bla'
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
    [-e pattern] [-f file] [--binary-files=value] [--color=when]
    [--context[=num]] [--directories=action] [--label] [--line-buffered]
    [--null] [pattern] [file ...]
Got EOF

I want ./sigoreof.py to be terminated with a signal.
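
For reference, any trivial stand-in for ./backup_script.sh is enough to drive both paths (purely illustrative; the name and contents don't matter):

#!/usr/bin/env bash
# Fake producer for testing only: emits a little data, then fails.
echo "some backup data"
exit 1    # change to 0 to simulate a successful backup

Piped straight into ./sigoreof.py it prints "Got EOF" whether it exits 0 or 1; the goal is that a non-zero exit on the left side results in sigoreof.py being signalled instead.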

omribahumi
    I assume that your problem is that a failure of backup_script.sh just closes the pipe as far as the receiving process `aws` is concerned; aws cannot detect the error and assumes the backup went fine. – Peter - Reinstate Monica Sep 21 '15 at 15:08
  • Aside: `sigoreof` doesn't actually report signals for me, though it's definitely distinguishable from the EOF case. `./ftest: line 11: 53793 Terminated: 15 ./sigoreof < aws.fifo 3>&-` – Charles Duffy Sep 21 '15 at 16:28
  • As pointed out by @tourism: Likely dupe of http://stackoverflow.com/questions/6565694/left-side-failure-on-pipe-in-bash (but the answers there don't really address the question asked). – Charles Duffy Sep 21 '15 at 17:25

5 Answers


Adopting/correcting a solution originally given by @Dummy00001:

mkfifo aws.fifo
exec 3<>aws.fifo # open the FIFO read/write *in the shell itself*
aws s3 cp - s3://bucket/path/to/backup <aws.fifo 3>&- & aws_pid=$!
rm aws.fifo # everyone who needs a handle already has one; can remove the directory entry

if ./backup_script.sh >&3 3>&-; then
    exec 3>&-       # success: close the FIFO and let AWS exit successfully
    wait "$aws_pid"
else
    kill "$aws_pid" # send a SIGTERM...
    wait "$aws_pid" # wait for the process to die...
    exec 3>&-       # only close the write end *after* the process is dead
fi

Important points:

  • The shell opens the FIFO read/write itself to avoid blocking (an open for write only would block until a reader appears; this could also be avoided by starting the reader [that is, the s3 command] in the background before the exec that opens the write side).
  • The write end of the FIFO is held by the script itself, so the read end never hits end-of-file until after the script intentionally closes it.
  • The aws command's inherited copy of fd 3 (the shell's read/write handle on the FIFO, and thus a potential writer) is explicitly closed with 3>&-, so aws doesn't hold the write side open itself; otherwise the exec 3>&- done in the parent would never leave the FIFO without writers, and aws could never see EOF, finish reading and exit. (A test run with sigoreof.py from the question is sketched below.)
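
To try this against the test script from the question instead of the real aws command, a quick sketch (./sigoreof.py stands in for the uploader, grep --bla for a failing backup script):

#!/usr/bin/env bash
# Test sketch: same structure as above; the reader should be killed by the
# signal rather than printing "Got EOF".
mkfifo aws.fifo
exec 3<>aws.fifo
./sigoreof.py <aws.fifo 3>&- & reader_pid=$!
rm aws.fifo

if grep --bla >&3 3>&-; then
    exec 3>&-
    wait "$reader_pid"
else
    kill "$reader_pid"
    wait "$reader_pid"
    exec 3>&-
fi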
Charles Duffy
  • I retracted that comment; I'm a bit fuzzy on dealing with file descriptors like this, but I think you are right. – chepner Sep 21 '15 at 15:54
  • This is a little more complicated than my answer, but it definitely saves on bandwidth by killing the upload sooner. – chepner Sep 21 '15 at 15:56
  • @chepner, good call re: complexity -- there was indeed some room to simplify. – Charles Duffy Sep 21 '15 at 16:09
  • Oof. Had to clean up a bit on testing -- original would block indefinitely on the exec. @omribahumi, if you're inclined to give this one a spin, it Works For Me. – Charles Duffy Sep 21 '15 at 16:17
  • @Dummy00001, ...as demonstrated here, when I proposed having something else on the write end of the FIFO (in this case, the parent shell), I didn't mean that the pipeline had to live in the foreground. You could background the whole thing but have a side-effecting component on the write half and have the same effect. (That is: `{ thing_to_test; record_exit_status_and_block $?; } | read_end &` is what I was originally trying to suggest to you). – Charles Duffy Sep 21 '15 at 18:14

backup_script.sh should have a non-zero exit status if there is an error, so your script should look something like:

if ./backup_script.sh > output.txt; then
    aws s3 cp output.txt s3://bucket/path/to/backup
fi
rm -f output.txt

A pipe isn't really appropriate here.


If you really need to conserve disk space locally, you'll have to "reverse" the upload; either remove the uploaded file in the event of an error in backup_script.sh, or upload to a temporary location, then move that to the final path once you've determined that the backup has succeeded.

(For simplicity, I'm ignoring the fact that by letting aws exit on its own in the event of an error, you may be uploading more of the partial backup than you need to. See Charles Duffy's answer for a more bandwidth-efficient approach.)

After starting the backup process with

mkfifo data
./backup_script.sh > data & writer_pid=$!

use one of the following to upload the data.

# Upload and remove if there was an error
aws s3 cp - s3://bucket/path/to/backup < data &

if ! wait $writer_pid; then
    wait    # let the background cp finish before removing the partial object
    aws s3 rm s3://bucket/path/to/backup
fi

or

# Upload to a temporary file and move it into place
# once you know the backup succeeded.
aws s3 cp - s3://bucket/path/to/backup.tmp < data &

if wait $writer_pid; then
    wait    # make sure the upload of backup.tmp has finished first
    aws s3 mv s3://bucket/path/to/backup.tmp s3://bucket/path/to/backup
else
    wait
    aws s3 rm s3://bucket/path/to/backup.tmp
fi
chepner
  • I don't want to keep `output.txt` on disk for space reasons. Apparently this is possible with `coproc`. I'm looking into it right now. – omribahumi Sep 21 '15 at 15:17
  • I'm assuming you have far more disk space than memory, and `aws` must be buffering its input *somewhere*. – chepner Sep 21 '15 at 15:18
  • It uses chunked upload which uploads 5MB parts – omribahumi Sep 21 '15 at 15:19
  • So by "abort", you mean `aws` will delete the parts it has already uploaded? – chepner Sep 21 '15 at 15:20
  • Pipes are asynchronous; you can't guarantee that you can signal a problem to `aws` before it mistakenly concludes that it has successfully uploaded all the input it can read. – chepner Sep 21 '15 at 15:23
  • I want bash to kill it rather than send it an EOF; then the uploaded chunks will be discarded. – omribahumi Sep 21 '15 at 15:26
  • Writing processes don't send a specific EOF character; they simply close their end of the pipe. The reader doesn't know why the writer closed its end; it only knows that it did. Once `aws` sees that the pipe was closed, there is *no* guarantee that you can tell it there was an error before it commits the upload as successful. Its atomic behavior is to guard against errors in its own process, not against external errors in the process generating the data it reads. – chepner Sep 21 '15 at 15:27
  • So I need a trick to make bash keep the write end of the pipe opened. Still looking into that `coproc`. – omribahumi Sep 21 '15 at 15:28
  • @omribahumi, I actually wouldn't call coproc the right tool for the job. Not saying you couldn't possibly use it, but I wouldn't; some small tweaks to Dummy's answer would do the trick, while retaining compatibility with older releases of bash. – Charles Duffy Sep 21 '15 at 15:30
  • The explicit-rename approach is definitely worth an upvote. – Charles Duffy Sep 21 '15 at 15:46

A short script which uses process substitution instead of named pipes would be:

#!/bin/bash

exec 4> >( ./second-process.sh )
./first-process.sh >&4  &
if ! wait $! ; then echo "error in first process" >&2; kill 0; wait; fi

It works much like the FIFO variant, basically using the file descriptor as the information carrier for the IPC instead of a file name.

Two remarks: I wasn't sure whether it's necessary to close fd 4; I would assume that the shell closes all open files upon script exit anyway.

And I couldn't figure out how to obtain the PID of the process inside the process substitution (anybody? At least on my Cygwin the usual $! didn't work.) Therefore I resorted to killing all processes in the group, which may not be desirable (but I'm not entirely sure about the semantics).
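
For completeness, a sketch that assumes $! is set to the PID of the last process substitution (which seems to be the case in recent bash on Linux, though as noted it didn't work on my Cygwin) and that wait can wait for it (bash 4.4+), adapted to the upload from the question:

#!/usr/bin/env bash
# Sketch only; relies on bash-specific behaviour of $! after a process
# substitution. Variable names are illustrative.
exec 4> >( aws s3 cp - s3://bucket/path/to/backup )
aws_pid=$!              # PID of the process substitution

if ./backup_script.sh >&4; then
    exec 4>&-           # close the write end so aws sees EOF and finishes
    wait "$aws_pid"     # bash >= 4.4 can wait on a process substitution
else
    kill "$aws_pid"     # abort the upload before aws ever sees EOF
    wait "$aws_pid" 2>/dev/null
    exec 4>&-
    exit 1
fi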

Peter - Reinstate Monica

I think you need to spawn both processes from a third one and either use the named-pipe approach from Lynch in the post mentioned by @tourism (further down among the answers there), or keep piping directly but rewrite backup_script.sh so that it stays alive in the error case, keeping its stdout open. backup_script.sh would then have to signal the error condition to the calling process (e.g. by sending SIGUSR1 to the parent process ID), which in turn first kills the aws process (leading to an atomic abort) and only then kills backup_script.sh, unless it has already exited because of the broken pipe.
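
A rough sketch of that scheme, with backup_script.sh wrapped rather than rewritten (the FIFO name, the variable names and the choice of SIGUSR1 are illustrative):

#!/usr/bin/env bash
mkfifo backup.fifo

on_error() {
    kill "$aws_pid" 2>/dev/null     # abort the upload first: aws never sees EOF
    kill "$writer_pid" 2>/dev/null  # then stop the writer, which is idling on purpose
    wait
    rm -f backup.fifo
    exit 1
}
trap on_error USR1

aws s3 cp - s3://bucket/path/to/backup < backup.fifo & aws_pid=$!

# On failure, signal the parent and keep stdout (the FIFO) open until killed.
{ ./backup_script.sh || { kill -USR1 $$; while :; do sleep 60; done; }; } \
    > backup.fifo & writer_pid=$!

wait "$writer_pid"     # interrupted by the USR1 trap on failure
wait "$aws_pid"
rm -f backup.fifo

The idle loop only exists so the write end of the FIFO stays open until the parent's trap has killed aws; without it, aws could see EOF (and commit the upload) before the signal is handled.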

Peter - Reinstate Monica

I had a similar situation: a shell script contained a pipeline that used one of its own functions and that function wanted to be able to effect termination. A simple contrived example that finds and displays a file:

#!/bin/sh
a() { find . -maxdepth 1 -name "$1" -print -quit | grep . || exit 101; }
a "$1" | cat
echo done

Here, the function a needs to be able to effect termination which it tries to do by calling exit. However, when invoked through a pipeline (line 3), it only terminates its own (subshell) process. In the example, the done message still appears.

One way to work around this is to detect when in a subshell and send a signal to the parent:

#!/bin/sh
die() { [ "$$" = "$(exec sh -c 'echo $PPID')" ] && exit "$1" || kill $$; }
a() { find . -maxdepth 1 -name "$1" -print -quit | grep . || die 101; }
a "$1" | cat
echo done

When in a subshell, $$ is still the PID of the parent (top-level) shell, and the construct $(exec sh -c 'echo $PPID') is a shell-agnostic way to obtain the PID of the current subprocess. If using bash then this can be replaced by $BASHPID.

If the subprocess pid and $$ differ then it sends a SIGTERM signal to the parent (kill $$) instead of calling exit.

The given exit status (101) isn't propagated by kill, so the script exits with a status of 143 (which is 128+15, where 15 is the signal number of SIGTERM).
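
For reference, a bash-only variant of the same contrived script (a sketch; it assumes bash rather than POSIX sh) can use $BASHPID directly:

#!/usr/bin/env bash
# $BASHPID is the PID of the current (sub)shell, while $$ stays the PID of
# the top-level script, so the comparison detects the pipeline's subshell.
die() { [ "$$" = "$BASHPID" ] && exit "$1" || kill $$; }
a() { find . -maxdepth 1 -name "$1" -print -quit | grep . || die 101; }
a "$1" | cat
echo done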

starfry