
I'm trying to prevent uploads to S3 if any earlier command in the pipeline fails, but unfortunately neither of these two methods works as expected:

Shell pipeline

for database in sorted(databases):
    cmd = "bash -o pipefail -o errexit -c 'mysqldump -B {database} | gpg -e -r {GPGRCPT} | gof3r put -b {S3_BUCKET} -k {database}.sql.e'".format(database = database, GPGRCPT = GPGRCPT, S3_BUCKET = S3_BUCKET)
    try:
        subprocess.check_call(cmd, shell = True, executable="/bin/bash")
    except subprocess.CalledProcessError as e:
        print e

Popen with PIPEs

for database in sorted(databases):
    try:
        cmd_mysqldump = "mysqldump {database}".format(database = database)
        p_mysqldump = subprocess.Popen(shlex.split(cmd_mysqldump), stdout=subprocess.PIPE)

        cmd_gpg = "gpg -a -e -r {GPGRCPT}".format(GPGRCPT = GPGRCPT)
        p_gpg = subprocess.Popen(shlex.split(cmd_gpg), stdin=p_mysqldump.stdout, stdout=subprocess.PIPE)
        p_mysqldump.stdout.close()

        cmd_gof3r = "gof3r put -b {S3_BUCKET} -k {database}.sql.e".format(S3_BUCKET = S3_BUCKET, database = database)
        p_gof3r = subprocess.Popen(shlex.split(cmd_gof3r), stdin=p_gpg.stdout, stderr=open("/dev/null"))
        p_gpg.stdout.close()
    except subprocess.CalledProcessError as e:
        print e

I tried something like this with no luck:

....
if p_gpg.returncode == 0:
    cmd_gof3r = "gof3r put -b {S3_BUCKET} -k {database}.sql.e".format(S3_BUCKET = S3_BUCKET, database = database)
    p_gof3r = subprocess.Popen(shlex.split(cmd_gof3r), stdin=p_gpg.stdout, stderr=open("/dev/null"))
    p_gpg.stdout.close()
...

Basically `gof3r` streams data to S3 even when there are errors, for instance when I intentionally change `mysqldump` -> `mysqldumpp` to generate an error.

HTF
    What do you want to do if `mysqldump` generates (say) 20 GB of data and then errors out? Should `gof3r` travel back in time and stop itself from uploading anything? – Kevin Oct 07 '15 at 13:27
  • A good point; `stdout.close()` should help with this, and that's why I also tried to use `errexit` with the shell pipeline, but it's not working in this case. – HTF Oct 07 '15 at 13:36
  • It sounds as if `gof3r` is ignoring `SIGHUP` and/or `SIGPIPE`. Perhaps you should investigate its documentation to see why it does that. – Kevin Oct 07 '15 at 13:37
  • @Kevin: I don't know whether it is related to the OP's case in any way, but [Python 2 does not reset SIGPIPE for the child process, you have to do it manually](http://stackoverflow.com/a/22083141/4279) (a sketch follows these comments). Though you would get a write error instead. – jfs Oct 07 '15 at 15:45
  • Do you see OSError if you "intentionally change mysqldump -> mysqldumpp"? Are you sure you are running the correct source file (your code does not catch OSError and therefore there is no way `gof3r` would even start unless there is `mysqldumpp` executable in your environment)? – jfs Oct 07 '15 at 15:50
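
Following up on the SIGPIPE comment above, a minimal sketch of the preexec_fn reset described in the linked answer (the `database` variable is assumed to come from the question's loop):

import signal
import subprocess

def restore_sigpipe():
    # Python 2 leaves SIGPIPE ignored in child processes; restore the default
    # action so the child is killed by SIGPIPE when its reader goes away,
    # rather than getting EPIPE write errors.
    signal.signal(signal.SIGPIPE, signal.SIG_DFL)

p_mysqldump = subprocess.Popen(["mysqldump", database],
                               stdout=subprocess.PIPE,
                               preexec_fn=restore_sigpipe)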

2 Answers


I had the exact same question, and I managed it with:

cmd = "cat file | tr -d '\\n'"

subprocess.check_call( [ '/bin/bash' , '-o' , 'pipefail' , '-c' , cmd ] )

Thinking back, and searching in my code, I used another method too:

subprocess.check_call( "ssh -c 'make toto 2>&1 | tee log.txt ; exit ${PIPESTATUS[0]}'", shell=True )
Michael

All commands in a pipeline run concurrently, e.g.:

$ nonexistent | echo it is run

the echo is always run even though the nonexistent command does not exist.

  • pipefail affects the exit status of the pipeline as a whole -- it does not make gof3r exit any sooner
  • errexit has no effect because there is a single pipeline here.

If you meant that you don't want to start the next pipeline if the one from the previous iteration fails, then put break after print e in the exception handler.
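
For example, a minimal sketch of that change applied to the question's first snippet (only the exception handler changes):

for database in sorted(databases):
    cmd = "bash -o pipefail -o errexit -c 'mysqldump -B {database} | gpg -e -r {GPGRCPT} | gof3r put -b {S3_BUCKET} -k {database}.sql.e'".format(database = database, GPGRCPT = GPGRCPT, S3_BUCKET = S3_BUCKET)
    try:
        subprocess.check_call(cmd, shell = True, executable="/bin/bash")
    except subprocess.CalledProcessError as e:
        print e
        break  # a failed pipeline stops the loop; the next database is not started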

p_gpg.returncode is None while gpg is running. If you don't want gof3r to run when gpg fails, then you have to save gpg's output somewhere else first, e.g., in a file:

filename = 'gpg.out'
for database in sorted(databases):
    pipeline_no_gof3r = ("bash -o pipefail -c 'mysqldump -B {database} | "
                         "gpg -e -r {GPGRCPT}'").format(**vars())
    with open(filename, 'wb', 0) as file:
        if subprocess.call(shlex.split(pipeline_no_gof3r), stdout=file):
            break # don't upload to S3, don't run the next database pipeline
    # upload the file on success
    gof3r_cmd = 'gof3r put -b {S3_BUCKET} -k {database}.sql.e'.format(**vars())
    with open(filename, 'rb', 0) as file:
        if subprocess.call(shlex.split(gof3r_cmd), stdin=file):
            break # don't run the next database pipeline
jfs