
I am trying to run the following command from Python:

gsutil -m rsync s3://input gs://output

It works fine when run in a shell terminal. However, I am trying to run it from a Python script using the following line:

subprocess.Popen(["gsutil", "-m", "rsync", "s3://input", "gs://output"])

However, it just hangs forever after printing the following:

Building synchronization state...
Starting synchronization...

Run from bash, the same command successfully prints:

Building synchronization state...
Starting synchronization...
Copying s3://input/0000
[0/1 files][  1.0 MiB/ 5.1 MiB]   (number here)% Done

and the file shows up in my gs bucket.

fjson01

1 Answer


I'm guessing this is because the last two lines are written to stderr rather than stdout. Can you try using the call to Popen as a context manager and then calling communicate() to read from the output streams?

import subprocess

with subprocess.Popen(["gsutil", "-m", "rsync", "s3://input", "gs://output"]) as proc:
    try:
        outs, errs = proc.communicate(timeout=15)
        # outs and errs are None unless stdout/stderr are set to subprocess.PIPE
    except subprocess.TimeoutExpired:
        # kill() signals only the direct child process
        proc.kill()
        outs, errs = proc.communicate()
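
To actually read what the child writes, communicate() needs pipes attached to both streams. Here is a minimal sketch of that idea. The helper name run_and_capture is hypothetical, and the demo uses a small Python child process as a stand-in for gsutil so it runs anywhere. It assumes Python 3.7+ for the text=True argument.

```python
import subprocess
import sys


def run_and_capture(cmd, timeout=15):
    """Run cmd and return (returncode, stdout_text, stderr_text).

    Hypothetical helper: both streams are piped so communicate()
    returns their contents instead of None.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,  # decode bytes to str
    )
    try:
        outs, errs = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Stop the child, then collect whatever it wrote before dying.
        proc.kill()
        outs, errs = proc.communicate()
    return proc.returncode, outs, errs


if __name__ == "__main__":
    # Stand-in for the gsutil command: writes one line to each stream.
    demo_cmd = [
        sys.executable,
        "-c",
        "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
    ]
    code, outs, errs = run_and_capture(demo_cmd)
    print("exit code:", code)
    print("stdout:", repr(outs))
    print("stderr:", repr(errs))
```

Replacing demo_cmd with ["gsutil", "-m", "rsync", "s3://input", "gs://output"] should show which stream each progress line lands on, though a 15 second timeout is probably too short for a real sync.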
rje
  • Ahh, this worked, with the caveat that the exception has to be written as subprocess.TimeoutExpired, but I'm curious why. When gsutil rsync is pulling a file and showing a progress bar, it times out after 15 seconds and the shell returns to me, but the progress bar keeps getting printed to the screen. I don't understand how proc.kill() doesn't stop the gsutil rsync from continuing, unless the child process isn't the thing running gsutil rsync on the two buckets? – fjson01 Oct 16 '18 at 16:10
  • You could also combine stderr and stdout, as mentioned here: https://stackoverflow.com/questions/6809590/merging-a-python-scripts-subprocess-stdout-and-stderr-while-keeping-them-disti. As for `proc.kill()`, you may want to give the subprocess a chance to do proper cleanup (SIGTERM via `proc.terminate()`) before laying the hammer down with SIGKILL. Since you're not specifying pipes to send output to, the subprocess is sending its output to its parent's stdout/stderr. When you call `communicate()` after sending SIGTERM, it reads the stdout/stderr output up until the child process has finally terminated. – mhouglum Oct 16 '18 at 17:56
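
The two suggestions in the last comment can be sketched together: merge stderr into stdout with stderr=subprocess.STDOUT, and on timeout try terminate() before falling back to kill(). The helper name run_merged and the grace parameter are hypothetical, and the usage line runs a small Python child as a stand-in for gsutil.

```python
import subprocess
import sys


def run_merged(cmd, timeout=15, grace=5):
    """Run cmd with stderr merged into stdout.

    Hypothetical helper: on timeout, ask the child to stop with
    terminate() and only use kill() if it ignores that for `grace` seconds.
    Returns (returncode, combined_output_text).
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # send stderr to the same pipe as stdout
        text=True,
    )
    try:
        output, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX: lets the child clean up
        try:
            output, _ = proc.communicate(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # SIGKILL on POSIX: last resort
            output, _ = proc.communicate()
    return proc.returncode, output


if __name__ == "__main__":
    demo_cmd = [
        sys.executable,
        "-c",
        "import sys; print('from stdout'); print('from stderr', file=sys.stderr)",
    ]
    code, output = run_merged(demo_cmd)
    print("exit code:", code)
    print(output)
```

Both terminate() and kill() signal only the direct child. If gsutil -m hands the copy work to separate worker processes (an assumption, not something confirmed in this thread), those workers could keep running and printing progress after the parent is gone, which would match the behavior described in the first comment.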