I am currently running bash commands in a Python script using `os.system`, as follows:

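```python
import os
import time

# bucket_lst (destination bucket URLs) and time_lst are assumed to be defined earlier in the script.
for bucket in bucket_lst:
    start = time.time()
    command = "gsutil rsync -r /home/imagenet/tf_records " + bucket
    os.system(command)
    end = time.time() - start
    time_lst.append(end)
```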

What I'm doing here is transferring data from a Google Compute Engine (GCE) instance to Google Cloud Storage buckets in different regions (listed in `bucket_lst`) and measuring how long the transfer to each region takes.

Each transfer takes one to two hours and there are about 30 regions, so I need to run this process in the background with `nohup`, because my SSH connection to the GCE instance drops often.

I tried running `nohup python3 gce_to_gcs_throughput.py`, but the process seems to end after the first iteration of the for-loop. Why is this happening, and how can I keep the script running under `nohup` until the data has been transferred to every region?

DO Young Kim
  • Did you try subprocess? https://docs.python.org/3/library/subprocess.html – Gonzalo Odiard Mar 06 '22 at 00:07
  • The normal way to run your command with `nohup` (for me anyway) would be: `nohup python3 gce_to_gcs_throughput.py ...` – pjh Mar 06 '22 at 01:06
  • What happens if you run the command without `nohup`? Maybe the problem has nothing to do with using `nohup`. – pjh Mar 06 '22 at 01:08
  • @pjh Without `nohup`, the code runs properly, transferring the data to each region sequentially until the disconnection happens. The behaviour of stopping after only one transfer started only after I added `nohup`. – DO Young Kim Mar 06 '22 at 01:26
  • @GonzaloOdiard No, I haven't. I probably should. – DO Young Kim Mar 06 '22 at 01:27
  • @DOYoungKim, that's good to know. `nohup` closes standard input for the process by default. That might be causing a problem. Have you tried `nohup ...` – pjh Mar 06 '22 at 01:42
  • @DOYoungKim, another option may be to use [tmux](https://en.wikipedia.org/wiki/Tmux) or [screen](https://en.wikipedia.org/wiki/GNU_Screen) instead of `nohup`. Both are excellent tools for keeping multiple interactive terminal sessions alive while you are disconnected from a machine, so that you can reattach and resume interacting with them later. – pjh Mar 06 '22 at 01:48
  • @pjh I just tried out with – DO Young Kim Mar 06 '22 at 03:32
  • I would do this entirely in bash, using `nohup` and `&` on each `gsutil`. I have had weird results combining Python and bash. You could also use the `time` command: `time gsutil ...`. – Nic3500 Mar 06 '22 at 03:57
  • @DOYoungKim, it should be `/dev/null`, not `/del/null`. If redirecting from `/dev/null` doesn't work then the Python program may be having problems working without being connected to a terminal. Using `tmux` or `screen` would avoid that problem. – pjh Mar 06 '22 at 03:57
  • I haven't had a chance to try tmux, screen, or /dev/null yet, but using `subprocess.run()` instead of `os.system()` works. – DO Young Kim Mar 06 '22 at 04:58
  • @DOYoungKim Kindly post the answer that works for you so it can help the community. – Alex G Mar 07 '22 at 12:31
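pjh's suggestion of redirecting stdin from `/dev/null` has a direct Python counterpart: `subprocess.run(..., stdin=subprocess.DEVNULL)`. The sketch below is only relevant if the child inheriting `nohup`'s stdin is what stopped the loop, which the thread never confirmed; the helper name `run_with_null_stdin` is hypothetical. In the real script, `cmd` would be the `gsutil rsync` argument list.

```python
import subprocess
import sys


def run_with_null_stdin(cmd):
    """Run cmd (an argument list) with stdin attached to the null device.

    Roughly the Python analogue of `cmd < /dev/null` in the shell: the child
    reads EOF immediately instead of inheriting the parent's stdin.
    Returns the child's exit code.
    """
    return subprocess.run(cmd, stdin=subprocess.DEVNULL).returncode


if __name__ == "__main__":
    # Probe child (uses the current interpreter): exits 0 only if its stdin is empty.
    probe = [sys.executable, "-c", "import sys; sys.exit(1 if sys.stdin.read() else 0)"]
    print(run_with_null_stdin(probe))  # prints 0
    # Hypothetical use in the transfer loop (not run here):
    # run_with_null_stdin(["gsutil", "rsync", "-r", "/home/imagenet/tf_records", bucket])
```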

1 Answer


Replacing `os.system()` with `subprocess.run()` worked, as suggested by @gonzalo-odiard. You can make the replacement by following the subprocess section of the Python docs.
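A minimal sketch of the question's loop rewritten with `subprocess.run()`, assuming `gsutil` is on the `PATH` (if it is not, `subprocess.run` raises `FileNotFoundError`) and that `bucket_lst` holds the destination bucket URLs. The helper names `timed_run` and `transfer_all` are hypothetical, not from the original script. `time.monotonic()` replaces `time.time()` because it is not affected by system clock updates during hour-long transfers.

```python
import subprocess
import sys
import time


def timed_run(cmd):
    """Run cmd (a list of arguments) to completion.

    Returns (elapsed_seconds, returncode). Like os.system, subprocess.run
    blocks until the child exits, but it does not go through a shell.
    """
    start = time.monotonic()
    result = subprocess.run(cmd)
    return time.monotonic() - start, result.returncode


def transfer_all(bucket_lst, source="/home/imagenet/tf_records"):
    """rsync `source` to each bucket in turn and return the elapsed times."""
    time_lst = []
    for bucket in bucket_lst:
        elapsed, code = timed_run(["gsutil", "rsync", "-r", source, bucket])
        if code != 0:
            # Report the failure on stderr but keep going with the remaining regions.
            print(f"gsutil rsync to {bucket} exited with code {code}",
                  file=sys.stderr, flush=True)
        time_lst.append(elapsed)
    return time_lst
```

Usage would then be `time_lst = transfer_all(bucket_lst)`. Passing an argument list rather than a concatenated string avoids shell quoting problems if a path ever contains spaces.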

For the differences between the two, see this SO post.
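A few of the practical differences between the two calls, sketched for POSIX (Linux) and Python 3.9 or later; this is not necessarily what the linked post covers, and the function names are illustrative only. `os.system` takes one string and runs it through the shell, returning the raw wait status, while `subprocess.run` takes an argument list, uses no shell by default, and reports the exit code directly.

```python
import os
import shlex
import subprocess
import sys

# A child process that exits with status 3, using the current interpreter.
CMD = [sys.executable, "-c", "import sys; sys.exit(3)"]


def raw_os_system_status():
    # os.system runs one string through /bin/sh on POSIX and returns the raw
    # wait status (exit code shifted left by 8 bits), not the exit code itself.
    return os.system(shlex.join(CMD))


def exit_code_via_os_system():
    # os.waitstatus_to_exitcode (Python 3.9+) decodes a wait status.
    return os.waitstatus_to_exitcode(raw_os_system_status())


def exit_code_via_subprocess_run():
    # subprocess.run returns a CompletedProcess whose returncode is the exit code.
    return subprocess.run(CMD).returncode


def exit_code_via_check():
    # check=True turns a non-zero exit code into CalledProcessError.
    try:
        subprocess.run(CMD, check=True)
    except subprocess.CalledProcessError as err:
        return err.returncode
    return 0
```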

Alex G