I've been trying to automate my GCP Dataflow pipeline. Uncompressed text files load into the pipeline much faster than gzip-compressed files, apparently because gzip input cannot be read in parallel. So I first have to convert my gzip files into text files using gsutil commands in the Google Cloud Shell:
gsutil cat gs://nse-fao-data-test/FAO* | zcat | gsutil cp - gs://nse-fao-data-test/uncomp/hello9.txt
Now, to automate this, I try to open Cloud Shell from my local machine by issuing an OS command from Python, and I call it every time before my pipeline begins:
import os
import subprocess
def uncompress(in_file='gs://nse-fao-data-test/FAO*', out_file="gs://nse-fao-data-test/uncomp/uncompressed.txt"):
    subprocess.call("gsutil cat {0} | zcat | gsutil cp - {1}".format(in_file, out_file))

def openShell():
    os.system("gcloud cloud-shell ssh --authorize-session")
While openShell works and starts Cloud Shell from my local machine, uncompress does not execute. Is there any way to automate the command in uncompress() without typing it manually?
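One variant I could try, as a rough sketch only: subprocess.call is given a single string containing pipes but no shell=True, so Python looks for a program with that whole string as its name instead of running a pipeline. The version below hands the string to a shell. It assumes gsutil and zcat are installed and authenticated on the machine where Python runs, which I have not confirmed, and build_uncompress_command is a helper name made up for illustration.

```python
import shlex
import subprocess


def build_uncompress_command(in_file, out_file):
    """Build the gsutil | zcat | gsutil pipeline as one shell string."""
    # Quote both URLs so the local shell does not touch the wildcard;
    # gsutil expands wildcards in gs:// URLs itself.
    return "gsutil cat {0} | zcat | gsutil cp - {1}".format(
        shlex.quote(in_file), shlex.quote(out_file)
    )


def uncompress(in_file="gs://nse-fao-data-test/FAO*",
               out_file="gs://nse-fao-data-test/uncomp/uncompressed.txt"):
    command = build_uncompress_command(in_file, out_file)
    # shell=True is needed because the pipes are shell syntax.
    # check=True raises CalledProcessError on a non-zero exit status, but a
    # pipeline's status is that of its last stage, so a failure in
    # "gsutil cat" or "zcat" may go unnoticed.
    subprocess.run(command, shell=True, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, to inspect it first.
    print(build_uncompress_command(
        "gs://nse-fao-data-test/FAO*",
        "gs://nse-fao-data-test/uncomp/uncompressed.txt",
    ))
```

This runs the pipeline on the local machine rather than inside the Cloud Shell session opened by openShell(), so it would only help if running locally is acceptable.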