Since the accepted answer on this question doesn't provide much detail, here is a modern solution using gsutil that works the way that answer describes.
It becomes more efficient than the other answers when you need to check for GCS files many times in your script, because each bucket is listed once and the results are reused for every lookup.
import subprocess

def bucket_to_list(bucketname: str):
    '''
    Return a bucket's contents as a Python list of strings.
    The bucket name is stripped from each line, so the same
    filename can be checked against listings from many buckets.
    '''
    listing = subprocess.run(
        ['gsutil', 'ls', '-r', bucketname + '**'],
        shell=False, text=True, stdout=subprocess.PIPE,
    ).stdout
    return listing.replace(bucketname, "").splitlines()
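For illustration, gsutil prints full gs:// URIs, so stripping the bucket prefix leaves bucket-relative paths; assuming a bucket that happens to contain hypothetical objects named file1.ext and subdir1/file2.ext (example names, not from the original answer), the result would look roughly like this:

# hypothetical example, assuming those two objects exist in the bucket
bucket_to_list('gs://mybucket1/')
# -> ['file1.ext', 'subdir1/file2.ext']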
Use it in the following way:
# call once per bucket to cache that bucket's contents
mybucket1 = 'gs://mybucket1/'
mybucket1list = bucket_to_list(mybucket1)

# the listing can also be limited to a bucket's "subdirectories"
mybucket2 = 'gs://mybucket2/subdir1/subdir2/'
mybucket2list = bucket_to_list(mybucket2)

# example filenames to check; we don't need to add the gs:// paths
filestocheck = ['file1.ext', 'file2.ext', 'file3.ext']

# check both buckets for each file in the list
for file in filestocheck:
    if file in mybucket1list:
        pass  # do something if the file exists in bucket1
    elif file in mybucket2list:
        pass  # do something if the file exists in bucket2
    else:
        pass  # do something if the file doesn't exist in either bucket
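One caveat: if the gsutil call fails (for example, the bucket doesn't exist or you aren't authenticated), the function above can simply return an empty list and mask the error. Below is a minimal sketch, not part of the original answer, of a variant (hypothetical name bucket_to_list_checked) that raises instead; subprocess.run itself raises FileNotFoundError if gsutil isn't on the PATH.

import subprocess

def bucket_to_list_checked(bucketname: str):
    '''Like bucket_to_list, but raise if the gsutil call fails.'''
    result = subprocess.run(
        ['gsutil', 'ls', '-r', bucketname + '**'],
        shell=False, text=True,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    )
    if result.returncode != 0:
        # surface gsutil's own error message instead of returning []
        raise RuntimeError('gsutil ls failed: ' + result.stderr.strip())
    return result.stdout.replace(bucketname, "").splitlines()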