50

I have a script where I want to check whether a file exists in a bucket and, if it doesn't, create one.

I tried using os.path.exists(file_path) where file_path = "/gs/testbucket", but I got a file-not-found error.

I know that I can use the files.listdir() API function to list all the files located at a path and then check whether the file I want is one of them. But I was wondering whether there is another way to check that the file exists.

ivanleoncz
Tanvir Shaikh
  • +1 ran into this myself. We eventually wound up doing an HTTP HEAD on the public address of the file, but that's not a general solution. – ckhan Nov 23 '12 at 08:41
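
A minimal sketch of the HEAD-request workaround ckhan describes, assuming the object is publicly readable (the URL below is a placeholder):

import requests

# Hypothetical public object URL; a HEAD request only indicates existence
# if the object is publicly readable (non-public objects return 403)
url = "https://storage.googleapis.com/testbucket/some_file.txt"
exists = requests.head(url).status_code == 200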

13 Answers

74

This post is old, but you can now check whether a file exists on GCS using the blob class. Because it took me a while to find this answer, I'm adding it here for others looking for a solution:

from google.cloud import storage

name = 'file_i_want_to_check.txt'
storage_client = storage.Client()
bucket_name = 'my_bucket_name'
bucket = storage_client.bucket(bucket_name)
# exists() returns True if the object is present in the bucket, False otherwise
stats = storage.Blob(bucket=bucket, name=name).exists(storage_client)

Documentation is here: https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.blob.Blob#google_cloud_storage_blob_Blob_exists

Hope this helps!

Edit

As per the comment by @om-prakash, if the file is in a folder, then the name should include the path to the file:

name = "folder/path_to/file_i_want_to_check.txt"
nickyfot
  • The above solution may not work if the file exists in some folder in Google Cloud Storage rather than in the root directory of the bucket; do this instead: `stats = storage.Blob(bucket=bucket, name="folder_1/another_folder_2/your_file.txt").exists(storage_client)` – Om Prakash Jun 13 '19 at 07:33
  • Thank you! This is exactly what I needed. – David Valenzuela Urrutia Aug 07 '20 at 16:55
  • It will also not work if the blob is a folder and not a file. – confiq May 04 '21 at 21:49
  • The link has been moved: https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.blob.Blob#google_cloud_storage_blob_Blob_exists – m25 Sep 16 '22 at 16:15
35

It's as easy as using the exists method on a blob object:

from google.cloud import storage

def blob_exists(projectname, credentials, bucket_name, filename):
    client = storage.Client(projectname, credentials=credentials)
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(filename)
    return blob.exists()
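
A hypothetical call, assuming you already hold a credentials object (all names below are placeholders):

# Placeholder project, bucket and object names
exists = blob_exists('my-project', credentials, 'my_bucket_name', 'folder/file.txt')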
javinievas
  • For thousands of URLs this is slow. Is there any way to submit a batch of keys/buckets in one go? – Adam Hughes Apr 14 '20 at 19:05
  • Also seems error-prone if the file is large (anecdotal): `urllib3.exceptions.ProtocolError: ('Connection aborted.', OSError(0, 'Error'))` – s2t2 Aug 13 '20 at 13:53
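
Addressing Adam Hughes's batching question, a hedged sketch: when the objects share a prefix, one list_blobs call can replace thousands of exists() round-trips (names below are placeholders):

from google.cloud import storage

client = storage.Client()
# One listing request instead of one exists() call per object
existing = {blob.name for blob in client.list_blobs('my_bucket_name', prefix='folder1/')}
print('folder1/file.txt' in existing)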
15

The answer provided by @nickthefreak is correct, and so is the comment by Om Prakash. One other note is that the bucket_name should not include gs:// in front or a / at the end.

Piggybacking off @nickthefreak's example and Om Prakash's comment:

from google.cloud import storage

name = 'folder1/another_folder/file_i_want_to_check.txt'   

storage_client = storage.Client()
bucket_name = 'my_bucket_name'  # Do not put 'gs://my_bucket_name'
bucket = storage_client.bucket(bucket_name)
stats = storage.Blob(bucket=bucket, name=name).exists(storage_client)

stats will be a Boolean (True or False) depending on whether the file exists in the Storage Bucket.

(I don't have enough reputation points to comment, but I wanted to save other people some time because I wasted way too much time with this).

TalkDataToMe
8

If you are looking for a solution in NodeJS, then here it is:

var storage = require('@google-cloud/storage')();
var myBucket = storage.bucket('my-bucket');

var file = myBucket.file('my-file');

file.exists(function(err, exists) {});

// If the callback is omitted, this function returns a Promise.
file.exists().then(function(data) {
  var exists = data[0];
});

If you need more help, you can refer to this doc: https://cloud.google.com/nodejs/docs/reference/storage/1.5.x/File#exists

Akash Kaushik
4

If you're working with GCS files on a service like "Google AI Platform", you can use TensorFlow to check whether a file exists:

import tensorflow as tf
file_exists = tf.gfile.Exists('gs://your-bucket-name/your-file.txt')
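
Note that tf.gfile is the TensorFlow 1.x API; in TensorFlow 2.x the equivalent call lives under tf.io:

import tensorflow as tf

# TensorFlow 2.x: tf.gfile.Exists moved to tf.io.gfile.exists
file_exists = tf.io.gfile.exists('gs://your-bucket-name/your-file.txt')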
Tobias Ernst
3

You can use the stat function to get a file's info. In practice this does a HEAD request to Google Cloud Storage instead of a GET, which is a bit less resource-intensive.

import cloudstorage as gcs

def is_file_available(filepath):
  # Return the stat record if the file exists (a stat record is truthy), else False
  try:
    return gcs.stat(filepath)
  except gcs.NotFoundError:
    return False
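
Hypothetical usage, assuming the GAE cloudstorage library's /bucket/object path format:

# Placeholder path; a stat record exposes metadata such as st_size
info = is_file_available('/my_bucket_name/folder/file.txt')
if info:
  print(info.st_size)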
Mark
2

The file I am searching for on Google Cloud Storage: init.sh

Full path: gs://cw-data/spark_app_code/init.sh

>>> from google.cloud import storage

>>> def is_exist(bucket_name, object_name):
...     client = storage.Client()
...     bucket = client.bucket(bucket_name)
...     blob = bucket.get_blob(object_name)
...     try:
...             # get_blob returns None for a missing key, so blob.exists
...             # raises and we report False
...             return blob.exists(client)
...     except Exception:
...             return False
...
>>> is_exist('cw-data','spark_app_code')
    False
>>> is_exist('cw-data','spark_app_code/')
    True
>>> is_exist('cw-data','init.sh')
    False
>>> is_exist('cw-data','spark_app_code/init.sh')
    True
>>> is_exist('cw-data','/init.sh')
    False
>>>

Here, files are not stored the way they are on a local filesystem; they are stored as object keys. So when searching for a file on Google Storage, use the full object path rather than just the filename.

AKs
1

Slight variation on Amit's answer from a few years ago, updated for the cloudstorage api.

import cloudstorage as gcs

def GCSExists(gcs_file):
    '''
    True if file exists; pass complete /bucket/file
    '''
    try:
        # Opening for read raises an error if the object is missing
        file = gcs.open(gcs_file, 'r')
        file.close()
        status = True
    except Exception:
        status = False
    return status
Matthew Dunn
  • `import cloudstorage as gcs; gcs.open("gs://foo/foo.bar")` raises `AttributeError: module 'cloudstorage' has no attribute 'open'` – Jeremy Leipzig May 11 '21 at 16:12
1

Yes! It's possible! Adapted from this.

And this is my code:

def get_by_signed_url(self, object_name, bucket_name=GCLOUD_BUCKET_NAME):
    bucket = self.client_storage.bucket(bucket_name)
    blob = bucket.blob(object_name)

    # check whether the file exists
    stats = blob.exists(self.client_storage)
    if not stats:
        raise NotFound(messages.ERROR_NOT_FOUND)

    url_lifetime = self.expiration  # in seconds, e.g. 3600 for an hour
    serving_url = blob.generate_signed_url(url_lifetime)
    return self.session.get(serving_url)
Ardi Nusawan
1

Since the accepted answer on this question didn't provide much detail, here's a modern solution using gsutil that works the way that answer describes.

This becomes more effective than the other answers if you need to query your GCS files many times in your script.

import subprocess

def bucket_to_list(bucketname: str):
    '''
    Return the bucket's contents as a Python list of strings.
    We also slice off the bucket name on each line,
    in case we need to search many buckets for one file.
    '''
    result = subprocess.run(['gsutil', 'ls', '-r', bucketname + '**'],
                            shell=False, text=True, stdout=subprocess.PIPE)
    return result.stdout.replace(bucketname, "").splitlines()

Use in the following way:

# call once for each bucket to store bucket contents
mybucket1 = 'gs://mybucket1/'
mybucket1list = bucket_to_list(mybucket1)

# limiting list to a bucket's "subdirectories"
mybucket2 = 'gs://mybucket2/subdir1/subdir2/'
mybucket2list = bucket_to_list(mybucket2)

# example filename list to check; we don't need to add the gs:// paths
filestocheck = ['file1.ext', 'file2.ext', 'file3.ext']

# check both buckets for files in our file list
for file in filestocheck:
    if file in mybucket1list:
        pass  # do something if file exists in bucket1
    elif file in mybucket2list:
        pass  # do something if file exists in bucket2
    else:
        pass  # do something if file doesn't exist in either bucket
lys
1

from google.cloud import storage

def if_file_exists(name: str, bucket_name: str):
    storage_client = storage.Client()
    # Blob.from_string parses the gs:// URI, so no separate bucket object is needed
    stats = storage.Blob.from_string(f"gs://{bucket_name}/{name}").exists(storage_client)
    return stats

print(if_file_exists('audios/courses/ActivityPlaying/1320210506130438.wav', GC_BUCKET_NAME))

The name argument is the object's path within the bucket (everything after the bucket name).

The if_file_exists function takes two positional arguments, the object key and the bucket name, and returns True if the file exists, else False.

sadab khan
  • This is useful if you want to accept a full file path as an input argument and validate that the file exists. – Akhil Mar 12 '23 at 21:23
0

I guess there is no function to check directly whether a file exists given its path.
I created a function that uses the files.listdir() API function to list all the files in the bucket and match against the file name we want. It returns True if found and False if not.

Tanvir Shaikh
0

You can use a custom function (shown below) to check whether a file exists:

from google.appengine.api import files

def is_file_available(filepath):
    # check whether the file is available
    try:
        fp = files.open(filepath, 'r')
        fp.close()
        return True
    except Exception:
        return False
Use the above function in the following way:

filepath = '/gs/test/testme.txt'
file_available = is_file_available(filepath)

Note: the above function may also return False when the application trying to read the file has not been granted read permission.

j0k
Amit Vikram