50

I have a script where I want to check whether a file exists in a bucket and, if it doesn't, create one.

I tried using os.path.exists(file_path) where file_path = "/gs/testbucket", but I got a file-not-found error.

I know that I can use the files.listdir() API function to list all the files located at a path and then check whether the file I want is one of them. But I was wondering whether there is another way to check that the file exists.

ivanleoncz
Tanvir Shaikh
  • +1 ran into this myself. We eventually wound up doing an HTTP HEAD on the public address of the file, but that's not a general solution. – ckhan Nov 23 '12 at 08:41
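
A minimal sketch of the HEAD-request workaround ckhan describes, assuming the object is publicly readable (the URL below is a placeholder):

import requests

# Hypothetical public object URL; a HEAD request only indicates existence
# if the object is publicly readable (non-public objects return 403)
url = "https://storage.googleapis.com/testbucket/some_file.txt"
exists = requests.head(url).status_code == 200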

13 Answers

74

This post is old, but you can now check whether a file exists on GCS using the blob class. Because it took me a while to find this answer, I'm adding it here for others looking for a solution:

from google.cloud import storage

name = 'file_i_want_to_check.txt'
storage_client = storage.Client()
bucket_name = 'my_bucket_name'
bucket = storage_client.bucket(bucket_name)
# exists() returns True if the object is present in the bucket, False otherwise
stats = storage.Blob(bucket=bucket, name=name).exists(storage_client)

Documentation is here: https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.blob.Blob#google_cloud_storage_blob_Blob_exists

Hope this helps!

Edit

As per the comment by @om-prakash, if the file is in a folder, then the name should include the path to the file:

name = "folder/path_to/file_i_want_to_check.txt"
nickyfot
  • The above solution may not work if the file exists in some folder in Google Cloud Storage rather than in the root directory of the bucket; do this instead: `stats = storage.Blob(bucket=bucket, name="folder_1/another_folder_2/your_file.txt").exists(storage_client)` – Om Prakash Jun 13 '19 at 07:33
  • Thank you! This is exactly what I needed. – David Valenzuela Urrutia Aug 07 '20 at 16:55
  • It will also not work if the blob is a folder and not a file. – confiq May 04 '21 at 21:49
  • The link has been moved: https://cloud.google.com/python/docs/reference/storage/latest/google.cloud.storage.blob.Blob#google_cloud_storage_blob_Blob_exists – m25 Sep 16 '22 at 16:15
35

It's as easy as using the exists method on a blob object:

from google.cloud import storage

def blob_exists(projectname, credentials, bucket_name, filename):
    client = storage.Client(projectname, credentials=credentials)
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(filename)
    return blob.exists()
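
A hypothetical call, assuming you already hold a credentials object (all names below are placeholders):

# Placeholder project, bucket and object names
exists = blob_exists('my-project', credentials, 'my_bucket_name', 'folder/file.txt')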
javinievas
  • For thousands of URLs this is slow. Is there any way to submit a batch of keys/buckets in one go? – Adam Hughes Apr 14 '20 at 19:05
  • Also seems error-prone if the file is large (anecdotal): `urllib3.exceptions.ProtocolError: ('Connection aborted.', OSError(0, 'Error'))` – s2t2 Aug 13 '20 at 13:53
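
Addressing Adam Hughes's batching question, a hedged sketch: when the objects share a prefix, one list_blobs call can replace thousands of exists() round-trips (names below are placeholders):

from google.cloud import storage

client = storage.Client()
# One listing request instead of one exists() call per object
existing = {blob.name for blob in client.list_blobs('my_bucket_name', prefix='folder1/')}
print('folder1/file.txt' in existing)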
15

The answer provided by @nickthefreak is correct, and so is the comment by Om Prakash. One other note is that the bucket_name should not include gs:// in front or a / at the end.

Piggybacking off @nickthefreak's example and Om Prakash's comment:

from google.cloud import storage

name = 'folder1/another_folder/file_i_want_to_check.txt'   

storage_client = storage.Client()
bucket_name = 'my_bucket_name'  # Do not put 'gs://my_bucket_name'
bucket = storage_client.bucket(bucket_name)
stats = storage.Blob(bucket=bucket, name=name).exists(storage_client)

stats will be a Boolean (True or False) depending on whether the file exists in the Storage Bucket.

(I don't have enough reputation points to comment, but I wanted to save other people some time because I wasted way too much time with this).

TalkDataToMe
8

If you are looking for a solution in NodeJS, then here it is:

var storage = require('@google-cloud/storage')();
var myBucket = storage.bucket('my-bucket');

var file = myBucket.file('my-file');

file.exists(function(err, exists) {});

// If the callback is omitted, this function returns a Promise.
file.exists().then(function(data) {
  var exists = data[0];
});

If you need more help, you can refer to this doc: https://cloud.google.com/nodejs/docs/reference/storage/1.5.x/File#exists

Akash Kaushik
4

If you're working with GCS files on a service like "Google AI Platform", you can use TensorFlow to check whether a file exists:

import tensorflow as tf
file_exists = tf.gfile.Exists('gs://your-bucket-name/your-file.txt')
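
Note that tf.gfile is the TensorFlow 1.x API; in TensorFlow 2.x the equivalent call lives under tf.io:

import tensorflow as tf

# TensorFlow 2.x: tf.gfile.Exists moved to tf.io.gfile.exists
file_exists = tf.io.gfile.exists('gs://your-bucket-name/your-file.txt')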
Tobias Ernst
3

You can use the stat function to get a file's info. In practice this does a HEAD request to Google Cloud Storage instead of a GET, which is a bit less resource-intensive.

import cloudstorage as gcs

def is_file_available(filepath):
  # Return the stat record if the file exists (a stat record is truthy), else False
  try:
    return gcs.stat(filepath)
  except gcs.NotFoundError:
    return False
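
Hypothetical usage, assuming the GAE cloudstorage library's /bucket/object path format:

# Placeholder path; a stat record exposes metadata such as st_size
info = is_file_available('/my_bucket_name/folder/file.txt')
if info:
  print(info.st_size)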
Mark
2

The file I am searching for on Google Cloud Storage: init.sh

Full path: gs://cw-data/spark_app_code/init.sh

>>> from google.cloud import storage

>>> def is_exist(bucket_name, object_name):
...     client = storage.Client()
...     bucket = client.bucket(bucket_name)
...     blob = bucket.get_blob(object_name)
...     try:
...             # get_blob returns None for a missing key, so blob.exists
...             # raises and we report False
...             return blob.exists(client)
...     except Exception:
...             return False
...
>>> is_exist('cw-data','spark_app_code')
    False
>>> is_exist('cw-data','spark_app_code/')
    True
>>> is_exist('cw-data','init.sh')
    False
>>> is_exist('cw-data','spark_app_code/init.sh')
    True
>>> is_exist('cw-data','/init.sh')
    False
>>>

Here, files are not stored the way they are on a local filesystem; they are stored as object keys. So when searching for a file on Google Storage, use the full object path rather than just the filename.

AKs
1

Slight variation on Amit's answer from a few years ago, updated for the cloudstorage api.

import cloudstorage as gcs

def GCSExists(gcs_file):
    '''
    True if file exists; pass complete /bucket/file
    '''
    try:
        # Opening for read raises an error if the object is missing
        file = gcs.open(gcs_file, 'r')
        file.close()
        status = True
    except Exception:
        status = False
    return status
Matthew Dunn
  • `import cloudstorage as gcs; gcs.open("gs://foo/foo.bar")` raises `AttributeError: module 'cloudstorage' has no attribute 'open'` – Jeremy Leipzig May 11 '21 at 16:12
1

Yes! It's possible! Adapted from this.

And this is my code:

def get_by_signed_url(self, object_name, bucket_name=GCLOUD_BUCKET_NAME):
    bucket = self.client_storage.bucket(bucket_name)
    blob = bucket.blob(object_name)

    # check whether the file exists
    stats = blob.exists(self.client_storage)
    if not stats:
        raise NotFound(messages.ERROR_NOT_FOUND)

    url_lifetime = self.expiration  # in seconds, e.g. 3600 for an hour
    serving_url = blob.generate_signed_url(url_lifetime)
    return self.session.get(serving_url)
Ardi Nusawan
1

Since the accepted answer on this question didn't provide much detail, here's a modern solution using gsutil that works the way that answer describes.

This becomes more effective than the other answers if you need to query your GCS files many times in your script.

import subprocess

def bucket_to_list(bucketname: str):
    '''
    Return the bucket's contents as a Python list of strings.
    We also slice off the bucket name on each line,
    in case we need to search many buckets for one file.
    '''
    result = subprocess.run(['gsutil', 'ls', '-r', bucketname + '**'],
                            shell=False, text=True, stdout=subprocess.PIPE)
    return result.stdout.replace(bucketname, "").splitlines()

Use in the following way:

# call once for each bucket to store bucket contents
mybucket1 = 'gs://mybucket1/'
mybucket1list = bucket_to_list(mybucket1)

# limiting list to a bucket's "subdirectories"
mybucket2 = 'gs://mybucket2/subdir1/subdir2/'
mybucket2list = bucket_to_list(mybucket2)

# example filename list to check; we don't need to add the gs:// paths
filestocheck = ['file1.ext', 'file2.ext', 'file3.ext']

# check both buckets for files in our file list
for file in filestocheck:
    if file in mybucket1list:
        pass  # do something if file exists in bucket1
    elif file in mybucket2list:
        pass  # do something if file exists in bucket2
    else:
        pass  # do something if file doesn't exist in either bucket
lys
1

from google.cloud import storage

def if_file_exists(name: str, bucket_name: str):
    storage_client = storage.Client()
    # Blob.from_string parses the gs:// URI, so no separate bucket object is needed
    stats = storage.Blob.from_string(f"gs://{bucket_name}/{name}").exists(storage_client)
    return stats

print(if_file_exists('audios/courses/ActivityPlaying/1320210506130438.wav', GC_BUCKET_NAME))

The name argument is the object's path within the bucket (everything after the bucket name).

The if_file_exists function takes two positional arguments, the object key and the bucket name, and returns True if the file exists, else False.

sadab khan
  • This is useful if you want to accept a full file path as an input argument and validate that the file exists. – Akhil Mar 12 '23 at 21:23
0

I guess there is no function to check directly whether a file exists given its path.
I created a function that uses the files.listdir() API function to list all the files in the bucket and match against the file name we want. It returns True if found and False if not.

Tanvir Shaikh
0

You can use a custom function (shown below) to check whether a file exists:

from google.appengine.api import files

def is_file_available(filepath):
    # check whether the file is available
    try:
        fp = files.open(filepath, 'r')
        fp.close()
        return True
    except Exception:
        return False
Use the above function in the following way:

filepath = '/gs/test/testme.txt'
file_available = is_file_available(filepath)

Note: the above function may also return False when the application trying to read the file has not been granted read permission.

j0k
Amit Vikram