In GCS I have bucket XYZ, under that I have folder JM, under that I have files. For example:

XYZ/JM/file1.tar.gz,XYZ/JM/file2.tar.gz,XYZ/JM/file3.tar.gz,XYZ/JM/file4.tar.gz etc.

Using the code below I am able to list the files, but it displays the full path, like:

JM/file1.tar.gz,JM/file2.tar.gz,JM/file3.tar.gz

Code:

from google.cloud import storage
storage_client = storage.Client.from_service_account_json()

BucketName="XYZ"
bucket=storage_client.get_bucket(BucketName)


filename=list(bucket.list_blobs(prefix="JM/"))
for name in filename:
    print(name.name)

Query: I want to list the files under folder JM without showing JM in the output, i.e. just the file names, e.g. file1.tar.gz,file2.tar.gz

Daniel Ocando

1 Answer

Everything in Cloud Storage is an object, and folders are no exception. As the documentation states:

To the service, the object gs://your-bucket/abc/def.txt is just an object that happens to have "/" characters in its name. There is no "abc" directory; just a single object with the given name.

That is why list_blobs() returns the full object "path": it is the object's actual name.

The prefix parameter of list_blobs(), which you are already using to filter the blobs, is enough to select the objects you are looking for. Note that object names are case-sensitive, so the prefix has to match the folder name exactly (JM/, not jm/).
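As a rough sketch of that filtering step: list_blobs() also accepts a delimiter argument, and passing delimiter="/" together with the prefix should limit the listing to objects directly under the "folder". The helper name names_directly_under is hypothetical, and bucket is assumed to be a google.cloud.storage bucket (or anything with a compatible list_blobs method). The names it returns are still the full object names.

```python
def names_directly_under(bucket, folder):
    """Return the full names of objects directly under `folder`.

    `bucket` is assumed to expose list_blobs(prefix=..., delimiter=...),
    as google.cloud.storage buckets do. This helper is illustrative only.
    """
    # Normalise so that "JM" and "JM/" both become the prefix "JM/".
    prefix = folder.rstrip("/") + "/"
    # delimiter="/" excludes objects nested deeper, such as "JM/sub/x.txt".
    blobs = bucket.list_blobs(prefix=prefix, delimiter="/")
    return [blob.name for blob in blobs]
```

With the bucket from the question, names_directly_under(bucket, "JM") would be the call to try.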

Afterwards, you need to strip the prefix yourself, for example by splitting the blob's name on the '/' character (or with a regex), so that you keep only the part of the name you care about.
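A minimal sketch of that trimming step, working on plain strings so it runs without a bucket. The function name strip_folder is made up for illustration; in practice the input would be the name attribute of each blob returned by list_blobs().

```python
def strip_folder(object_names, folder):
    """Drop the leading 'folder/' from each object name.

    Names outside the folder, and the empty 'folder/' placeholder
    object itself, are skipped.
    """
    prefix = folder.rstrip("/") + "/"
    result = []
    for name in object_names:
        if not name.startswith(prefix):
            continue
        relative = name[len(prefix):]
        if relative:  # an empty string means the name was just "JM/"
            result.append(relative)
    return result


names = ["JM/file1.tar.gz", "JM/file2.tar.gz", "JM/"]
print(strip_folder(names, "JM"))  # ['file1.tar.gz', 'file2.tar.gz']
```

Slicing off the known prefix, rather than splitting on every '/', keeps nested names such as JM/sub/a.txt intact as sub/a.txt instead of failing.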

EDIT

I tested the following and it worked:

from google.cloud import storage
storage_client = storage.Client.from_service_account_json()

BucketName="XYZ"
bucket=storage_client.get_bucket(BucketName)


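filename=list(bucket.list_blobs(prefix="JM/"))
for name in filename:
    try:
        # Works only for names with exactly one '/', e.g. "JM/file1.tar.gz".
        prefix, object_name = name.name.split('/')
    except ValueError:
        print("An error occurred splitting the string.")
        # Skip this blob so a stale or undefined object_name is never printed.
        continue
    print(object_name)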
Daniel Ocando
  • What you said is true: all files and folders are treated as objects. Still, I wanted to know whether there are other methods or parameters available to trim that object value. – Raghavendra K Jul 07 '20 at 10:10
  • I tried using the replace and trim functions, but they didn't work. I will try a regex or the split function as you suggested. Thanks for the suggestion. – Raghavendra K Jul 07 '20 at 10:11
  • @RaghavendraK I added code that uses the split function. I tested it in your specific setup and it works correctly (you'll need to avoid objects that have an extra `/` in their name, though). – Daniel Ocando Jul 07 '20 at 10:18
  • Thanks Daniel, the above code worked as I expected. As I am new to Python, I'm just curious: since "name" is of type "google.cloud.storage.blob.Blob", can we use other methods like replace or trim? – Raghavendra K Jul 07 '20 at 10:43
  • You won't be able to use *string methods* such as [replace()](https://www.geeksforgeeks.org/python-string-replace/) or [strip()](https://www.programiz.com/python-programming/methods/string/strip) on the `name` variable itself. As you already know [it's a wrapper around Cloud Storage’s concept of an Object](https://googleapis.dev/python/storage/latest/blobs.html#google.cloud.storage.blob.Blob) and not a *string*. Nonetheless the `name` property of the `Blob` object is in fact a *string*, so you could use `name.name.replace()` or `name.name.strip()` as I did with the `split()` string method. – Daniel Ocando Jul 07 '20 at 10:55
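To illustrate the last comment without needing a bucket, here is a small sketch. FakeBlob is a hypothetical stand-in that only mimics the name attribute of a real Blob; it is not part of google.cloud.storage.

```python
class FakeBlob:
    """Hypothetical stand-in for a storage Blob; it only carries a name."""

    def __init__(self, name):
        self.name = name


blob = FakeBlob("JM/file1.tar.gz")

# The wrapper object itself has no string methods...
has_replace = hasattr(blob, "replace")

# ...but its name attribute is an ordinary str, so str methods work on it.
trimmed = blob.name.replace("JM/", "", 1)

print(has_replace, trimmed)  # False file1.tar.gz
```

The count argument of 1 limits replace() to the first occurrence, so a later "JM/" inside the name would be left alone.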