1. Get access to your client object.
Where is the code running?
I am (somewhere) inside the Google Cloud Platform (GCP)
If you are accessing Google Cloud Storage (GCS) from inside GCP, for example from Google Kubernetes Engine (GKE), you should use Workload Identity to configure your GKE service account to act as a GCS service account: https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
Once you have done this, creating your client is as easy as
import google.cloud.storage as gcs
client = gcs.Client()
Out in the wild
If you are somewhere else (AWS, Azure, your dev machine, or otherwise outside GCP), then you need to choose between creating a service account key that you download (it's a JSON file with a cryptographic PRIVATE KEY in it) and using workload identity federation, as provided by AWS, Azure and "friends".
Let's assume you have decided to download the new GCS service account key file to /secure/gcs.json.
PROJECT_NAME = "MY-GCP-PROJECT"
from google.oauth2.service_account import Credentials
import google.cloud.storage as gcs
client = gcs.Client(
project=PROJECT_NAME,
credentials=Credentials.from_service_account_file("/secure/gcs.json"),
)
2. Make the list-folders request to GCS
In the OP, we are trying to get the folders inside path xyz in bucket abc. Note that paths in GCS, unlike in Linux, do not start with a /; they should, however, finish with one. So we will be looking for folders with the prefix xyz/. That means just those folders, not the folders plus all of their subfolders.
BUCKET_NAME = "abc"
blobs = client.list_blobs(
BUCKET_NAME,
prefix="xyz/", # <- you need the trailing slash
delimiter="/",
max_results=1,
)
Note how we have asked for no more than a single blob. This is not a mistake: the blobs are the files themselves, and we are only interested in folders. Setting max_results to zero doesn't work; see below.
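To build intuition for what the delimiter does, here is a small pure-Python sketch. This is not the real API, just an illustration of how the server collapses object names into prefixes when delimiter="/" is set; the function name and the sample object names are my own invention:

```python
def collapse_to_prefixes(object_names, prefix, delimiter="/"):
    """Illustrate how GCS's delimiter groups object names into prefixes.

    Any object whose name, after the prefix, still contains the
    delimiter is reported once as a truncated prefix rather than
    being returned as a blob.
    """
    prefixes = set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter in rest:
            # Keep everything up to and including the first delimiter.
            prefixes.add(prefix + rest.split(delimiter)[0] + delimiter)
    return prefixes

# Hypothetical object names in a bucket:
names = ["xyz/aaa/file1", "xyz/bbb/file2", "xyz/ccc/zzz/file3", "xyz/top.txt"]
print(collapse_to_prefixes(names, "xyz/"))
# → {'xyz/aaa/', 'xyz/bbb/', 'xyz/ccc/'}
```

Note that xyz/top.txt produces no prefix: it sits directly at the xyz/ level, so the real service would return it as a blob, not a prefix.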
3. Force the lazy-loading to...err..load!
Several of the answers up here have looped through every element of the iterator blobs, which could be many millions, but we don't need to do that. That said, if we don't loop through any elements, blobs won't bother making the API request to GCS at all.
next(blobs, ...) # Force blobs to load.
print(blobs.prefixes)
The blobs variable is an iterator with at most one element, but if your folder has no files in it (at its level) then there may be zero elements. If there are zero elements, then next(blobs) will raise a StopIteration.
The second argument, the ellipsis ..., is simply my choice of default return value should there be no next element. I feel this is more readable than, say, None, because it suggests to the reader that something worth noticing is happening here. After all, code that requests a value only to discard it on the same line does have all the hallmarks of a potential bug, so it is good to reassure our reader that this is deliberate.
Finally, suppose we have a tree structure under xyz of aaa, bbb and ccc, and then under ccc we have the subsubfolder zzz. The output will then be

{'xyz/aaa/', 'xyz/bbb/', 'xyz/ccc/'}

Note that, as required in the OP, we do not see the subsubfolder xyz/ccc/zzz.
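Putting the three steps together, here is a sketch of a small helper. The name list_folders is my own, not part of the library; it should work with any client object created as in step 1, since it only relies on the list_blobs call shown above:

```python
def list_folders(client, bucket_name, path):
    """Return the set of immediate sub-folder prefixes under path."""
    # GCS prefixes must end with the delimiter, so add it if missing.
    prefix = path if path.endswith("/") else path + "/"
    blobs = client.list_blobs(
        bucket_name,
        prefix=prefix,
        delimiter="/",
        max_results=1,
    )
    next(blobs, ...)            # force the lazy API request to happen
    return set(blobs.prefixes)

# Usage, assuming `client` was created as in step 1:
# print(list_folders(client, "abc", "xyz"))
```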