I have to read xlsx files from a Google Cloud Storage bucket. The file structure in the bucket looks like this:

bucket_name
       folder_name_1
               file_name_1
       folder_name_2
       folder_name_3
                file_name_3

The Python snippet looks like this:

import io

import pandas as pd
from google.cloud import storage

def main():
    storage_client = storage.Client.from_service_account_json(
        Constants.GCP_CRENDENTIALS)
    bucket = storage_client.bucket(Constants.GCP_BUCKET_NAME)

    blob = bucket.blob(folder_name_2 + '/' + Constants.GCP_FILE_NAME)

    data_bytes = blob.download_as_bytes()

    # Wrap the raw bytes in a file-like object before handing them to pandas
    df = pd.read_excel(io.BytesIO(data_bytes), engine='openpyxl')
    print(df)

def function1():
    print("no file in the folder")  # sample error

In the above snippet, I'm trying to read from folder_name_2, but it raises an error because there's no file to read.

Instead of throwing an error, I need to use function1 to print the error whenever there's no file in any folder.

Any ideas on how to do this?

2 Answers

I'm not familiar with the GCP API, but you're going to want to do something along the lines of this:

try:
    blob = bucket.blob(folder_name_2 + '/' + Constants.GCP_FILE_NAME)
    data_bytes = blob.download_as_bytes()
except Exception as e:
    print(e)

https://docs.python.org/3/tutorial/errors.html#handling-exceptions
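As an alternative to catching the exception, here is a minimal sketch that checks whether the object exists before downloading it and calls function1 when it does not. It relies on the `exists()` method of blobs in google-cloud-storage; `download_or_report` is a hypothetical helper name, not part of the library. Note that the check costs one extra request per file.

```python
def function1():
    print("no file in the folder")  # sample error, as in the question


def download_or_report(bucket, blob_name):
    """Return the blob's bytes, or call function1() and return None if it is missing.

    `bucket` is expected to be a google.cloud.storage Bucket (or any object
    with the same blob() / exists() / download_as_bytes() interface).
    """
    blob = bucket.blob(blob_name)
    if not blob.exists():
        function1()
        return None
    return blob.download_as_bytes()
```

In main() this would be called as `data_bytes = download_or_report(bucket, folder_name_2 + '/' + Constants.GCP_FILE_NAME)`, and the Excel parsing would be skipped when the result is None.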

George Vince

I'm not sure I understand your final goal, but another approach is to list the available resources in the bucket and process them.

First, let's define a function that lists the available resources in a bucket. You can add a prefix if you want to limit the search to a subfolder inside the bucket.

def list_resource(client, bucket_name, prefix=''):
    path_files = []
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        path_files.append(blob.name)
    return path_files

Now you can process your xlsx files:

for resource in list_resource(storage_client, Constants.GCP_BUCKET_NAME):
    if resource.endswith('.xlsx'):
        print(resource)
        # Load blob and process your xlsx file
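
To report folders that contain no file, the listed names can be grouped by folder. The sketch below is only an illustration: `files_by_folder` is a hypothetical helper, and it assumes that an empty folder shows up in the listing as a placeholder object whose name ends with '/' (which is typically the case when the folder was created from the console). A folder with no placeholder object and no files will not appear in the listing at all.

```python
def files_by_folder(blob_names, suffix='.xlsx'):
    """Group blob names by their folder; folders without matching files map to []."""
    folders = {}
    for name in blob_names:
        folder, sep, filename = name.rpartition('/')
        if not sep:
            continue  # object at the bucket root, not inside any folder
        files = folders.setdefault(folder, [])
        # A name ending in '/' is a folder placeholder: filename is '' and is skipped
        if filename.endswith(suffix):
            files.append(name)
    return folders
```

With the output of list_resource, you could then loop over `files_by_folder(...).items()` and call function1() for every folder whose list is empty.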