
I am new to Google Cloud Storage. In my Python code I have a couple of DataFrames, and I want to store them in a GCS bucket as a single Excel file with multiple sheets. In a local directory I am able to do that using ExcelWriter. Here is the code for that:

import pandas as pd

writer = pd.ExcelWriter(filename)
dataframe1.to_excel(writer, sheet_name='sheet1', index=False)
dataframe2.to_excel(writer, sheet_name='sheet2', index=False)
writer.close()  # writer.save() was removed in pandas 2.0; close() writes the file

I don't want to save a temp file in local directory and then upload it to GCS.

  • Hi Nishant, check out a couple of these threads, they may help: https://stackoverflow.com/questions/36314797/write-a-pandas-dataframe-to-google-cloud-storage-or-bigquery and https://stackoverflow.com/questions/56596951/save-pandas-data-frame-to-google-cloud-bucket. You may also want to look into using this library: https://pypi.org/project/google-cloud-storage/. – jawsem May 08 '20 at 14:24
  • Hi @jawsem, thanks for these links, but those solutions store a CSV file in GCS. I want to store the output as an Excel file. – Nishant Igave May 09 '20 at 06:02
  • Looking into the [documentation](https://googleapis.dev/python/storage/latest/blobs.html) for the Python client libraries for GCS, I don't see an upload method suitable for your scenario. Perhaps you could implement the removal of the local .xlsx file once it is uploaded to GCS. – Deniss T. May 13 '20 at 08:31
  • @DenisT. Thanks for the documentation link. Currently I am creating a local file and then uploading it, but I want to see if there's any way to save it directly in the cloud. – Nishant Igave Jun 12 '20 at 06:20
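
For reference, the temp-file workaround described in these comments (write the workbook locally, upload it, then let it be cleaned up) can look like the following. This is a minimal sketch, assuming the bucket name, object path, and the dataframe1/dataframe2 placeholders from the question:

import os
import tempfile

import pandas as pd
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket('name_of_your_bucket')  # assumed bucket name
blob = bucket.blob('path/to/excel_file.xlsx')          # assumed object path

# Write the workbook into a temporary directory, upload it, and let the
# directory (and the file inside it) be removed automatically on exit.
with tempfile.TemporaryDirectory() as tmpdir:
    tmp_path = os.path.join(tmpdir, 'excel_file.xlsx')
    with pd.ExcelWriter(tmp_path, engine='xlsxwriter') as writer:
        dataframe1.to_excel(writer, sheet_name='sheet1', index=False)
        dataframe2.to_excel(writer, sheet_name='sheet2', index=False)
    blob.upload_from_filename(tmp_path)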

1 Answer


You can skip the use of gcsfs and use the ExcelWriter object directly with the storage client:

import io

import pandas as pd
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket('name_of_your_bucket')
blob = bucket.blob('path/to/excel')

# Build the workbook in an in-memory buffer, then upload the buffer
# to GCS as a single object.
with io.BytesIO() as output:
    with pd.ExcelWriter(output, engine='xlsxwriter') as writer:
        dataframe1.to_excel(writer, sheet_name='sheet1', index=False)
        dataframe2.to_excel(writer, sheet_name='sheet2', index=False)
    output.seek(0)
    blob.upload_from_file(output, content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
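
To sanity-check the upload, you can read the workbook straight back from the bucket into a dict of DataFrames. A minimal sketch, reusing the bucket and blob names from the snippet above (blob.download_as_bytes() is available in recent versions of google-cloud-storage):

import io

import pandas as pd
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket('name_of_your_bucket')
blob = bucket.blob('path/to/excel')

# Download the object into memory and parse every sheet back into a
# dict of DataFrames keyed by sheet name.
data = blob.download_as_bytes()
sheets = pd.read_excel(io.BytesIO(data), sheet_name=None)
print(list(sheets))  # ['sheet1', 'sheet2']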

OLD Answer:

You can instantiate your ExcelWriter() with engine='xlsxwriter' and use fs-gcsfs to write the resulting bytes to an Excel file in your GCS bucket.

In your case you can do the following:

import io

import pandas as pd
from fs_gcsfs import GCSFS

# Set a different root_path if you wish to upload multiple files
# to different locations.
gcsfs = GCSFS(bucket_name='name_of_your_bucket',
              root_path='path/to/excel',
              strict=False)
gcsfs.fix_storage()

output = io.BytesIO()
with pd.ExcelWriter(output, engine='xlsxwriter') as writer:
    dataframe1.to_excel(writer, sheet_name='sheet1', index=False)
    dataframe2.to_excel(writer, sheet_name='sheet2', index=False)
xlsx_data = output.getvalue()

with gcsfs.open('./excel_file.xlsx', 'wb') as f:
    f.write(xlsx_data)

PS: I had to use strict=False as fs-gcsfs wasn't able to locate the root path (do check the limitations section in the fs-gcsfs documentation).

Source: https://xlsxwriter.readthedocs.io/working_with_pandas.html#saving-the-dataframe-output-to-a-string
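
For completeness, the same kind of round-trip check works through fs-gcsfs as well. A minimal sketch, reusing the GCSFS object and the excel_file.xlsx path from the snippet above:

import io

import pandas as pd
from fs_gcsfs import GCSFS

gcsfs = GCSFS(bucket_name='name_of_your_bucket',
              root_path='path/to/excel',
              strict=False)

# Read the uploaded workbook back through the same filesystem object.
with gcsfs.open('./excel_file.xlsx', 'rb') as f:
    sheets = pd.read_excel(io.BytesIO(f.read()), sheet_name=None)
print(list(sheets))  # ['sheet1', 'sheet2']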

Sarath Gadde