5

I want to read an excel file stored in Azure blob storage to a python data frame. What method would I use?

dragonfromdreams
  • 129
  • 3
  • 12
  • You may refer to this [tutorial](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python#code-examples) and this [page](https://aakashkh.github.io/azure/python/2018/12/22/Connection-Azure-Blob.html) – Hury Shen Nov 13 '19 at 02:57

1 Answers1

8

There is a function named read_excel in the pandas package, which you can pass a url of an online excel file to the function to get the dataframe of the excel table, as the figure below.

enter image description here

So you just need to generate a url of a excel blob with sas token and then to pass it to the function.

Here is my sample code. Note: it requires to install Python packages azure-storage, pandas and xlrd.

# Generate a url of excel blob with sas token
from azure.storage.blob.baseblobservice import BaseBlobService
from azure.storage.blob import BlobPermissions
from datetime import datetime, timedelta

account_name = '<your storage account name>'
account_key = '<your storage key>'
container_name = '<your container name>'
blob_name = '<your excel blob>'

blob_service = BaseBlobService(
    account_name=account_name,
    account_key=account_key
)

sas_token = blob_service.generate_blob_shared_access_signature(container_name, blob_name, permission=BlobPermissions.READ, expiry=datetime.utcnow() + timedelta(hours=1))
blob_url_with_sas = blob_service.make_blob_url(container_name, blob_name, sas_token=sas_token)

# pass the blob url with sas to function `read_excel`
import pandas as pd
df = pd.read_excel(blob_url_with_sas)
print(df)

I used my sample excel file to test the code below, it works fine.

Fig 1. My sample excel file testing.xlsx in my test container of Azure Blob Storage

enter image description here

Fig 2. The content of my sample excel file testing.xlsx

enter image description here

Fig 3. The result of my sample Python code to read excel blob

enter image description here

Peter Pan
  • 23,476
  • 4
  • 25
  • 43
  • This gives an http 400 error when trying to get blobs from nested files (ex. it works fine with folder/file but not folder/folder/file), do you know how to fix this? – dragonfromdreams Nov 21 '19 at 02:00
  • @dragonfromdreamsd Please create a new SO thread to post the details of your issue, and then I will help you. – Peter Pan Nov 21 '19 at 02:54
  • 1
    no such module: `from azure.storage.blob.baseblobservice import BaseBlobService` – mas Mar 24 '21 at 03:13
  • @mas Please check the version of Azure Storage SDK that be old v2 or new v12. – Peter Pan Mar 25 '21 at 10:51
  • 1
    thank Peter, the second solution in this post worked for me. [link](https://stackoverflow.com/questions/63825744/read-excel-data-from-azure-blob-and-convert-into-csv-using-python-azure-function) – mas Mar 25 '21 at 23:14