reading xlsx to Pandas from Google Drive without downloading it?

Question

I am trying to read an xlsx file into a pandas df without downloading it but I am having issues.

I have tried read_excel while parsing the URL with sheet_id but encountered the following error that blocks me.

url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/export"
file_df = pd.read_excel(url, engine='openpyxl')

error:

zipfile.BadZipFile: File is not a zip file

I have a service account as authentication(should be enough) and can not wish to use a bearer token as a solution like this suggests.

Any help would be much appreciated.

What you're trying to load is a google spreadsheet, not an xlsx (Excel) file. They can be read as csv. see https://stackoverflow.com/questions/19611729/getting-google-spreadsheet-csv-into-a-pandas-dataframe — Tranbi, Jun 15 '23 at 10:55
No I'm trying to load an excel not a gsheet. I know my above approach is incorrect — Solrac, Jun 15 '23 at 12:04
OK. It seems you have to export it anyway. Try the code in my posted answer. — Tranbi, Jun 15 '23 at 12:27

score 1 · Answer 1 · answered Jun 15 '23 at 12:26

1

Since your spreadsheet is hosted on google, you should export it to the desired format. Either csv:

df = pd.read_csv(f"https://docs.google.com/spreadsheets/export?id={sheet_id}&format=csv")

Or xlsx:

df = pd.read_excel(f"https://docs.google.com/spreadsheets/export?id={sheet_id}&format=xlsx")

answered Jun 15 '23 at 12:26

Tranbi

The second options results in the same error as the one I mentioned above. The first option returns a df object but it seems to be all messed up. The headers and records are what appears to be HTML. – Solrac Jun 15 '23 at 13:05
To add to this I am sharing the file with the service account user, so auth shouldn't be the issue – Solrac Jun 15 '23 at 13:11
1

did you specify `format=xlsx` at the end of the url for the second option? Regarding the messed up headers, I can't reproduce it. Maybe you can share a sample? – Tranbi Jun 15 '23 at 13:14

1 Answers1