12

How can I import MS-excel(.xlsx) file from google drive into colaboratory?

excel_file = drive.CreateFile({'id':'some id'})

does work(drive is a pydrive.drive.GoogleDrive object). But,

print excel_file.FetchContent()

returns None. And

excel_file.content()

throws:

TypeErrorTraceback (most recent call last) in () ----> 1 excel_file.content()

TypeError: '_io.BytesIO' object is not callable

My intent is (given some valid file 'id') to import it as an io object, which could be read by pandas read_excel(), and finally get a pandas dataframe out of it.

dd_rookie
  • 331
  • 2
  • 3
  • 13

5 Answers5

10

You'll want to use excel_file.GetContentFile to save the file locally. Then, you can use the Pandas read_excel method after you !pip install -q xlrd.

Here's a full example: https://colab.research.google.com/notebook#fileId=1SU176zTQvhflodEzuiacNrzxFQ6fWeWC

What I did in more detail:

I created a new spreadsheet in sheets to be exported as an .xlsx file.

Next, I exported it as an .xlsx file and uploaded again to Drive. The URL is: https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM

Note the file ID. In my case it's 1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM.

Then, in Colab, I tweaked the Drive download snippet to download the file. The key bits are:

file_id = '1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM'
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('exported.xlsx')

Finally, to create a Pandas DataFrame:

!pip install -q xlrd
import pandas as pd
df = pd.read_excel('exported.xlsx')
df

The !pip install... line installs the xlrd library, which is needed to read Excel files.

Bob Smith
  • 36,107
  • 11
  • 98
  • 91
  • Since you're already uploading to Google Drive and converting to a Sheet then you could skip the step of uploading it again and access it from Colab simply as a Sheet as seen in this solution: https://stackoverflow.com/a/49397059/1762493 – Mikeumus Apr 17 '18 at 20:28
9

Perhaps a simpler method:

#To read/write data from Google Drive:
#Reference: https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveAå
from google.colab import drive
drive.mount('/content/drive')

df = pd.read_excel('/content/drive/My Drive/folder_name/file_name.xlsx')

# #When done, 
# drive.flush_and_unmount()
# print('All changes made in this colab session should now be visible in Drive.')

Victor Z
  • 717
  • 1
  • 8
  • 12
4

First, I import io, pandas and files from google.colab

import io
import pandas as pd
from google.colab import files

Then I upload the file using an upload widget

uploaded = files.upload()

You will something similar to this (click on Choose Files and upload the xlsx file): enter image description here

Let's suppose that the name of the files is my_spreadsheet.xlsx, so you need to use it in the following line:

df = pd.read_excel(io.BytesIO(uploaded.get('my_spreadsheet.xlsx')))

And that's all, now you have the first sheet in the df dataframe. However, if you have multiple sheets you can change the code into this:

First, move the io call to another variable

xlsx_file = io.BytesIO(uploaded.get('my_spreadsheet.xlsx'))

And then, use the new variable to specify the sheet name, like this:

df_first_sheet = pd.read_excel(xlsx_file, 'My First Sheet')
df_second_sheet = pd.read_excel(xlsx_file, 'My Second Sheet')
neosergio
  • 452
  • 5
  • 15
3
import pandas as pd

xlsx_link = 'https://docs.google.com/spreadsheets/d/1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM/export'
df = pd.read_excel(xlsx_link)

if the xlsx is hosted on Google drive, once shared, anyone can use link to access it, with or without google account. google.colab.drive or google.colab.files dependencies are not necessary

willhyper
  • 96
  • 2
  • 5
0

Easiest way I found so far.

Pretty similar to what we do on desktop.

Considering you uploaded the file to your Google Drive folder:

  • On the left bar click on Files ( below the {x} )
  • Select Mount Driver > drive > folder > file (left click and Copy Path)

After that just go to the code and past the path

pd.read_excel('/content/drive/MyDrive/Colab Notebooks/token_rating.xlsx')