13

I have some data files uploaded on my google drive. I want to import those files into google colab.

The REST API method and PyDrive method show how to create a new file and upload it on drive and colab. Using that, I am unable to figure out how to read the data files already present on my drive in my python code.

I am a total newbie to this. Can someone help me out?

user3828311
  • 907
  • 4
  • 11
  • 20

3 Answers3

15

(Update April 15 2018: The gspread is frequently being updated, so to ensure stable workflow I specify the version)

For spreadsheet file, the basic idea is using packages gspread and pandas to read spreadsheets in Drive and convert them to pandas dataframe format.

In the Colab notebook:

#install packages
!pip install gspread==2.1.1
!pip install gspread-dataframe==2.1.0
!pip install pandas==0.22.0


#import packages and authorize connection to Google account:
import pandas as pd
import gspread
from gspread_dataframe import get_as_dataframe, set_with_dataframe
from google.colab import auth
auth.authenticate_user()  # verify your account to read files which you have access to. Make sure you have permission to read the file!
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default()) 

Then I know 3 ways to read Google spreadsheets.

By file name:

spreadsheet = gc.open("goal.csv") # Open file using its name. Use this if the file is already anywhere in your drive
sheet =  spreadsheet.get_worksheet(0)  # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()

By url:

 spreadsheet = gc.open_by_url('https://docs.google.com/spreadsheets/d/1LCCzsUTqBEq5pemRNA9EGy62aaeIgye4XxwReYg1Pe4/edit#gid=509368585') # use this when you have the complete url (the edit#gid means permission)
    sheet =  spreadsheet.get_worksheet(0)  # 0 means the first sheet in the file
    df2 = pd.DataFrame(sheet.get_all_records())
    df2.head()

By file key/ID:

spreadsheet = gc.open_by_key('1vpukIbGZfK1IhCLFalBI3JT3aobySanJysv0k5A4oMg') # use this when you have the key (the string in the url following spreadsheet/d/)
sheet =  spreadsheet.get_worksheet(0)  # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()

I shared the code above in a Colab notebook: https://drive.google.com/file/d/1cvur-jpIpoEN3vAO8Fd_yVAT5Qgbr4GV/view?usp=sharing

Source: https://github.com/burnash/gspread

MarshallMa
  • 171
  • 10
  • during installation here is got an error: ```ERROR: gspread-dataframe 3.0.3 has requirement gspread>=3.0.0, but you'll have gspread 2.1.1 which is incompatible.``` – Alexey Nikonov Oct 30 '19 at 11:16
2

!) Set your data to be publicly available then for public spreadsheets:

from StringIO import StringIO  # got moved to io in python3.

import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc? 
key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content

In [10]: df = pd.read_csv(StringIO(data), index_col=0,parse_dates= 
['Quradate'])

In [11]: df.head()

More here: Getting Google Spreadsheet CSV into A Pandas Dataframe

If private data sort of the same but you will have to do some auth gymnastics...

dartdog
  • 10,432
  • 21
  • 72
  • 121
  • The data is not in google spreadsheets. It is a numpy array that has been saved to disk. How to approach that ? – user3828311 Feb 09 '18 at 18:32
  • 1
    I'd look at the same approach, save the file to drive and read it from a public url. But given some strange possible distortions I'd make the NP array into a pandas dataframe and save it as csv to the spreadsheet then follow the above,, I have not tried to save and read a np array even locally... – dartdog Feb 09 '18 at 21:16
  • This worked for me without StringIO, just `KEY = "0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc"; pd.read_csv("https://docs.google.com/spreadsheet/ccc?key=" + KEY + "&output=csv")` – Max Ghenis Feb 13 '22 at 03:56
0

From Google Colab snippets

from google.colab import auth
auth.authenticate_user()

import gspread
from oauth2client.client import GoogleCredentials

gc = gspread.authorize(GoogleCredentials.get_application_default())

worksheet = gc.open('Your spreadsheet name').sheet1

# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
print(rows)

# Convert to a DataFrame and render.
import pandas as pd
pd.DataFrame.from_records(rows)
grahas
  • 35
  • 5