Getting Google Spreadsheet CSV into A Pandas Dataframe

Question

I uploaded a file to Google spreadsheets (to make a publically accessible example IPython Notebook, with data) I was using the file in it's native form could be read into a Pandas Dataframe. So now I use the following code to read the spreadsheet, works fine but just comes in as string,, and I'm not having any luck trying to get it back into a dataframe (you can get the data)

import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content

The data ends up looking like: (1st row headers)

',City,region,Res_Comm,mkt_type,Quradate,National_exp,Alabama_exp,Sales_exp,Inventory_exp,Price_exp,Credit_exp\n0,Dothan,South_Central-Montgomery-Auburn-Wiregrass-Dothan,Residential,Rural,1/15/2010,2,2,3,2,3,3\n10,Foley,South_Mobile-Baldwin,Residential,Suburban_Urban,1/15/2010,4,4,4,4,4,3\n12,Birmingham,North_Central-Birmingham-Tuscaloosa-Anniston,Commercial,Suburban_Urban,1/15/2010,2,2,3,2,2,3\n

The native pandas code that brings in the disk resident file looks like:

df = pd.io.parsers.read_csv('/home/tom/Dropbox/Projects/annonallanswerswithmaster1012013.csv',index_col=0,parse_dates=['Quradate'])

A "clean" solution would be helpful to many to provide an easy way to share datasets for Pandas use! I tried a bunch of alternative with no success and I'm pretty sure I'm missing something obvious again.

Just a Update note The new Google spreadsheet has a different URL pattern Just use this in place of the URL in the above example and or the below answer and you should be fine here is an example:

https://docs.google.com/spreadsheets/d/177_dFZ0i-duGxLiyg6tnwNDKruAYE-_Dd8vAQziipJQ/export?format=csv&id

see solution below from @Max Ghenis which just used pd.read_csv, no need for StringIO or requests...

the URL ends with `/edit?ts=5c0e311e#gid=0` and the sharing link ends with `/edit?usp=sharing`, none have `csv` and both give `404` when requested by pandas code — Mugen, Dec 11 '18 at 06:57

score 81 · Answer 1 · edited Sep 07 '16 at 06:41

81

Seems to work for me without the StringIO:

test = pd.read_csv('https://docs.google.com/spreadsheets/d/' + 
                   '0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc' +
                   '/export?gid=0&format=csv',
                   # Set first column as rownames in data frame
                   index_col=0,
                   # Parse column values to datetime
                   parse_dates=['Quradate']
                  )
test.head(5)  # Same result as @TomAugspurger

BTW, including the ?gid= enables importing different sheets, find the gid in the URL.

edited Sep 07 '16 at 06:41

getup8

6,949
1
27
31

answered Feb 06 '16 at 20:23

Max Ghenis

14,783
16
84
132

1

Maybe just add comments as to what index_col and parse_dates do? Also, maybe this is obvious, but I think this only works if the Spreadsheet is public; I believe if it's not, you'll have to use the API. – getup8 Sep 07 '16 at 04:44
4

Great solution. Works when a sheet is shared as "Anyone on the Internet with this link can view". Note that `index_col` and `parse_dates` arguments are optional. – Dylan Hogg Jul 04 '20 at 11:24
3

it only works when the SPREADSHEET IS PUBLIC – Marco Cerliani Sep 09 '22 at 11:52

score 65 · Accepted Answer · edited Jan 04 '21 at 15:35

65

You can use read_csv() on a StringIO object:

from io import BytesIO

import requests
import pandas as pd

r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.content
    
In [10]: df = pd.read_csv(BytesIO(data), index_col=0,parse_dates=['Quradate'])

In [11]: df.head()
Out[11]: 
          City                                            region     Res_Comm  \
0       Dothan  South_Central-Montgomery-Auburn-Wiregrass-Dothan  Residential   
10       Foley                              South_Mobile-Baldwin  Residential   
12  Birmingham      North_Central-Birmingham-Tuscaloosa-Anniston   Commercial   
38       Brent      North_Central-Birmingham-Tuscaloosa-Anniston  Residential   
44      Athens                 North_Huntsville-Decatur-Florence  Residential   

          mkt_type            Quradate  National_exp  Alabama_exp  Sales_exp  \
0            Rural 2010-01-15 00:00:00             2            2          3   
10  Suburban_Urban 2010-01-15 00:00:00             4            4          4   
12  Suburban_Urban 2010-01-15 00:00:00             2            2          3   
38           Rural 2010-01-15 00:00:00             3            3          3   
44  Suburban_Urban 2010-01-15 00:00:00             4            5          4   

    Inventory_exp  Price_exp  Credit_exp  
0               2          3           3  
10              4          4           3  
12              2          2           3  
38              3          3           2  
44              4          4           4

edited Jan 04 '21 at 15:35

zabop

6,750
3
39
84

answered Oct 26 '13 at 21:02

TomAugspurger

28,234
8
86
69

1

I was looking for weeks how to import a spreadsheet into pandas. never heard of requests or StringIO libraries. Thank you!! – moldovean Apr 19 '14 at 11:40
Note the new URL format in the bottom of the original question above it is needed for the new Google spreadsheet version – dartdog Jun 06 '14 at 05:35
5

To clarify "got moved around in python3 if you're using that": from io import StringIO – ezcodr Jul 27 '14 at 05:45
Thanks! But I had to use this form of google url for csv output: http://stackoverflow.com/a/23702001/507544 – nealmcb Jun 03 '15 at 14:44
How can one specify the sheet (i.e. `#gid=x` in URL)? Adding it to the URL itself after `key=` didn't work. – Max Ghenis Feb 06 '16 at 20:08
see solution below! from @Max Ghenis – dartdog Feb 07 '16 at 16:32
Since support for Python 2 will be removed, I updated the code to something that's forward-compatible with Python 3. Basically, used BytesIO instead of StringIO. But Max's answer below mentioned that you don't actually need the IO buffers. – MrValdez Feb 10 '19 at 07:10

score 22 · Answer 3 · answered May 16 '18 at 17:48

22

Open the specific sheet you want in your browser. Make sure it's at least viewable by anyone with the link. Copy and paste the URL. You'll get something like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER.

sheet_url = 'https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER'

First we turn that into a CSV export URL, like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/export?format=csv&gid=NUMBER:

csv_export_url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')

Then we pass it to pd.read_csv, which can take a URL.

df = pd.read_csv(csv_export_url)

This will break if Google changes its API (it seems undocumented), and may give unhelpful errors if a network failure occurs.

answered May 16 '18 at 17:48

Ken Arnold

1,942
1
20
24

This code returns a HTML page for download the csv, not the csv file from gsheet. – diegodsp Jul 15 '20 at 12:02
16

I am getting ParserError: Error tokenizing data. C error: Expected 1 fields in line 6, saw 2 – rsc05 Jan 31 '21 at 10:42
Did you make sure access is set to "anyone with the link" – Raisin Apr 16 '23 at 02:19

score 13 · Answer 4 · answered Jan 02 '18 at 14:37

13

My approach is a bit different. I just used pandas.Dataframe() but obviously needed to install and import gspread. And it worked fine!

gsheet = gs.open("Name")
Sheet_name ="today"
wsheet = gsheet.worksheet(Sheet_name)
dataframe = pd.DataFrame(wsheet.get_all_records())

answered Jan 02 '18 at 14:37

Abhery Guha

139
1
5

Nice..The interface keeps getting cleaner! – dartdog Jan 02 '18 at 15:06
5

just to clarify, `gs` would be `gs = gspread.authorize(credentials)` – RAbraham Mar 04 '19 at 20:05

Gianmario Spacagna · Answer 5 · 2018-02-26T10:46:32.043

I have been using the following utils and it worked so far:

def load_from_gspreadsheet(sheet_name, key):
    url = 'https://docs.google.com/spreadsheets/d/{key}/gviz/tq?tqx=out:csv&sheet={sheet_name}&headers=1'.format(
        key=key, sheet_name=sheet_name.replace(' ', '%20'))

    log.info('Loading google spreadsheet from {}'.format(url))

    df = pd.read_csv(url)
    return df.drop([col for col in df.columns if col.startswith('Unnamed')], axis=1)

You must specify the sheet_name and the key. The key is the string you get from the url in the following path: https://docs.google.com/spreadsheets/d/{key}/edit/.

You can change the value of headers if you have more than one row for the column names but I am not sure if it still work with multi-headers.

It may brake if Google will change their APIs.

Also please bear in mind that your spreadsheet must be public, everyone with the link can read it.

score 5 · Answer 6 · answered Dec 02 '21 at 03:08

5

First

Click on File
Select Publish to the web tab
Select which sheet you want as CSV(in case of multiple sheet) and also change the format from webpage to Comma-separated values
Click on publish
Copy the link eg: https://docs.google.com/spreadsheets/d/e/{}/pub?gid=0&single=true&output=csv

import pandas as pd
pd.read_csv("https://docs.google.com/spreadsheets/d/e/{}/pub?gid=0&single=true&output=csv")

answered Dec 02 '21 at 03:08

Parth chokhra

91
1
6

underrated but simple answer. – Shaida Muhammad Jan 20 '22 at 05:55
This worked for me, thanks! But this reads only the first sheet. How could I read all the sheets? – Sergey Belousov Feb 27 '22 at 15:42

score 4 · Answer 7 · answered Aug 13 '21 at 09:28

In Google Sheets file go to File > Publish to the web > Select .csv (see screenshot) > Copy link

Google Sheets: Publish to web

Code

import pandas as pd

path = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSvmELTzIjfSmX8GuV3HE2qomN3uRyvPX8RDzpw77JH33DUbj1bjech7H6NYPArvpZFux0DdJ5L5TKy/pub?output=csv'
data = pd.read_csv(path)
print(data)

Code in Google Colab

JQTs · Answer 8 · 2022-12-23T17:30:39.003

4

Straight to the point:

Get your google URL

https://docs.google.com/spreadsheets/d/ this is your sheet ID number/edit?gid=This will be your tab name, it will be a number. Each tab has its own

I like to make a function(not making here) so I separate my variables

sheet_id = "Place your sheet ID here"
sheet_name = "Place your sheet # here"

the next URL is the tricky part:

url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?gid={sheet_name}&format=csv"

Then just read it in

df = pd.csv(url)

That's it. If you need to select a different row as a header you can do this

df = pd.csv(url, header=1)

edited Dec 23 '22 at 17:30

answered Apr 13 '22 at 00:19

JQTs

142
2
11

1

use "df = pd.read_csv(url)" instead of "df = pd.csv(url)". – Robb Dunlap Mar 12 '23 at 16:00
Great point @RobbDunlap I can not remember why I used pd.csv and not put "read" in there. – JQTs Mar 13 '23 at 17:07

score 3 · Answer 9 · answered May 08 '18 at 01:34

If the csv file was shared via drive and not via spreadsheet then the below change to the url would work

#Derive the id from the google drive shareable link.
#For the file at hand the link is as below
#<https://drive.google.com/open?id=1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69>
file_id='1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69'
link='https://drive.google.com/uc?export=download&id={FILE_ID}'
csv_url=link.format(FILE_ID=file_id)
#The final url would be as below:-
#csv_url='https://drive.google.com/uc?export=download&id=1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69'
df = pd.read_csv(csv_url)

And the dataframe would be (if you just ran the above code)

    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9   10  11
3   12  13  14  15

See working code here.

score 3 · Answer 10 · answered Feb 11 '21 at 08:43

This works for me.

import pandas as pd

#Create a public URL
#https://docs.google.com/spreadsheets/d/0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc/edit?usp=sharing

#get spreadsheets key from url
gsheetkey = "0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc"

#sheet name
sheet_name = 'Sheet 1'

url=f'https://docs.google.com/spreadsheet/ccc?key={gsheetkey}&output=xlsx'
df = pd.read_excel(url,sheet_name=sheet_name)
print(df)

Getting Google Spreadsheet CSV into A Pandas Dataframe

10 Answers10

the next URL is the tricky part:

Then just read it in

Linked