0

I used to have a working code. It reads a google sheet into pandas and then does stuff with the data.

It worked very well, until recently.

The google sheet has around 450 lines, but it seems the google api rejects the request now. Can anybody help with what is wrong here?

import pandas as pd

google_id = r"my_google_id"
front_boilerplate = r"https://docs.google.com/spreadsheets/d/"
back_boilerplate = r"/export?format=xlsx"

sheet_name  = front_boilerplate + google_id + back_boilerplate
df = pd.read_excel(sheet_name)

I get "Bad Request" as result. Which is strange, because last time I executed the code (a month ago) it worked perfectly well. Was there any change in pandas or the google API which leads to the rejection of this code?

For anybody wanting to see the stack trace:

Traceback (most recent call last):

  File "card_printer.py", line 1040, in <module>
    cards = read_cards(google_sheet)

  File "card_printer.py", line 90, in read_cards
    in_pd = read_excel(file, engine='openpyxl')

  File "C:\Users\wagen\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\wagen\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
    io = ExcelFile(io, engine=engine)

  File "C:\Users\wagen\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
    self._reader = self._engines[engine](self._io)

  File "C:\Users\wagen\Anaconda3\lib\site-packages\pandas\io\excel\_openpyxl.py", line 480, in __init__
    super().__init__(filepath_or_buffer)

  File "C:\Users\wagen\Anaconda3\lib\site-packages\pandas\io\excel\_base.py", line 342, in __init__
    filepath_or_buffer = BytesIO(urlopen(filepath_or_buffer).read())

  File "C:\Users\wagen\Anaconda3\lib\site-packages\pandas\io\common.py", line 137, in urlopen
    return urllib.request.urlopen(*args, **kwargs)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 531, in open
    response = meth(req, response)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 563, in error
    result = self._call_chain(*args)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 531, in open
    response = meth(req, response)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)

  File "C:\Users\wagen\Anaconda3\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: Bad Request
Anderas
  • 630
  • 9
  • 20
  • 1
    Does this answer your question? [wget/curl large file from google drive](https://stackoverflow.com/questions/25010369/wget-curl-large-file-from-google-drive) – Prayson W. Daniel Mar 07 '21 at 06:56
  • The above like has multiple ways to down. Check the one using Python. Have you tried downloading another file? – Prayson W. Daniel Mar 07 '21 at 06:57
  • 1
    In your case, if you are the owner of the Spreadsheet, when you change the shared condition to "Restricted" and click "Done" button, and then, when you shared publicly as "Anyone with the link" and test it again, what result will you obtain? If this was not the direct solution of your issue, when you copy the Spreadsheet and publicly share the copied Spreadsheet and test it again, what result will you obtain? If these were not the solution of your issue, I apologize. – Tanaike Mar 07 '21 at 07:24
  • 1
    You have to read a google sheet using gspread to authenticate and then make a DataFrame from that sheet. Making a DataFrame from a google sheets' url is trying to skip authentication and shouldn't work. – unltd_J Mar 07 '21 at 16:50
  • @Prayson W. Daniel, unfortunately not - I was using a supersimple pandas read excel; curl is an entirely different way of doing it. I might go down that route if nothing else works, though. Thanks for your answer! – Anderas Mar 07 '21 at 17:21
  • @Tanaike, thanks for suggesting ways of debugging it. Indeed, I was checking the share status; it is "public on the web" so normally anybody should be able to read it without authentification. I changed it to "anybody can comment" and then back to "anybody can view"... and now it works. It leaves me a little bit stupefied as per why, but it helped. :-) Thanks! – Anderas Mar 07 '21 at 17:31
  • @unltd_J well, that's interesting. So either, one can read-access google sheets without authentification, which would be an interesting bug... or google knows that it's me accessing my own sheet, which I could nearly believe. – Anderas Mar 07 '21 at 17:32
  • 1
    @Anderas Maybe since you're using jupyter it works because you're signed into your google account in browser. If not, I would assume this is some sort of bug. I'll add an answer on how I normally read google sheets for context. – unltd_J Mar 07 '21 at 18:08
  • Thank you for replying. I'm glad your issue was resolved. – Tanaike Mar 08 '21 at 00:24

2 Answers2

2

You can always use the API

jupyter notebook

from google import drive
mydrive = drive.drive()
mydrive.sheetdf(mydrive.findItem("Blood pressure",mime="application/vnd.google-apps.spreadsheet"),  range="Sheet1!A1:D" )

drive.py

from apiclient import discovery
from oauth2client import client, tools
from oauth2client.file import Storage
from pathlib import Path
import pandas as pd

# install pre-requisites
# pip3 install httplib2
# pip3 install google-api-python-client
# pip3 install oauth2client

# setup / download client secret file here
# https://console.developers.google.com/apis/credentials

SCOPES = ['https://www.googleapis.com/auth/spreadsheets', 'https://www.googleapis.com/auth/drive']

class drive(object):
    driveservice = None
    sheetservice = None

    def __init__(self, appname="googleclient") -> None:
        self.__connectToService(appname=appname)

    def __connectToService(self, appname=""):
        credentials = self.__get_credentials(appname=appname)
        self.driveservice = discovery.build('drive', 'v3', credentials=credentials)
        self.sheetservice = discovery.build('sheets', 'v4', credentials=credentials)

    def __get_credentials(self, appname=""):
        credential_dir = Path.home().joinpath('.credentials')
        if not credential_dir.exists(): credential_dir.mkdir()

        store = Storage(credential_dir.joinpath(f'{appname}.json'))
        credentials = store.get()
        if not credentials or credentials.invalid:
            cs = [p for p in Path.cwd().glob("**/client*.json")][0]
            flow = client.flow_from_clientsecrets(cs, SCOPES)
            flow.user_agent = appname
            # NB this fails with Safari, set Chrome as default for this to work
            credentials = tools.run_flow(flow, store, None)
        return credentials

    def findItem(self, name="", mime="application/vnd.google-apps.folder") -> str:
        name = name.replace("'", "\\'") if "'" in name else name
        id = []
        page_token = None
        while True:
            response = self.driveservice.files().list(
                q=f"name='{name}' and trashed=false and mimeType='{mime}'",
                spaces='drive',
                fields='nextPageToken, files(id, name, mimeType)',
                pageToken=page_token).execute()
            for file in response.get('files', []):
                id.append(file["id"])
            page_token = response.get('nextPageToken', None)
            if page_token is None:
                break

        if id == []:  # not found
            id.append("")

        return id[0]

    def sheetdf(self, id, range="Sheet1!A1:Z"):
        df = pd.DataFrame(self.sheetservice.spreadsheets().values().get(
            spreadsheetId=id, range=range
        ).execute()["values"])
        df.columns = df.iloc[0]
        df.drop(0, inplace=True)
        return df


if __name__ == '__main__':
    mydrive = drive()

    df = mydrive.sheetdf(mydrive.findItem(
        "Blood pressure",mime="application/vnd.google-apps.spreadsheet"),  range="Sheet1!A1:D" )
    print(df.tail(5))


Reference:

Iamblichus
  • 18,540
  • 2
  • 11
  • 27
Rob Raymond
  • 29,118
  • 3
  • 14
  • 30
  • Awesome Answer, thanks! I will use that way next, it's a much cleaner way than my own short one. Does this pave the way for editing a sheet from python, too? – Anderas Mar 07 '21 at 17:34
  • yes it does... `sheetservice.spreadsheets()` has bulk update methods too – Rob Raymond Mar 07 '21 at 18:09
1
import gspread
import pandas as pd


gc = gspread.service_account(filename='credentials.json')
book = gc.open_by_url(url)

You now have an authenticated connection to the google sheet you are trying to get a df from.

Each specific sheet in the spreadsheet can be accessed through the worksheet function.

sheet = book.worksheet('name of the sheet')

You can create a list of dictionaries from the data with a sheet by using this function

records = sheet.get_all_records()

And now a dataframe

df = pd.DataFrame(records)

There's a few steps to getting yourself a credentials file capable of authenticating. This video goes through them pretty clearly.

unltd_J
  • 476
  • 7
  • 14