I am trying to read data from a Google spreadsheet:

from __future__ import print_function
import gspread
from oauth2client.client import SignedJwtAssertionCredentials
import pandas as pd
import numpy as np
from numpy import nan
import json

I do

SCOPE = ["https://spreadsheets.google.com/feeds"]
SECRETS_FILE = "D:.......json"
SPREADSHEET = "sheet"

and

json_key = json.load(open(SECRETS_FILE))
credentials = SignedJwtAssertionCredentials(json_key['client_email'], json_key['private_key'], SCOPE)
gc = gspread.authorize(credentials)
workbook = gc.open(SPREADSHEET)
sheet = workbook.sheet1
db = pd.DataFrame(sheet.get_all_records())
db.head()

and I get

----> 6 db = pd.DataFrame(sheet.get_all_records())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 2039: invalid continuation byte

I tried

import sys
from importlib import reload
reload(sys)
sys.setdefaultencoding("utf-8")

but that raises

AttributeError: module 'sys' has no attribute 'setdefaultencoding'

I am using this Python version:

print (sys.version)
3.5.4 |Anaconda custom (64-bit)| (default, Aug 14 2017, 13:41:13) [MSC v.1900 64 bit (AMD64)]

I also tried

import os
os.environ["PYTHONIOENCODING"] = "utf-8"

with no result. I suspect the problem appeared because I upgraded my Python version.
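
For reference, here is a minimal check of the two attempts above, assuming Python 3 as reported. `sys.setdefaultencoding` does not exist in Python 3, and the default encoding is already UTF-8. `PYTHONIOENCODING` is read when the interpreter starts and only concerns stdin/stdout/stderr, so assigning it through `os.environ` in a running session is only seen by child processes. The variable names are illustrative.

```python
import os
import sys

# Python 3 always reports UTF-8 as the default string encoding.
default_encoding = sys.getdefaultencoding()

# The Python 2 era setter is gone, and reload(sys) does not bring it back.
has_setter = hasattr(sys, "setdefaultencoding")

# This only changes the environment handed to child processes; the
# running interpreter has already configured its standard streams.
os.environ["PYTHONIOENCODING"] = "utf-8"

print(default_encoding, has_setter)
```

Neither setting changes how bytes that were received from a remote service are decoded, which is where the traceback points.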

Edward
  • Your script is already using UTF-8 for decoding. The problem is that the data aren't encoded with UTF-8. – lenz May 19 '19 at 16:00
  • However, I cannot change UTF-8. – Edward May 19 '19 at 16:14
  • Start by opening the Google Sheet in Google Sheets and trying to work out what the character is. It isn't UTF-8 and there is nothing you can do in your code that is somehow going to make it UTF-8. It also isn't latin-1 or Windows-1252. – BoarGules May 19 '19 at 16:42
  • @BoarGules Please help me find it. I have more than 7000 rows and 65 columns, and I cannot find `position 2039`. – Edward May 19 '19 at 16:45
  • Then bisect your input successively to find the problem. That is, try with the first half of the input. If the problem persists, try with the first half of that, otherwise the second half. And so on. – BoarGules May 19 '19 at 16:48
  • @BoarGules It could be latin-1, or any other extended ASCII encoding. In latin-1, character 0xd0 is Ð. It's not utf8 because a utf8 continuation byte must be in the range 0x80 to 0xBF. https://stackoverflow.com/questions/5552555/unicodedecodeerror-invalid-continuation-byte – Mic May 19 '19 at 18:20
  • @Mic So it could. – BoarGules May 19 '19 at 18:23
  • @BoarGules The position number does not point to one unique place in the data, but to some kind of pattern, and I cannot find it. – Edward May 19 '19 at 18:56
  • Now you're getting somewhere. Try opening the file in a hex editor, maybe? – BoarGules May 19 '19 at 19:00
  • @BoarGules The problems are in quotes, &, etc. – Edward May 19 '19 at 19:21
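
The bisection idea from the comments can be sketched as follows. This is only an illustration under assumptions: the question does not show how to obtain the raw bytes (gspread decodes the response internally), so `rows` here is made-up sample data and the helper names are hypothetical. For context, 0xd0 is a valid UTF-8 lead byte for a two-byte sequence, and "invalid continuation byte" means the byte that follows it is outside the range 0x80 to 0xBF.

```python
from typing import Callable, Optional, Sequence, Tuple


def locate_decode_error(raw: bytes, encoding: str = "utf-8",
                        context: int = 10) -> Optional[Tuple[int, bytes]]:
    """Return (position, nearby bytes) of the first undecodable byte, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        return exc.start, raw[lo:exc.end + context]
    return None


def chunk_fails(chunk: Sequence[bytes]) -> bool:
    """True if any row in the chunk cannot be decoded as UTF-8."""
    return any(locate_decode_error(row) is not None for row in chunk)


def first_bad_index(items: Sequence[bytes],
                    fails: Callable[[Sequence[bytes]], bool]) -> int:
    """Bisect to the first item that makes `fails` true.

    Assumes fails(items) is true for the whole input.
    """
    lo, hi = 0, len(items)  # fails(items[:lo]) is False, fails(items[:hi]) is True
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(items[:mid]):
            hi = mid
        else:
            lo = mid
    return hi - 1


# Made-up sample: the third row has 0xd0 followed by a space (0x20).
rows = [b"ok", b"fine", b"caf\xd0 x", b"more"]
bad_row = first_bad_index(rows, chunk_fails)
position, snippet = locate_decode_error(rows[bad_row])
as_latin1 = b"\xd0".decode("latin-1")  # the same byte read as latin-1
print(bad_row, position, snippet)
```

When the raw bytes are available, the `start` attribute of `UnicodeDecodeError` already gives the offset directly, so bisection is mainly useful when only a pass/fail signal per chunk is available.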

0 Answers