PostgreSQL ANSI,Python SQL, utf-8' codec can't decode byte 0xa0

Question

I am trying to run a sql query in python. In python 2 this used to work but now that I am using python 3 this no longer is working.

I get error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 1: invalid start byte

Update, added in the 3 lines in the middle,also tried using 'windows-1252' here. same error:

conn_str = 'DSN=PostgreSQL30'
conn = pyodbc.connect('DSN=STACK_PROD')


###newly added
    conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
    conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
    conn.setencoding(encoding='utf-8')

sql = "select * from stackoverflow where p_date = " + business_date
print("Query: " + sql)
crsr = conn.execute(sql)

traceback:

Traceback (most recent call last):

  File "<ipython-input-2-b6db3f5e859e>", line 1, in <module>
    runfile('//stack/overflow/create_extract_db_new.py', wdir='//stack/overflow')

  File "C:\Users\stack\AppData\Local\Continuum\anaconda3\anaconda3_32bit\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\Users\stack\AppData\Local\Continuum\anaconda3\anaconda3_32bit\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "//stack/overflow/create_extract_db_new.py", line 37, in <module>
    crsr = conn.execute(sql)

Also tried:

conn.setdecoding(pyodbc.SQL_CHAR, encoding='windows-1252')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='windows-1252')
conn.setencoding(encoding='windows-1252')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte

Also tried:

    conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
    conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-16')
    conn.setencoding(encoding='utf-8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte

Also tried:

conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
conn.setencoding(encoding='utf-8')
conn.setdecoding(pyodbc.SQL_WMETADATA, encoding='windows-1252')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte

Also tried:

conn.setdecoding(pyodbc.SQL_CHAR, encoding='windows-1252')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='windows-1252')
conn.setencoding(encoding='windows-1252')
conn.setdecoding(pyodbc.SQL_WMETADATA, encoding='windows-1252')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte

Dump Dsn Results, omitting my username, uid, password and server.:

[my_dsn]
Driver=C:\Program Files (x86)\psqlODBC\0903\bin\psqlodbc30a.dll
CommLog=0
Debug=0
Fetch=100
Optimizer=0
Ksqo=1
UniqueIndex=1
UseDeclareFetch=0
UnknownSizes=0
TextAsLongVarchar=1
UnknownsAsLongVarchar=0
BoolsAsChar=1
Parse=0
CancelAsFreeStmt=0
MaxVarcharSize=255
MaxLongVarcharSize=8190
ExtraSysTablePrefixes=dd_;
Description=my_dsn
Database=db_name
Port=9996
ReadOnly=0
ShowOidColumn=0
FakeOidIndex=0
RowVersioning=0
ShowSystemTables=0
Protocol=7.4
ConnSettings=
DisallowPremature=0
UpdatableCursors=1
LFConversion=1
TrueIsMinus1=0
BI=0
AB=0
ByteaAsLongVarBinary=0
UseServerSidePrepare=1
LowerCaseIdentifier=0
GssAuthUseGSS=0
SSLmode=disable
KeepaliveTime=-1
KeepaliveInterval=-1
PreferLibpq=-1
XaOpt=1

Error msg:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 3: unexpected end of data

can anyone help me ?

Possible duplicate of ['utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte](https://stackoverflow.com/questions/48067514/utf-8-codec-cant-decode-byte-0xa0-in-position-4276-invalid-start-byte) — Bram Vanroy, Aug 07 '19 at 20:41
i looked at this, not sure wehre to put in the encoding. I cant put it in `conn.execute` as execute wont take this. — excelguy, Aug 07 '19 at 20:57
See also https://stackoverflow.com/questions/40712310/unicodedecodeerror-on-python-3-5-when-doing-sql-query — TastySlowCooker, Aug 07 '19 at 21:00
Please [edit] your question to show the complete stack trace. — Gord Thompson, Aug 12 '19 at 14:47
Can you try with `conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-16')` and rest on `utf-8` — Tarun Lalwani, Aug 12 '19 at 14:56
@TarunLalwani Gettign the same error using : `conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-16') conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8') conn.setencoding(encoding='utf-8')` — excelguy, Aug 12 '19 at 15:15
error message is slightly different though : `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 3: unexpected end of data` — excelguy, Aug 12 '19 at 15:23
Do you have any PostgreSQL client tools on your machine, e.g., `psql` or `pgAdmin`, that you might use to generate CREATE DATABASE and CREATE TABLE scripts like [this](https://pastebin.com/VZrqK6fh) that could help you create a [mcve]? — Gord Thompson, Aug 17 '19 at 23:14
Your code fragment is incomplete. Please make sure it's minimal but sufficient to reproduce the problem. For example, the import statements and the definition of `business_date` are missing. — hagello, Aug 18 '19 at 21:29
Ever tried bound parameters? `conn.execute("select * from stackoverflow where p_date = ?", business_date)` — hagello, Aug 18 '19 at 21:31
Why , what would this do? i'll give it a try. Going to add more code fragment shortly. — excelguy, Aug 19 '19 at 13:09

Gord Thompson · Answer 1 · 2019-08-20T11:23:59.770

3

When using PostgreSQL's Unicode driver you need to call setencoding and setdecoding as explained here.

# Python 3.x
cnxn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
cnxn.setencoding(encoding='utf-8')

If you are using PostgreSQL's "ANSI" driver then you may still need to call those methods to ensure that the correct single-byte character set (a.k.a. "code page", e.g., windows-1252) is used for SQL_CHAR.

edited Aug 20 '19 at 11:23

answered Aug 08 '19 at 14:39

Gord Thompson

116,920
32
215
418

thank you, in this case I guess `conn` would replace `cncx`? – excelguy Aug 08 '19 at 14:44
Yes, it's the name of the `Connection` object created by `pyodbc.connect(...)` – Gord Thompson Aug 08 '19 at 15:24
Seems like i am getting the same error, check my updated code. – excelguy Aug 08 '19 at 15:26
@excelguy - What do you see if you `print(conn.getinfo(pyodbc.SQL_DRIVER_NAME) + ' version ' + conn.getinfo(pyodbc.SQL_DRIVER_VER))` ...? – Gord Thompson Aug 08 '19 at 16:44
`PSQLODBC.DLL version 09.03.0400` – excelguy Aug 08 '19 at 16:46
09.03.0400 appears to be nearly five years old (2014-10-26). Can you update to the latest version (currently 11.01.0000)? – Gord Thompson Aug 08 '19 at 16:53
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/197691/discussion-between-excelguy-and-gord-thompson). – excelguy Aug 08 '19 at 17:09
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/197733/discussion-between-excelguy-and-gord-thompson). – excelguy Aug 09 '19 at 14:26
A bit of digging reveals that you are using the "PostgreSQL ANSI" driver, so perhaps you might try having your `setencoding` and `setdecoding` statements use `'windows-1252'` instead of `'utf-8'`. – Gord Thompson Aug 09 '19 at 22:02
Yes I am using that driver. I tried `'windows-1252'` but still error `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 3: unexpected end of data` – excelguy Aug 12 '19 at 14:05
Do you get an error if you do `crsr.columns('your_table_name').fetchall()` ? – Gord Thompson Aug 12 '19 at 15:55
Where in my code do i run this? When i run it in my console i get `NameError: name 'crsr' is not defined` . – excelguy Aug 12 '19 at 19:49
You run it instead of the SELECT that's failing, i.e., after you open the connection and do the `setencoding`/`setdecoding`. – Gord Thompson Aug 12 '19 at 19:59
I getn an error, i commented out my `sql` variable. and put `crsr.columns('stackoverflow').fetchall()` beneath my `crsr = conn.execute(sql)` and get error `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 1: invalid continuation byte` – excelguy Aug 12 '19 at 20:24
Instead of `crsr = conn.execute(sql)` you need to do `crsr = conn.cursor()` followed by `crsr.columns('stackoverflow').fetchall()`. – Gord Thompson Aug 12 '19 at 20:27
Okay, now try running [this code](https://pastebin.com/1NUMT6Vm) to test retrieving each column individually. – Gord Thompson Aug 12 '19 at 20:32
dang they all showed "FAIL" . btw I kept your previous suggetions in the code aswell. – excelguy Aug 12 '19 at 20:59
Do they show different types, e.g., '(int4)', '(varchar)', etc.? – Gord Thompson Aug 12 '19 at 21:01
Yes, we have a few `int8` most are `varchar`, got a couple `float8` and `nurmeric`. – excelguy Aug 12 '19 at 21:03
Hmm, okay. What does `pyodbc.version` return? – Gord Thompson Aug 12 '19 at 21:04
It returns, '4.0.26' – excelguy Aug 13 '19 at 13:11
4.0.27 is the latest version but there weren't any changes from 4.0.26 that would likely affect this issue. Can you use [dump_dsn](https://github.com/gordthompson/dump_dsn) to list your DSN settings, obfuscate *just* the `Servername`, `Username`, `UID`, and `Password` values and then add the result to your question? – Gord Thompson Aug 13 '19 at 13:35
do you mean include everything but my Servername, Username, UID, and Password ? Posting now. – excelguy Aug 13 '19 at 13:53
Yes. We know that you can establish a connection so your servername and credentials are probably not important. – Gord Thompson Aug 13 '19 at 14:03
All of your non-connection DSN settings are exactly the same as mine, and my connection works fine. If you are able to run "\Windows\SysWOW64\odbcad32.exe" can you create a User DSN for the "PostgreSQL Unicode" driver that uses all of the default settings (just specify `Database`, `Server`, `Port`, `User Name`, and `Password`) and then try running the [test code](https://pastebin.com/1NUMT6Vm) again with that DSN? – Gord Thompson Aug 13 '19 at 14:48
Okay i created a new user dsn(32 bit), using unicode instead of ansi this time. Changes my `pyodbc.connect` to the new dsn name, then ran your test. all columns show fail again. – excelguy Aug 13 '19 at 15:03
In addition to your existing `setdecoding` statements for `SQL_CHAR` and `SQL_WCHAR`, try adding `conn.setdecoding(pyodbc.SQL_WMETADATA, encoding='windows-1252')` and see if that helps at all. – Gord Thompson Aug 13 '19 at 17:31
ah this error is what im still getting, ` crsr = conn.execute(sql) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 1: invalid continuation byte` I've also chanegd the `SQL_CHAR` to `WCHAR` on the first statement and made it `ufc-12` – excelguy Aug 13 '19 at 18:21
What's perplexing me is that AFAIK we've been trying to convince pyodbc to decode *everything* as `windows-1252` and yet you're still getting a UnicodeDecodeError stating that it can't decode the bytes as `utf-8`. (BTW, I have no idea what you mean by `ufc-12`.) Anyway, I'm pretty much out of ideas unless you can provide a sample database that can be used to reproduce the issue. Sorry. – Gord Thompson Aug 13 '19 at 18:50
i meant `'utf-16'` sorry. Thanks for all your help. – excelguy Aug 13 '19 at 19:48
If you suspect an issue with pyodbc itself you could create a [GitHub issue](https://github.com/mkleehammer/pyodbc/issues/) including a [mcve] and attach an [ODBC trace log](https://github.com/mkleehammer/pyodbc/wiki/Troubleshooting-%E2%80%93-Generating-an-ODBC-trace-log) to show the pyodbc developers how far the ODBC "conversation" proceeds before failing. – Gord Thompson Aug 13 '19 at 21:53

score 0 · Accepted Answer · answered Aug 20 '19 at 13:07

0

What worked for me was using this line for connectiong via odbc instead, and I took out the encoding and decoding.

con = pyodbc.connect(r'DSN='+'STACK_PROD',autocommit=True)

answered Aug 20 '19 at 13:07

excelguy

1,574
6
33
67

PostgreSQL ANSI,Python SQL, utf-8' codec can't decode byte 0xa0

2 Answers2

Linked