5

I am trying to run a sql query in python. In python 2 this used to work but now that I am using python 3 this no longer is working.

I get error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 1: invalid start byte

Update, added in the 3 lines in the middle,also tried using 'windows-1252' here. same error:

conn_str = 'DSN=PostgreSQL30'
conn = pyodbc.connect('DSN=STACK_PROD')


###newly added
    conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
    conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
    conn.setencoding(encoding='utf-8')

sql = "select * from stackoverflow where p_date = " + business_date
print("Query: " + sql)
crsr = conn.execute(sql)

traceback:

Traceback (most recent call last):

  File "<ipython-input-2-b6db3f5e859e>", line 1, in <module>
    runfile('//stack/overflow/create_extract_db_new.py', wdir='//stack/overflow')

  File "C:\Users\stack\AppData\Local\Continuum\anaconda3\anaconda3_32bit\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\Users\stack\AppData\Local\Continuum\anaconda3\anaconda3_32bit\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "//stack/overflow/create_extract_db_new.py", line 37, in <module>
    crsr = conn.execute(sql)

Also tried:

conn.setdecoding(pyodbc.SQL_CHAR, encoding='windows-1252')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='windows-1252')
conn.setencoding(encoding='windows-1252')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte

Also tried:

    conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
    conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-16')
    conn.setencoding(encoding='utf-8')    

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte

Also tried:

conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
conn.setencoding(encoding='utf-8')
conn.setdecoding(pyodbc.SQL_WMETADATA, encoding='windows-1252')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte

Also tried:

conn.setdecoding(pyodbc.SQL_CHAR, encoding='windows-1252')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='windows-1252')
conn.setencoding(encoding='windows-1252')
conn.setdecoding(pyodbc.SQL_WMETADATA, encoding='windows-1252')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte

Dump Dsn Results, omitting my username, uid, password and server.:

[my_dsn]
Driver=C:\Program Files (x86)\psqlODBC\0903\bin\psqlodbc30a.dll
CommLog=0
Debug=0
Fetch=100
Optimizer=0
Ksqo=1
UniqueIndex=1
UseDeclareFetch=0
UnknownSizes=0
TextAsLongVarchar=1
UnknownsAsLongVarchar=0
BoolsAsChar=1
Parse=0
CancelAsFreeStmt=0
MaxVarcharSize=255
MaxLongVarcharSize=8190
ExtraSysTablePrefixes=dd_;
Description=my_dsn
Database=db_name
Port=9996
ReadOnly=0
ShowOidColumn=0
FakeOidIndex=0
RowVersioning=0
ShowSystemTables=0
Protocol=7.4
ConnSettings=
DisallowPremature=0
UpdatableCursors=1
LFConversion=1
TrueIsMinus1=0
BI=0
AB=0
ByteaAsLongVarBinary=0
UseServerSidePrepare=1
LowerCaseIdentifier=0
GssAuthUseGSS=0
SSLmode=disable
KeepaliveTime=-1
KeepaliveInterval=-1
PreferLibpq=-1
XaOpt=1

Error msg:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 3: unexpected end of data

can anyone help me ?

excelguy
  • 1,574
  • 6
  • 33
  • 67
  • 1
    Possible duplicate of ['utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte](https://stackoverflow.com/questions/48067514/utf-8-codec-cant-decode-byte-0xa0-in-position-4276-invalid-start-byte) – Bram Vanroy Aug 07 '19 at 20:41
  • i looked at this, not sure wehre to put in the encoding. I cant put it in `conn.execute` as execute wont take this. – excelguy Aug 07 '19 at 20:57
  • See also https://stackoverflow.com/questions/40712310/unicodedecodeerror-on-python-3-5-when-doing-sql-query – TastySlowCooker Aug 07 '19 at 21:00
  • Please [edit] your question to show the complete stack trace. – Gord Thompson Aug 12 '19 at 14:47
  • Can you try with `conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-16')` and rest on `utf-8` – Tarun Lalwani Aug 12 '19 at 14:56
  • @TarunLalwani Gettign the same error using : `conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-16') conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8') conn.setencoding(encoding='utf-8')` – excelguy Aug 12 '19 at 15:15
  • error message is slightly different though : `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 3: unexpected end of data` – excelguy Aug 12 '19 at 15:23
  • Do you have any PostgreSQL client tools on your machine, e.g., `psql` or `pgAdmin`, that you might use to generate CREATE DATABASE and CREATE TABLE scripts like [this](https://pastebin.com/VZrqK6fh) that could help you create a [mcve]? – Gord Thompson Aug 17 '19 at 23:14
  • Your code fragment is incomplete. Please make sure it's minimal but sufficient to reproduce the problem. For example, the import statements and the definition of `business_date` are missing. – hagello Aug 18 '19 at 21:29
  • Ever tried bound parameters? `conn.execute("select * from stackoverflow where p_date = ?", business_date)` – hagello Aug 18 '19 at 21:31
  • Why , what would this do? i'll give it a try. Going to add more code fragment shortly. – excelguy Aug 19 '19 at 13:09

2 Answers2

3

When using PostgreSQL's Unicode driver you need to call setencoding and setdecoding as explained here.

# Python 3.x
cnxn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
cnxn.setencoding(encoding='utf-8')

If you are using PostgreSQL's "ANSI" driver then you may still need to call those methods to ensure that the correct single-byte character set (a.k.a. "code page", e.g., windows-1252) is used for SQL_CHAR.

Gord Thompson
  • 116,920
  • 32
  • 215
  • 418
  • thank you, in this case I guess `conn` would replace `cncx`? – excelguy Aug 08 '19 at 14:44
  • Yes, it's the name of the `Connection` object created by `pyodbc.connect(...)` – Gord Thompson Aug 08 '19 at 15:24
  • Seems like i am getting the same error, check my updated code. – excelguy Aug 08 '19 at 15:26
  • @excelguy - What do you see if you `print(conn.getinfo(pyodbc.SQL_DRIVER_NAME) + ' version ' + conn.getinfo(pyodbc.SQL_DRIVER_VER))` ...? – Gord Thompson Aug 08 '19 at 16:44
  • `PSQLODBC.DLL version 09.03.0400` – excelguy Aug 08 '19 at 16:46
  • 09.03.0400 appears to be nearly five years old (2014-10-26). Can you update to the latest version (currently 11.01.0000)? – Gord Thompson Aug 08 '19 at 16:53
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/197691/discussion-between-excelguy-and-gord-thompson). – excelguy Aug 08 '19 at 17:09
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/197733/discussion-between-excelguy-and-gord-thompson). – excelguy Aug 09 '19 at 14:26
  • A bit of digging reveals that you are using the "PostgreSQL ANSI" driver, so perhaps you might try having your `setencoding` and `setdecoding` statements use `'windows-1252'` instead of `'utf-8'`. – Gord Thompson Aug 09 '19 at 22:02
  • Yes I am using that driver. I tried `'windows-1252'` but still error `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 3: unexpected end of data` – excelguy Aug 12 '19 at 14:05
  • Do you get an error if you do `crsr.columns('your_table_name').fetchall()` ? – Gord Thompson Aug 12 '19 at 15:55
  • Where in my code do i run this? When i run it in my console i get `NameError: name 'crsr' is not defined` . – excelguy Aug 12 '19 at 19:49
  • You run it instead of the SELECT that's failing, i.e., after you open the connection and do the `setencoding`/`setdecoding`. – Gord Thompson Aug 12 '19 at 19:59
  • I getn an error, i commented out my `sql` variable. and put `crsr.columns('stackoverflow').fetchall()` beneath my `crsr = conn.execute(sql)` and get error `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 1: invalid continuation byte` – excelguy Aug 12 '19 at 20:24
  • Instead of `crsr = conn.execute(sql)` you need to do `crsr = conn.cursor()` followed by `crsr.columns('stackoverflow').fetchall()`. – Gord Thompson Aug 12 '19 at 20:27
  • Okay, now try running [this code](https://pastebin.com/1NUMT6Vm) to test retrieving each column individually. – Gord Thompson Aug 12 '19 at 20:32
  • dang they all showed "FAIL" . btw I kept your previous suggetions in the code aswell. – excelguy Aug 12 '19 at 20:59
  • Do they show different types, e.g., '(int4)', '(varchar)', etc.? – Gord Thompson Aug 12 '19 at 21:01
  • Yes, we have a few `int8` most are `varchar`, got a couple `float8` and `nurmeric`. – excelguy Aug 12 '19 at 21:03
  • Hmm, okay. What does `pyodbc.version` return? – Gord Thompson Aug 12 '19 at 21:04
  • It returns, '4.0.26' – excelguy Aug 13 '19 at 13:11
  • 4.0.27 is the latest version but there weren't any changes from 4.0.26 that would likely affect this issue. Can you use [dump_dsn](https://github.com/gordthompson/dump_dsn) to list your DSN settings, obfuscate *just* the `Servername`, `Username`, `UID`, and `Password` values and then add the result to your question? – Gord Thompson Aug 13 '19 at 13:35
  • do you mean include everything but my Servername, Username, UID, and Password ? Posting now. – excelguy Aug 13 '19 at 13:53
  • Yes. We know that you can establish a connection so your servername and credentials are probably not important. – Gord Thompson Aug 13 '19 at 14:03
  • All of your non-connection DSN settings are exactly the same as mine, and my connection works fine. If you are able to run "\Windows\SysWOW64\odbcad32.exe" can you create a User DSN for the "PostgreSQL Unicode" driver that uses all of the default settings (just specify `Database`, `Server`, `Port`, `User Name`, and `Password`) and then try running the [test code](https://pastebin.com/1NUMT6Vm) again with that DSN? – Gord Thompson Aug 13 '19 at 14:48
  • Okay i created a new user dsn(32 bit), using unicode instead of ansi this time. Changes my `pyodbc.connect` to the new dsn name, then ran your test. all columns show fail again. – excelguy Aug 13 '19 at 15:03
  • In addition to your existing `setdecoding` statements for `SQL_CHAR` and `SQL_WCHAR`, try adding `conn.setdecoding(pyodbc.SQL_WMETADATA, encoding='windows-1252')` and see if that helps at all. – Gord Thompson Aug 13 '19 at 17:31
  • ah this error is what im still getting, ` crsr = conn.execute(sql) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 1: invalid continuation byte` I've also chanegd the `SQL_CHAR` to `WCHAR` on the first statement and made it `ufc-12` – excelguy Aug 13 '19 at 18:21
  • What's perplexing me is that AFAIK we've been trying to convince pyodbc to decode *everything* as `windows-1252` and yet you're still getting a UnicodeDecodeError stating that it can't decode the bytes as `utf-8`. (BTW, I have no idea what you mean by `ufc-12`.) Anyway, I'm pretty much out of ideas unless you can provide a sample database that can be used to reproduce the issue. Sorry. – Gord Thompson Aug 13 '19 at 18:50
  • i meant `'utf-16'` sorry. Thanks for all your help. – excelguy Aug 13 '19 at 19:48
  • If you suspect an issue with pyodbc itself you could create a [GitHub issue](https://github.com/mkleehammer/pyodbc/issues/) including a [mcve] and attach an [ODBC trace log](https://github.com/mkleehammer/pyodbc/wiki/Troubleshooting-%E2%80%93-Generating-an-ODBC-trace-log) to show the pyodbc developers how far the ODBC "conversation" proceeds before failing. – Gord Thompson Aug 13 '19 at 21:53
0

What worked for me was using this line for connectiong via odbc instead, and I took out the encoding and decoding.

con = pyodbc.connect(r'DSN='+'STACK_PROD',autocommit=True)

excelguy
  • 1,574
  • 6
  • 33
  • 67