Python 3 character encoding issue

Question

I am selecting values from a MySQL/MariaDB database that uses the latin1 charset with the latin1_swedish_ci collation. The data may contain characters from different European languages, such as Spanish ñ, German ä, or Norwegian ø.

I get the data with

#!/usr/bin/env python3
# coding: utf-8

...
sql.execute("SELECT name FROM myTab")
for row in sql:
    print(row[0])

This fails with the error message: UnicodeEncodeError: 'ascii' codec can't encode character '\xf1'. So I changed my print to

print(str(row[0].encode('utf8')))

and the result looks like this: b'\xc3\xb1'. I looked at "Working with utf-8 encoding in Python source", but I have already declared the encoding header. Also, decode('utf8').encode('cp1250') does not help.

Thanks for the support. This returns `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 0` — Joe Platano, Jun 19 '17 at 23:23
Possible duplicate of [How to set sys.stdout encoding in Python 3?](https://stackoverflow.com/questions/4374455/how-to-set-sys-stdout-encoding-in-python-3) — Joe Platano, Jun 26 '17 at 21:23

score 3 · Accepted Answer · answered Jun 26 '17 at 21:22

Okay, the encoding issue has finally been solved. Coldspeed gave an important hint about the locale, so all kudos to him! Unfortunately, it was not that easy.

I found a workaround that fixes the problem.

import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)

The solution is from Jack O'Connor, posted in this answer:
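For readers who want to see the same idea spelled out, here is a minimal sketch, not a definitive fix. It wraps a binary stream in a text layer that always encodes as UTF-8, which is what reopening stdout achieves. The helper names `utf8_writer` and `force_utf8_stdout` are hypothetical and only used for illustration. The sketch assumes that `sys.stdout` is a regular text stream with a `.buffer` attribute, and it uses `sys.stdout.reconfigure`, which only exists in Python 3.7 and later.

```python
import io
import sys


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


def force_utf8_stdout():
    """Make print() emit UTF-8 regardless of the locale (hypothetical helper)."""
    if hasattr(sys.stdout, "reconfigure"):
        # Python 3.7+: change the encoding of the existing stream in place.
        sys.stdout.reconfigure(encoding="utf-8")
    else:
        # Older Python 3: replace stdout with a UTF-8 wrapper around its buffer.
        sys.stdout = utf8_writer(sys.stdout.buffer)
```

Calling `force_utf8_stdout()` once at the top of the script, before the first `print`, should have the same effect as the one-liner above. If the environment of the process can be changed instead, setting the `PYTHONIOENCODING=utf-8` environment variable is another option that needs no code change.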

Joe Platano

+1 as this has allowed me to move forward. However, shouldn't this be written in flashing lights at the top of somewhere like https://docs.python.org/3/howto/unicode.html? My issue relates to using a jinja2 template. Where the template doesn't contain any unicode everything is OK, however, once there is a single unicode character somewhere in the template it breaks. My system locale is 'en_US.UTF-8' and no amount of encode/decode solved the problem. But the above just feels like such a fundamental thing that it cannot be the "correct way"? – Richard Corden Nov 15 '18 at 09:08
A thousand times this! How is this not the default in 2018 :/ – domenukk Mar 01 '19 at 18:24

score 1 · Answer 2 · answered Jun 19 '17 at 23:37

Python 3 tries to automatically encode and decode strings based on your locale settings. If your locale doesn't match the encoding of the string, you get garbled text, or it doesn't work at all. You can try forcibly encoding it as latin-1 and then decoding it as cp1252 (this seems to be the encoding of the string).

print(row[0].encode('latin-1').decode('cp1252'))
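To see what such a chained call actually does, here is a small sketch. The helper name `reinterpret` is hypothetical and not part of any library. It shows that the latin-1 to cp1252 round trip leaves ñ unchanged, because byte 0xF1 means ñ in both encodings, so on its own it cannot cure an error raised by an ASCII output stream.

```python
def reinterpret(text, assumed="latin-1", actual="cp1252"):
    """Encode text with one codec and decode the resulting bytes with another."""
    return text.encode(assumed).decode(actual)


# 0xF1 is ñ in both latin-1 and cp1252, so nothing changes.
unchanged = reinterpret("ñ")

# 0x80 is a control character in latin-1 but the euro sign in cp1252.
euro = reinterpret("\x80")

# UTF-8 bytes that were wrongly read as latin-1 ("Ã±") are repaired to "ñ".
repaired = reinterpret("Ã±", actual="utf-8")
```

Applying the `utf-8` variant to a string that is already correct, such as `"ñ"`, raises a `UnicodeDecodeError` for byte 0xf1, because a lone 0xF1 byte is not valid UTF-8.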

cs95

It seems the point about locale leads toward the goal. Unfortunately, your approach still does not give the correct result, but with locale I am getting closer. – Joe Platano Jun 21 '17 at 21:54
@JoePlatano what about `row[0].encode('latin-1').decode('utf-8')`? – cs95 Jun 21 '17 at 22:00
No, that does not work. Well, it does in the shell: if I run the script as `python script.py`, it works. On the web server it does not. I added the lines `print(sys.stdout.encoding)` and `print(sys.getdefaultencoding())`. In the shell both print utf-8. If I execute the script via the browser, sys.stdout.encoding is ANSI_X3.4-1968 and sys.getdefaultencoding() is utf-8. I think there is some locale issue with Apache. – Joe Platano Jun 21 '17 at 22:09
@JoePlatano Oh, I see... afraid I'm at a loss here. Hope you figure it out! You should try different encodings and see which works. – cs95 Jun 21 '17 at 22:10
Yeah, thanks anyway for pushing me in a good direction! Hence the upvote. Thanks, buddy – Joe Platano Jun 21 '17 at 22:12
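The diagnosis from the comments can be repeated with a short script like the one below. It is only a sketch, and `describe_encodings` is a hypothetical helper name. It collects the values that differed between the shell and the web server, and it also shows that ANSI_X3.4-1968 is simply another name for ASCII, which explains the `'ascii' codec can't encode` error.

```python
import codecs
import locale
import os
import sys


def describe_encodings():
    """Collect the settings that decide how print() encodes text."""
    return {
        # Encoding used by print(); derived from the locale unless overridden.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Default for str.encode() and bytes.decode(); always utf-8 in Python 3.
        "default": sys.getdefaultencoding(),
        # What the locale settings of the process resolve to.
        "preferred": locale.getpreferredencoding(False),
        # Environment variables that influence the values above.
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


# ANSI_X3.4-1968 is an alias that Python resolves to its ascii codec.
ansi_alias = codecs.lookup("ANSI_X3.4-1968").name

if __name__ == "__main__":
    for key, value in describe_encodings().items():
        print(key, "=", value)
```

Running it once from the shell and once through the web server should make the difference visible: a process started without a UTF-8 locale typically reports an ASCII stdout encoding, while `sys.getdefaultencoding()` stays utf-8 in both cases.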
