12

I have switched from Python 2.7 to Python 3.6.

I have scripts that deal with some non-English content.

I usually run the scripts via cron and also in a terminal.

I was getting UnicodeDecodeError in my Python 2.7 scripts, and I solved it with this:

# encoding=utf8  
import sys  

reload(sys)  
sys.setdefaultencoding('utf8')

Now in Python 3.6, it doesn't work. I have print statements like print("Here %s" % (myvar)) and they throw an error. I can work around this by replacing myvar with myvar.encode("utf-8"), but I don't want to do that in every print statement.

I set PYTHONIOENCODING=utf-8 in my terminal and I still have the issue.

Is there a cleaner way to solve the UnicodeDecodeError issue in Python 3.6?

Is there any way to tell Python 3 to print everything in UTF-8, just like I did in Python 2?

Alastair McCormack
Umair Ayub

7 Answers

26

It sounds like your locale is broken and you have another bytes->Unicode issue. The thing you did for Python 2.7 is a hack that only masked the real problem (there's a reason why you have to reload sys to make it work).

To check your locale, run locale from the command line. The output should look something like:

LANG=en_GB.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=

locale depends on LANG being set properly. Python effectively uses the locale to work out what encoding to use when writing to stdout. If it can't work it out, it defaults to ASCII.
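To see what Python has actually picked up from the locale, a short diagnostic along these lines may help. It is only a sketch: the helper name describe_encodings is made up for illustration, and the values it reports depend entirely on the environment the script runs in (a cron job and an interactive terminal can differ).

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the settings that influence how print() encodes text."""
    return {
        # Encoding Python chose for stdout (None if stdout was replaced
        # by an object that has no encoding attribute).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the locale (LANG / LC_ALL / LC_CTYPE).
        "preferred": locale.getpreferredencoding(False),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print("%s = %r" % (name, value))
```

Running the same script once from the terminal and once from cron should show whether the two environments disagree.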

You should first attempt to fix your locale. If locale reports errors, make sure you've installed the correct language pack for your region.

If all else fails, you can always fix Python by setting PYTHONIOENCODING=UTF-8. This should be used as a last resort as you'll be masking problems once again.
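As a rough way to confirm that the variable is being honoured, the sketch below starts a child interpreter with PYTHONIOENCODING set and asks it which encoding its stdout uses. The function name child_stdout_encoding is hypothetical, and the exact spelling of the reported name (for example UTF-8 versus utf-8) may vary between Python versions.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child interpreter with PYTHONIOENCODING set and
    return the encoding it reports for its own stdout."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print(child_stdout_encoding("UTF-8"))
```

The variable has to be present in the environment of the process that starts Python, so a value exported in an interactive shell will not automatically apply to scripts started by cron.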

If Python is still throwing an error after setting PYTHONIOENCODING then please update your question with the stack trace. Chances are you've got an implied conversion going on.
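To illustrate what such a conversion looks like: in Python 3, UnicodeDecodeError is raised when bytes are turned into str with the wrong codec, whereas a str that stdout cannot encode raises UnicodeEncodeError. The sketch below uses a made-up helper, decode_payload, and a sample byte string to show the failing and the explicit variant.

```python
def decode_payload(raw, encoding="utf-8"):
    """Turn bytes into str explicitly instead of relying on a default codec."""
    return raw.decode(encoding)


# Sample bytes, standing in for data read from a file, socket or subprocess.
raw = "caf\u00e9".encode("utf-8")

try:
    decode_payload(raw, "ascii")  # wrong codec for this data
except UnicodeDecodeError as exc:
    print("ascii decode failed at byte offset", exc.start)

text = decode_payload(raw)  # explicit, correct codec
print(len(raw), "bytes ->", len(text), "characters")
```

The stack trace normally points at the line where the decoding happens, which is often not the print statement itself.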

Alastair McCormack
  • 5
    For future readers, please see our conversation in chat how @Alastair McCormack helped me solve my problem. https://chat.stackoverflow.com/rooms/173761/discussion-between-alastair-mccormack-and-umair – Umair Ayub Jun 25 '18 at 16:46
  • Thanks for the answer and the discussion in the chat, it really helped me. I was facing a similar error when doing print('\u25cf') in Python 3. Setting the locale to en_US.utf8 helped. – code_dragon Apr 03 '20 at 03:51
  • Thanks! When I type `locale` it turned out that all values are "POSIX"... – Hardwired Jun 09 '22 at 08:40
6

I had this issue when using Python inside a Docker container based on Ubuntu 18.04. It appeared to be a locale issue, which was solved by adding the following to the Dockerfile:

ENV LANG C.UTF-8
Daniel
  • 2
    `export LANG=en_GB.UTF-8` before running python in the console fixes this temporarily – zhy Sep 13 '21 at 15:42
4

To everyone using pickle to load a file previously saved in Python 2 and getting a UnicodeDecodeError: try setting the encoding parameter of pickle.load:

import pickle

with open("./data.pkl", "rb") as data_file:
    samples = pickle.load(data_file, encoding='latin1')
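The effect of the encoding parameter can be seen without a Python 2 installation. The sketch below uses a hand-written protocol 2 payload, PY2_PICKLE, that is meant to mimic what Python 2 writes for a byte string containing Latin-1 text; treat the payload as an assumption for illustration rather than a dump taken from a real file.

```python
import pickle

# Hand-built protocol 2 pickle of a Python 2 byte string 'caf\xe9'
# (SHORT_BINSTRING opcode), used here as a stand-in for an old data file.
PY2_PICKLE = b"\x80\x02U\x04caf\xe9q\x00."

try:
    pickle.loads(PY2_PICKLE)  # the default encoding is ASCII
except UnicodeDecodeError as exc:
    print("default load failed:", exc.reason)

# Decode Python 2 byte strings as Latin-1 text ...
as_text = pickle.loads(PY2_PICKLE, encoding="latin1")
# ... or keep them as raw bytes and decode them yourself later.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")
print(ascii(as_text), as_bytes)
```

Latin-1 never fails to decode because every byte maps to a character, but it only yields the right text if the data really was Latin-1; encoding="bytes" may be the safer choice when the original encoding is unknown.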
Mark Storm
1

For a Python-only solution you will have to recreate your sys.stdout object:

import sys, codecs
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.detach())

After this, a normal print("hello world") should be encoded to UTF-8 automatically.

But you should try to find out why your terminal is set to such a strange encoding (which Python just tries to adapt to). Maybe your operating system is configured wrong somehow.
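The same idea of recreating stdout can also be expressed with io.TextIOWrapper, as mentioned in the comments on this answer. The sketch below wraps an in-memory buffer instead of the real stdout so that the encoded bytes can be inspected; the helper name utf8_writer is made up for this example.

```python
import io


def utf8_writer(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


# Demonstration on an in-memory buffer rather than the real stdout.
buffer = io.BytesIO()
out = utf8_writer(buffer)
print("caf\u00e9", file=out)
out.flush()
print(buffer.getvalue())
```

For the real stream this would presumably become sys.stdout = utf8_writer(sys.stdout.buffer). From Python 3.7 onwards, sys.stdout.reconfigure(encoding="utf-8") does the same job without replacing the object, but that method is not available in 3.6.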

EDIT: In my tests unsetting the env variable LANG produced this strange setting for the stdout encoding for me:

LANG= python3
import sys
sys.stdout.encoding

printed 'ANSI_X3.4-1968'.

So I guess you might want to set your LANG to something like en_US.UTF-8. Your terminal program doesn't seem to do this.

Alfe
  • Is this the same or significantly different from `sys.stdout = io.TextIOWrapper(sys.stdout.buffer,encoding='utf8')`? (Which I am sure I have learned from this very website.) – Jongware Jun 25 '18 at 15:18
  • 1
    @usr2564301 I'm not sure about the difference (if any). The idea is clearly the same. A much nicer way would be the simple `sys.stdout.encoding = 'utf-8'` but that doesn't work, unfortunately, because the `encoding` field is readonly. But this is always the father of the thought and the two solutions are just different workaround implementations of it. – Alfe Jun 25 '18 at 15:22
-1

Python 3 (including 3.6) already supports Unicode natively. Here is the doc - https://docs.python.org/3/howto/unicode.html

So you don't need to force Unicode support as in Python 2.7. Try to run your code normally. If you get an error when reading a Unicode text file, you need to pass the encoding='utf-8' parameter when opening the file.
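A minimal sketch of reading a file with an explicit encoding follows. The file name sample.txt and the temporary directory are hypothetical and exist only to make the example self-contained.

```python
import os
import tempfile

# Hypothetical sample file, created here only so the example can run.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

with open(path, "w", encoding="utf-8") as handle:
    handle.write("caf\u00e9\n")

# Without encoding=..., open() falls back to the locale's preferred
# encoding, which is where a broken locale can cause UnicodeDecodeError.
with open(path, encoding="utf-8") as handle:
    text = handle.read()

print(len(text), "characters read")
```

Passing the encoding explicitly makes the script behave the same under cron and in a terminal, regardless of how the locale is configured.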

dedsec
-1

For Docker with Python 3.6, running LANG=C.UTF-8 python or LANG=C.UTF-8 jupyter xxx works for me, thanks to @Daniel and @zhy.

zhibo
-3

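I mean you could write a custom function like this (not optimal, I know):

import sys

def printUTF8(text):
    # print(text.encode("utf-8")) would show the bytes repr (b'...') in Python 3,
    # so write the encoded bytes straight to the underlying binary stream.
    sys.stdout.flush()
    sys.stdout.buffer.write(text.encode("utf-8") + b"\n")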
Jakob Sachs