Setting the default output encoding in Python 2 is a well-known idiom:

sys.stdout = codecs.getwriter("utf-8")(sys.stdout)

This wraps the sys.stdout object in a codec writer that encodes output in UTF-8.

However, this technique does not work in Python 3: sys.stdout.write() expects a str, but the result of encoding is bytes, so an error occurs when codecs tries to write the encoded bytes to the original sys.stdout.
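The failure is easy to reproduce without touching the real sys.stdout; here an in-memory io.StringIO stands in for Python 3's text-mode stdout:

```python
import codecs
import io

# An in-memory text stream standing in for Python 3's text-mode sys.stdout.
text_stream = io.StringIO()
wrapped = codecs.getwriter("utf-8")(text_stream)

try:
    wrapped.write("abc")  # the codec writer encodes the str to bytes...
except TypeError:
    # ...and the underlying text stream rejects the bytes.
    print("text stream refuses bytes")
```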

What is the correct way to do this in Python 3?

Greg Hewgill
  • 2to3 is a useful tool for questions like these. – dan_waterworth Dec 07 '10 at 08:25
  • @dan_waterworth: I didn't think of trying that before, but I just tried `2to3` now and it didn't suggest any changes for the given code. – Greg Hewgill Dec 07 '10 at 08:42
  • If the new code doesn't work then I'd suggest you add this as a bug. – dan_waterworth Dec 07 '10 at 10:10
  • Wow, this causes a lot of fun in an interactive shell - try `sys.stdout = codecs.getwriter("hex")(sys.stdout)` in `ipython` to see what I mean... – Tobias Kienzler Dec 09 '13 at 13:23
  • `PowerShell` redirection seems to re-encode everything to `UTF-16`, so if you're using redirection, you might need to use regular `cmd` instead. I verified `type foo.txt > foo2.txt` changes a `UTF-8` `foo.txt` to a `UTF-16` `foo2.txt`, so what Python writes to `stdout` isn't the last word. None of the solutions below worked for me because of this. – Terry Brown Nov 04 '19 at 17:32

7 Answers


Since Python 3.7 you can change the encoding of standard streams with reconfigure():

sys.stdout.reconfigure(encoding='utf-8')

You can also modify how encoding errors are handled by adding an errors parameter.
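As a quick check, reconfigure() is a method of io.TextIOWrapper, so its effect can be demonstrated on an in-memory stream just as well as on sys.stdout (the stream below is illustrative, not the real stdout):

```python
import io

# An in-memory text stream standing in for sys.stdout.
stream = io.TextIOWrapper(io.BytesIO(), encoding='ascii', errors='strict')

# Switch the encoding (and the error handler) before any data is written.
stream.reconfigure(encoding='utf-8', errors='replace')

stream.write("日本語\n")
stream.flush()
```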

sth
  • What about if you are trying to maintain compatibility with Python 3.6? – Dan Mar 19 '20 at 13:21
  • @Dan Then you can't use this – sth Mar 19 '20 at 23:00
  • I had inferred as much. Is there not then in your knowledge an alternative solution? – Dan Mar 22 '20 at 21:07
  • @Dan Well on this very page there are a lot of other answers with alternative solutions. My answer is not the only answer on this question, there are other answers with other approaches from times before Python 3.7. Isn't that what you are looking for? – sth Mar 23 '20 at 13:19
  • I appreciate your responses. I did see several other answers on the page, but I considered your answer the most straightforward and elegant, and so I was wondering if you had a similarly straightforward/elegant alternative for python < 3.6. Thanks for your time. – Dan Mar 25 '20 at 18:20
  • I'm running Anaconda Python 3.8, and the statement "sys.stdout.reconfigure(encoding='utf-8')" generates an exception: "AttributeError: 'OutStream' object has no attribute 'reconfigure'" What am I missing? – Marc B. Hankin Sep 03 '21 at 18:47
  • `sys.stdout.buffer` is the untranslated stream. `sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)` will work. – Mark Tolonen Apr 12 '23 at 17:56
  • @MarcB.Hankin Some IDEs replace stdout with their own classes. For instance, IDLE replaces it with a `idlelib.run.StdOutputFile` object. You can protect the call with a `isinstance(std.stdout, io.TextIOWrapper)` test, or use `try ... except AttributeError:` around the statement, and rely on a IDE replaced stdout already supporting utf-8. – AJNeufeld Aug 01 '23 at 17:33

Python 3.1 added io.TextIOBase.detach(), with a note in the documentation for sys.stdout:

The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc'). Using io.TextIOBase.detach() streams can be made binary by default. This function sets stdin and stdout to binary:

def make_streams_binary():
    sys.stdin = sys.stdin.detach()
    sys.stdout = sys.stdout.detach()

Therefore, the corresponding idiom for Python 3.1 and later is:

sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
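The mechanics of this idiom can be sketched on an in-memory stream standing in for sys.stdout: detach() hands back the underlying binary layer, which the codec writer then encodes into.

```python
import codecs
import io

# A text stream over an in-memory binary buffer, standing in for sys.stdout.
text_stream = io.TextIOWrapper(io.BytesIO(), encoding='ascii')

# detach() separates and returns the underlying binary stream.
binary = text_stream.detach()

# Wrap the binary stream in a UTF-8 codec writer, as in the idiom above.
writer = codecs.getwriter("utf-8")(binary)
writer.write("日本語")
```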
Greg Hewgill
  • I'd use `PYTHONIOENCODING`; otherwise `io.TextIOWrapper` might be better alternative than `codecs` to handle newlines properly. – jfs Dec 21 '13 at 03:46
  • This totally changes the behavior of `sys.stdout`. The `StreamWriter` returned by `codecs.getwriter` is not line-buffered anymore, e.g.. – Sebastian Jun 08 '17 at 15:22

I found this thread while searching for solutions to the same error.

An alternative to the solutions already suggested is to set the PYTHONIOENCODING environment variable before Python starts. For my use case this is less trouble than swapping sys.stdout after Python is initialized:

PYTHONIOENCODING=utf-8:surrogateescape python3 somescript.py

This has the advantage of not requiring any edits to the Python code.
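For example (assuming a python3 on PATH; the inline -c command stands in for somescript.py, and LC_ALL=C simulates an ASCII-only locale):

```shell
# Force UTF-8 on the standard streams regardless of the locale.
LC_ALL=C PYTHONIOENCODING=utf-8:surrogateescape python3 -c 'import sys; print(sys.stdout.encoding)'
```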

ideasman42

Other answers seem to recommend using codecs, but open works for me:

import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)
print("日本語")
# Also works with other methods of writing to stdout:
sys.stdout.write("日本語\n")
sys.stdout.buffer.write("日本語\n".encode())

This works even when I run it with PYTHONIOENCODING="ascii".

Jack O'Connor
  • This worked for me for dealing with an error caused by importing a module that I could not change. On a pretty vanilla Linux system that defaulted to LC_ALL = C, my program generated `'ascii' codec can't encode character .... ordinal not in range(128)` when code from the imported module tried to print something. I could not rely on users of my program changing LC_ALL to 'en_US.UTF-8'. This hack solved it. I know it's an ugly approach, but I could not find another solution. – mhucka Jun 10 '18 at 21:33

Setting the default output encoding in Python 2 is a well-known idiom

Eek! Is that a well-known idiom in Python 2? It looks like a dangerous mistake to me.

It'll certainly mess up any script that tries to write binary to stdout (which you'll need if you're a CGI script returning an image, for example). Bytes and chars are quite different animals; it's not a good idea to monkey-patch an interface that is specified to accept bytes with one that only takes chars.

CGI and HTTP in general explicitly work with bytes. You should only be sending bytes to sys.stdout. In Python 3 that means using sys.stdout.buffer.write to send bytes directly. Encoding page content to match its charset parameter should be handled at a higher level in your application (in cases where you are returning textual content, rather than binary). This also means print is no good for CGI any more.
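A minimal sketch of this, with the output stream passed in explicitly so it can be tested against an in-memory buffer (in a real CGI script you would pass sys.stdout.buffer; the function name and placeholder bytes are illustrative):

```python
import io

def send_png(out, body: bytes) -> None:
    """Write a minimal CGI response with a binary body to a binary stream."""
    out.write(b"Content-Type: image/png\r\n\r\n")
    out.write(body)
    out.flush()

# An in-memory stream standing in for sys.stdout.buffer, so the result
# can be inspected; the bytes stand in for real image data.
buf = io.BytesIO()
send_png(buf, b'\x89PNG\r\n\x1a\n')
```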

(To add to the confusion, wsgiref's CGIHandler has been broken in py3k until very recently, making it impossible to deploy WSGI to CGI that way. With PEP 3333 and Python 3.2 this is finally workable.)

bobince

Using detach() causes the interpreter to print a warning when it tries to close stdout just before it exits:

Exception ignored in: <_io.TextIOWrapper mode='w' encoding='UTF-8'>
ValueError: underlying buffer has been detached

Instead, this worked fine for me:

default_out = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

(And, of course, writing to default_out instead of stdout.)
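The same approach can be sketched on an in-memory buffer: wrapping the binary layer in a new TextIOWrapper, rather than detaching it, leaves the original text stream intact at interpreter exit (the BytesIO below stands in for sys.stdout.buffer):

```python
import io

# An in-memory binary stream standing in for sys.stdout.buffer.
binary = io.BytesIO()
default_out = io.TextIOWrapper(binary, encoding='utf-8')

default_out.write("日本語\n")
default_out.flush()
```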

ptomato

sys.stdout is in text mode in Python 3, so you can write Unicode to it directly; the Python 2 idiom is no longer needed.

Where this would fail in Python 2:

>>> import sys
>>> sys.stdout.write(u"ûnicöde")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfb' in position 0: ordinal not in range(128)

However, it works just dandy in Python 3:

>>> import sys
>>> sys.stdout.write("Ûnicöde")
Ûnicöde7

Now if your Python doesn't know what your stdout's encoding actually is, that's a different problem, most likely one in how that Python was built.

Neuron
Lennart Regebro
  • My context was running the Python script as a CGI under Apache, where the default output encoding wasn't what I needed (I needed UTF-8). I think it's better for the script to ensure that its output is in the correct encoding, rather than relying on external settings (such as environment variables like PYTHONIOENCODING). – Greg Hewgill Dec 07 '10 at 10:03
  • Yet another proof that using stdout for process communication is big mistake. I realize you may have no choice than to use CGI in this case though so that's not your fault. :-) – Lennart Regebro Dec 07 '10 at 11:45
  • While it is true that `sys.stdout` is a *binary* file in Python 2 and a *text* file in Python 3, I think your Python 2 example fails because the unicode string `u"ûnicöde"` that gets implicitly encoded in the `sys.stdout.write` method has characters outside the ASCII range. If you change your `LC_CTYPE`, `LANG` or `PYTHONIOENCODING` environment variables to an encoding that has all the characters in the unicode string you should not get any error. (I have tried on Python 2.7.) – Géry Ogam Feb 22 '18 at 08:43