Setting the default output encoding in Python 2 is a well-known idiom:

sys.stdout = codecs.getwriter("utf-8")(sys.stdout)

This wraps the sys.stdout object in a codec writer that encodes output in UTF-8.

However, this technique does not work in Python 3: sys.stdout.write() expects a str, but the result of encoding is bytes, so an error occurs when codecs tries to write the encoded bytes to the original sys.stdout.
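The failure is easy to reproduce without touching the real sys.stdout; here an in-memory io.StringIO stands in for Python 3's text-mode stdout:

```python
import codecs
import io

# An in-memory text stream standing in for Python 3's text-mode sys.stdout.
text_stream = io.StringIO()
wrapped = codecs.getwriter("utf-8")(text_stream)

try:
    wrapped.write("abc")  # the codec writer encodes the str to bytes...
except TypeError:
    # ...and the underlying text stream rejects the bytes.
    print("text stream refuses bytes")
```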

What is the correct way to do this in Python 3?

Greg Hewgill
  • 2to3 is a useful tool for questions like these. – dan_waterworth Dec 07 '10 at 08:25
  • @dan_waterworth: I didn't think of trying that before, but I just tried `2to3` now and it didn't suggest any changes for the given code. – Greg Hewgill Dec 07 '10 at 08:42
  • If the new code doesn't work then I'd suggest you add this as a bug. – dan_waterworth Dec 07 '10 at 10:10
  • Wow, this causes a lot of fun in an interactive shell - try `sys.stdout = codecs.getwriter("hex")(sys.stdout)` in `ipython` to see what I mean... – Tobias Kienzler Dec 09 '13 at 13:23
  • `PowerShell` redirection seems to re-encode everything to `UTF-16`, so if you're using redirection, you might need to use regular `cmd` instead. I verified `type foo.txt > foo2.txt` changes a `UTF-8` `foo.txt` to a `UTF-16` `foo2.txt`, so what Python writes to `stdout` isn't the last word. None of the solutions below worked for me because of this. – Terry Brown Nov 04 '19 at 17:32

7 Answers


Since Python 3.7 you can change the encoding of standard streams with reconfigure():

sys.stdout.reconfigure(encoding='utf-8')

You can also modify how encoding errors are handled by adding an errors parameter.
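As a quick check, reconfigure() is a method of io.TextIOWrapper, so its effect can be demonstrated on an in-memory stream just as well as on sys.stdout (the stream below is illustrative, not the real stdout):

```python
import io

# An in-memory text stream standing in for sys.stdout.
stream = io.TextIOWrapper(io.BytesIO(), encoding='ascii', errors='strict')

# Switch the encoding (and the error handler) before any data is written.
stream.reconfigure(encoding='utf-8', errors='replace')

stream.write("日本語\n")
stream.flush()
```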

sth
  • What about if you are trying to maintain compatibility with Python 3.6? – Dan Mar 19 '20 at 13:21
  • @Dan Then you can't use this – sth Mar 19 '20 at 23:00
  • I had inferred as much. Is there not then in your knowledge an alternative solution? – Dan Mar 22 '20 at 21:07
  • @Dan Well on this very page there are a lot of other answers with alternative solutions. My answer is not the only answer on this question, there are other answers with other approaches from times before Python 3.7. Isn't that what you are looking for? – sth Mar 23 '20 at 13:19
  • I appreciate your responses. I did see several other answers on the page, but I considered your answer the most straightforward and elegant, and so I was wondering if you had a similarly straightforward/elegant alternative for python < 3.6. Thanks for your time. – Dan Mar 25 '20 at 18:20
  • I'm running Anaconda Python 3.8, and the statement "sys.stdout.reconfigure(encoding='utf-8')" generates an exception: "AttributeError: 'OutStream' object has no attribute 'reconfigure'" What am I missing? – Marc B. Hankin Sep 03 '21 at 18:47
  • `sys.stdout.buffer` is the untranslated stream. `sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)` will work. – Mark Tolonen Apr 12 '23 at 17:56
  • @MarcB.Hankin Some IDEs replace stdout with their own classes. For instance, IDLE replaces it with a `idlelib.run.StdOutputFile` object. You can protect the call with a `isinstance(std.stdout, io.TextIOWrapper)` test, or use `try ... except AttributeError:` around the statement, and rely on a IDE replaced stdout already supporting utf-8. – AJNeufeld Aug 01 '23 at 17:33

Python 3.1 added io.TextIOBase.detach(), with a note in the documentation for sys.stdout:

The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc'). Using io.TextIOBase.detach() streams can be made binary by default. This function sets stdin and stdout to binary:

def make_streams_binary():
    sys.stdin = sys.stdin.detach()
    sys.stdout = sys.stdout.detach()

Therefore, the corresponding idiom for Python 3.1 and later is:

sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
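The mechanics of this idiom can be sketched on an in-memory stream standing in for sys.stdout: detach() hands back the underlying binary layer, which the codec writer then encodes into.

```python
import codecs
import io

# A text stream over an in-memory binary buffer, standing in for sys.stdout.
text_stream = io.TextIOWrapper(io.BytesIO(), encoding='ascii')

# detach() separates and returns the underlying binary stream.
binary = text_stream.detach()

# Wrap the binary stream in a UTF-8 codec writer, as in the idiom above.
writer = codecs.getwriter("utf-8")(binary)
writer.write("日本語")
```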
Greg Hewgill
  • I'd use `PYTHONIOENCODING`; otherwise `io.TextIOWrapper` might be better alternative than `codecs` to handle newlines properly. – jfs Dec 21 '13 at 03:46
  • This totally changes the behavior of `sys.stdout`. The `StreamWriter` returned by `codecs.getwriter` is not line-buffered anymore, e.g.. – Sebastian Jun 08 '17 at 15:22

I found this thread while searching for solutions to the same error.

An alternative to the solutions already suggested is to set the PYTHONIOENCODING environment variable before Python starts. For my use case this is less trouble than swapping sys.stdout after Python is initialized:

PYTHONIOENCODING=utf-8:surrogateescape python3 somescript.py

This has the advantage of not requiring any edits to the Python code.
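For example (assuming a python3 on PATH; the inline -c command stands in for somescript.py, and LC_ALL=C simulates an ASCII-only locale):

```shell
# Force UTF-8 on the standard streams regardless of the locale.
LC_ALL=C PYTHONIOENCODING=utf-8:surrogateescape python3 -c 'import sys; print(sys.stdout.encoding)'
```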

ideasman42

Other answers seem to recommend using codecs, but open works for me:

import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)
print("日本語")
# Also works with other methods of writing to stdout:
sys.stdout.write("日本語\n")
sys.stdout.buffer.write("日本語\n".encode())

This works even when I run it with PYTHONIOENCODING="ascii".

Jack O'Connor
  • This worked for me for dealing with an error caused by importing a module that I could not change. On a pretty vanilla Linux system that defaulted to LC_ALL = C, my program generated `'ascii' codec can't encode character .... ordinal not in range(128)` when code from the imported module tried to print something. I could not rely on users of my program changing LC_ALL to 'en_US.UTF-8'. This hack solved it. I know it's an ugly approach, but I could not find another solution. – mhucka Jun 10 '18 at 21:33

Setting the default output encoding in Python 2 is a well-known idiom

Eek! Is that a well-known idiom in Python 2? It looks like a dangerous mistake to me.

It'll certainly mess up any script that tries to write binary to stdout (which you'll need if you're a CGI script returning an image, for example). Bytes and chars are quite different animals; it's not a good idea to monkey-patch an interface that is specified to accept bytes with one that only takes chars.

CGI and HTTP in general explicitly work with bytes. You should only be sending bytes to sys.stdout. In Python 3 that means using sys.stdout.buffer.write to send bytes directly. Encoding page content to match its charset parameter should be handled at a higher level in your application (in cases where you are returning textual content, rather than binary). This also means print is no good for CGI any more.
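A minimal sketch of this, with the output stream passed in explicitly so it can be tested against an in-memory buffer (in a real CGI script you would pass sys.stdout.buffer; the function name and placeholder bytes are illustrative):

```python
import io

def send_png(out, body: bytes) -> None:
    """Write a minimal CGI response with a binary body to a binary stream."""
    out.write(b"Content-Type: image/png\r\n\r\n")
    out.write(body)
    out.flush()

# An in-memory stream standing in for sys.stdout.buffer, so the result
# can be inspected; the bytes stand in for real image data.
buf = io.BytesIO()
send_png(buf, b'\x89PNG\r\n\x1a\n')
```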

(To add to the confusion, wsgiref's CGIHandler has been broken in py3k until very recently, making it impossible to deploy WSGI to CGI that way. With PEP 3333 and Python 3.2 this is finally workable.)

bobince

Using detach() causes the interpreter to print a warning when it tries to close stdout just before it exits:

Exception ignored in: <_io.TextIOWrapper mode='w' encoding='UTF-8'>
ValueError: underlying buffer has been detached

Instead, this worked fine for me:

default_out = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

(And, of course, writing to default_out instead of stdout.)
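The same approach can be sketched on an in-memory buffer: wrapping the binary layer in a new TextIOWrapper, rather than detaching it, leaves the original text stream intact at interpreter exit (the BytesIO below stands in for sys.stdout.buffer):

```python
import io

# An in-memory binary stream standing in for sys.stdout.buffer.
binary = io.BytesIO()
default_out = io.TextIOWrapper(binary, encoding='utf-8')

default_out.write("日本語\n")
default_out.flush()
```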

ptomato

sys.stdout is in text mode in Python 3, so you can write Unicode to it directly; the Python 2 idiom is no longer needed.

Where this would fail in Python 2:

>>> import sys
>>> sys.stdout.write(u"ûnicöde")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfb' in position 0: ordinal not in range(128)

However, it works just dandy in Python 3:

>>> import sys
>>> sys.stdout.write("Ûnicöde")
Ûnicöde7

Now if your Python doesn't know what your stdout's encoding actually is, that's a different problem, most likely one in how that Python was built.

Neuron
Lennart Regebro
  • My context was running the Python script as a CGI under Apache, where the default output encoding wasn't what I needed (I needed UTF-8). I think it's better for the script to ensure that its output is in the correct encoding, rather than relying on external settings (such as environment variables like PYTHONIOENCODING). – Greg Hewgill Dec 07 '10 at 10:03
  • Yet another proof that using stdout for process communication is big mistake. I realize you may have no choice than to use CGI in this case though so that's not your fault. :-) – Lennart Regebro Dec 07 '10 at 11:45
  • While it is true that `sys.stdout` is a *binary* file in Python 2 and a *text* file in Python 3, I think your Python 2 example fails because the unicode string `u"ûnicöde"` that gets implicitly encoded in the `sys.stdout.write` method has characters outside the ASCII range. If you change your `LC_CTYPE`, `LANG` or `PYTHONIOENCODING` environment variables to an encoding that has all the characters in the unicode string you should not get any error. (I have tried on Python 2.7.) – Géry Ogam Feb 22 '18 at 08:43