21

I'm using Windows and Linux machines for the same project. The default encoding for stdin on Windows is cp1252, and on Linux it is utf-8.

I would like to change everything to utf-8. Is it possible? How can I do it?

This question is about Python 2; for Python 3, see Python 3: How to specify stdin encoding

Gilles 'SO- stop being evil'
duduklein

4 Answers

19

You can do this by not relying on the implicit encoding when printing things. Not relying on that is a good idea in any case -- the implicit encoding is only used when printing to stdout and when stdout is connected to a terminal.

A better approach is to use unicode everywhere and to do all I/O through codecs.open or codecs.getwriter. For example, you can wrap sys.stdout in an object that automatically encodes your unicode strings to UTF-8:

import codecs, sys
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)

This will only work if you use unicode everywhere, though. So, use unicode everywhere. Really, everywhere.
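
As a minimal sketch of what the wrapper does, the snippet below wraps an in-memory byte stream instead of sys.stdout, so the encoded bytes can be inspected. The io.BytesIO stand-in and the variable names are assumptions made for illustration; they are not part of the original answer.

```python
import codecs
import io

# Stand-in for sys.stdout (assumption: any binary stream is wrapped the same way).
raw_stream = io.BytesIO()
utf8_writer = codecs.getwriter('utf-8')(raw_stream)

# Write a unicode string; the wrapper encodes it to UTF-8 on the way out.
utf8_writer.write(u'caf\xe9\n')

# The underlying stream now holds the UTF-8 bytes.
encoded = raw_stream.getvalue()
```

The non-ASCII character is written as two bytes, which is what a UTF-8 consumer on either platform would expect.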

Thomas Wouters
  • stdin isn't decoded automatically, so you always have to do this yourself. And assuming the input is UTF-8 is probably a bad idea, but there's `codecs.getreader('utf-8')(sys.stdin)` if you really want to. – Thomas Wouters Apr 29 '10 at 21:44
  • Note that in contrast to Python 2, Python 3 actually automatically decodes stdin: http://docs.python.org/3/library/sys.html#sys.stdin -- this behavior can be changed as outlined in the docs. – Dr. Jan-Philip Gehrcke Feb 08 '14 at 18:00
  • Is there any way in Python 3 to forcibly change the encoding of STDIN regardless of the environment variables? – CMCDragonkai Jun 26 '18 at 01:47
  • In Python 3.8 `codecs.getreader('utf-8')(sys.stdin)` does not work. Use `codecs.getreader('utf-8')(sys.stdin.buffer)` and `codecs.getwriter('utf8')(sys.stdout.buffer)` instead. – Eponymous Mar 23 '20 at 02:46
18

This is an old question, but just for reference.

To read UTF-8 from stdin, use:

import codecs
import sys

UTF8Reader = codecs.getreader('utf8')
sys.stdin = UTF8Reader(sys.stdin)

# Then, e.g.:
for line in sys.stdin:
    print line.strip()
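
To see the reader at work without a real pipe, here is a small sketch that feeds UTF-8 bytes through the same kind of wrapper. The io.BytesIO object standing in for sys.stdin and the sample text are assumptions for illustration only.

```python
import codecs
import io

# Stand-in for sys.stdin: a byte stream containing UTF-8 encoded text (assumed input).
fake_stdin = io.BytesIO(u'na\xefve\ncaf\xe9\n'.encode('utf-8'))
utf8_reader = codecs.getreader('utf8')(fake_stdin)

# Iterating over the reader yields decoded unicode lines.
lines = [line.strip() for line in utf8_reader]
```

Each item in lines is a unicode string, so later processing never sees raw bytes.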

To write UTF-8 to stdout, use:

import codecs
import sys

UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)

# Then, e.g.:
print u'Anything'
Tomasz Nguyen
  • In Python 3.8 `codecs.getreader('utf-8')(sys.stdin)` (equivalent to this post) does not work. Use `codecs.getreader('utf-8')(sys.stdin.buffer)` and `codecs.getwriter('utf8')(sys.stdout.buffer)` instead. – Eponymous Mar 23 '20 at 02:46
10

Python automatically detects the encoding of stdin. The simplest way I have found to specify an encoding when automatic detection isn't working properly is to use the PYTHONIOENCODING environment variable, as in the following example:

pipeline | PYTHONIOENCODING="UTF-8" /path/to/your-script.py

For more information about encoding detection and this variable on different platforms you can look at the sys.stdin documentation.
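
One way to check the effect of the variable is to start a child interpreter with it set and ask what encoding it reports for stdin. This is only a sketch: the child command, the use of os.devnull as the child's stdin, and the variable names are assumptions, not something taken from the documentation.

```python
import codecs
import os
import subprocess
import sys

# Copy the current environment and force the I/O encoding for the child process.
child_env = dict(os.environ, PYTHONIOENCODING='UTF-8')

# Run a child interpreter that writes the encoding it picked for stdin.
with open(os.devnull, 'rb') as devnull:
    reported = subprocess.check_output(
        [sys.executable, '-c', 'import sys; sys.stdout.write(sys.stdin.encoding)'],
        env=child_env,
        stdin=devnull,
    )

# Normalize the spelling (for example 'UTF-8' or 'utf8') to the canonical codec name.
stdin_codec = codecs.lookup(reported.decode('ascii').strip()).name
```

Because the variable is read at interpreter startup, it has to be set before the script starts, as in the shell example above, rather than from inside the running script.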

johnf
0

Here is a simple code snippet I used, which works for me on Ubuntu with both Python 2.7 and Python 3.6:

import codecs
import sys

if sys.version_info.major == 2:  # Python 2: sys.stdin/sys.stdout are byte streams
    # for stdin
    UTF8Reader = codecs.getreader('utf8')
    sys.stdin = UTF8Reader(sys.stdin)
    # for stdout
    UTF8Writer = codecs.getwriter('utf8')
    sys.stdout = UTF8Writer(sys.stdout)
elif sys.version_info.major == 3:  # Python 3: wrap the underlying .buffer byte streams
    # for stdin
    UTF8Reader = codecs.getreader('utf8')
    sys.stdin = UTF8Reader(sys.stdin.buffer)
    # for stdout
    UTF8Writer = codecs.getwriter('utf8')
    sys.stdout = UTF8Writer(sys.stdout.buffer)
Tranfer Will