I have written a small script that prints some information to a Windows terminal (which uses the Greek cp737 code page). It is essentially something like this:

while True:
    title = u'greek and other unichars follow:\t{}'.format(unicode_input())
    print title.encode('cp737','ignore')

which outputs:

greek and other unichars follow:    Καλημέρα!

This works as expected: the terminal prints most of the Greek letters and ignores the rare characters that can't be translated to the much more constrained cp737.

Now, in Python 3, printing bytes such as u"unitext".encode() outputs the bytes object to stdout 'as-is':

b"greek and other unichars follow:\t\x89\x98\xa2\x9e\xa3\xe2\xa8\x98!"
  • Printing Unicode directly to the terminal will eventually lead to a UnicodeEncodeError.

  • Converting unicode -> bytes (cp737, ignore) -> unicode seems quirky.

So what is the elegant way of doing this?

Dim

1 Answer

For Python 3, you have a few options available to you:

  1. Set the PYTHONIOENCODING environment variable to the encoding of your terminal. For example, you might set it to PYTHONIOENCODING=cp737:ignore (the part after the colon is the error handler). Then, if you print Unicode text using print, it will automatically be encoded as cp737 and output properly.
  2. Reset the encoding of your sys.stdout at runtime. See this question: How to set sys.stdout encoding in Python 3?
  3. Write the encoded bytes directly to sys.stdout.buffer, which bypasses the encoding mechanism used by sys.stdout.
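
As a rough illustration of options 2 and 3, here is a minimal sketch. The helper names `write_encoded` and `wrap_binary_stream` are made up for this example and are not part of any library; cp737 with the `ignore` error handler is assumed because that is what the question uses.

```python
import io
import sys


def write_encoded(text, buffer=None, encoding="cp737", errors="ignore"):
    """Option 3: encode by hand and write the bytes to a binary stream.

    Falls back to sys.stdout.buffer, which bypasses the text layer of
    sys.stdout (hypothetical helper, not a standard function).
    """
    if buffer is None:
        buffer = sys.stdout.buffer
    buffer.write((text + "\n").encode(encoding, errors))
    buffer.flush()


def wrap_binary_stream(binary_stream, encoding="cp737", errors="ignore"):
    """Option 2: put a text layer with the desired encoding and error
    handler on top of a binary stream (hypothetical helper)."""
    return io.TextIOWrapper(
        binary_stream, encoding=encoding, errors=errors, line_buffering=True
    )


if __name__ == "__main__":
    # Characters missing from cp737 (the snowman here) are silently dropped.
    write_encoded("greek and other unichars follow:\tΚαλημέρα! ☃")

    # Alternatively, re-wrap stdout once and keep using plain print().
    out = wrap_binary_stream(sys.stdout.buffer)
    print("greek and other unichars follow:\tΚαλημέρα! ☃", file=out)
```

On Python 3.7 and later, `sys.stdout.reconfigure(encoding="cp737", errors="ignore")` should achieve the same effect as re-wrapping, without creating a second wrapper around the same buffer.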
nneonneo
  • The linked answer uses `codecs.getwriter`; you can also `detach` and wrap the buffer with another `io.TextIOWrapper`. – Eryk Sun Jun 11 '14 at 00:24
  • sys.stdout.buffer.write(bytes) was closer to what I was looking for. It's also [documented](docs.python.org/3/library/sys.html#sys.stdout) (although 'restrictions may apply') Some discussion: bugs.python.org/issue18512 – Dim Jun 11 '14 at 04:18