It boils down to your output stream encoding. In this particular case, since you're using print, the output file used is sys.stdout.
Interactive mode / stdout not redirected
When you run Python in interactive mode, or when you don't redirect stdout to a file, Python chooses the encoding based on the environment, namely locale environment variables such as LC_CTYPE. For example, if you run your program like this:
$ LC_CTYPE='en_US' python test.py
...
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-4: ordinal not in range(128)
it will use ANSI_X3.4-1968 (that is, ASCII) for sys.stdout (see sys.stdout.encoding) and fail. However, if you use UTF-8 (as you obviously already do):
$ LC_CTYPE='en_US.UTF-8' python test.py
1234567890
abcd
αβγδ
you'll get the expected output.
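To see which encoding Python actually picked in a given environment, something like the following sketch may help. The helper name describe_stdout is made up for illustration; it only reads values that the standard library already exposes.

```python
import locale
import sys


def describe_stdout():
    """Collect the settings that influence how print encodes its output."""
    # Some replacement stream objects lack isatty(); treat those as "not a terminal".
    isatty = getattr(sys.stdout, "isatty", lambda: False)
    return {
        "isatty": bool(isatty()),
        # May be None in Python 2 when stdout is redirected.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the locale (LC_ALL, LC_CTYPE, LANG).
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_stdout().items()):
        print("%s: %s" % (key, value))
```

Running it once with LC_CTYPE='en_US' and once with LC_CTYPE='en_US.UTF-8' should make the difference between the two cases above visible.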
stdout redirected to file
When you redirect stdout to a file, Python will not try to detect the encoding from your environment locale. In Python 2, sys.stdout.encoding is None in this case, so unicode objects are encoded as ASCII. Python will, however, check another environment variable, PYTHONIOENCODING (check the source, initstdio() in Python/pylifecycle.c). For example, this will work as expected:
$ PYTHONIOENCODING=utf-8 python test.py >/tmp/output
since Python will use UTF-8 encoding for the /tmp/output file.
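The effect of PYTHONIOENCODING can also be checked from Python itself. The sketch below starts a child interpreter with its stdout captured through a pipe (so it is not a terminal) and asks it which encoding it chose. The function name child_stdout_encoding is hypothetical, and the exact spelling of the reported encoding may vary between Python versions.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and report its stdout encoding."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    # check_output captures stdout through a pipe, so the child is not
    # writing to a terminal, similar to a shell redirect.
    output = subprocess.check_output(
        [sys.executable, "-c",
         "import sys; sys.stdout.write(str(sys.stdout.encoding))"],
        env=env,
    )
    return output.decode("ascii").strip()


if __name__ == "__main__":
    print(child_stdout_encoding("utf-8"))
```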
Manual stdout encoding override
You can also manually wrap sys.stdout in a stream writer with the desired encoding (check this and this SO question):
import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
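Now print will correctly output unicode objects, since the underlying stream writer will convert them to UTF-8 on the fly. Plain str objects also work as long as they contain only ASCII bytes, because the writer implicitly decodes them as ASCII before re-encoding.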
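To illustrate what the stream writer does without touching the real sys.stdout, here is a small sketch that wraps an in-memory byte buffer instead. The helper encode_through_writer is an illustrative name, not part of any library.

```python
import codecs
import io


def encode_through_writer(text, encoding="utf-8"):
    """Write unicode text through a codecs stream writer and return the raw bytes."""
    buffer = io.BytesIO()
    # Same wrapping as for sys.stdout, but around an in-memory byte stream.
    writer = codecs.getwriter(encoding)(buffer)
    writer.write(text)
    return buffer.getvalue()


if __name__ == "__main__":
    # Greek "alpha beta gamma delta", written with escapes so the source stays ASCII.
    raw = encode_through_writer(u"\u03b1\u03b2\u03b3\u03b4")
    print(repr(raw))
```

The writer accepts unicode text and emits encoded bytes to the wrapped stream, which is the conversion print relies on after sys.stdout has been replaced.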
Manual string encoding before output
Of course, you can also manually encode each unicode object to a UTF-8 str prior to output with:
print ('%5s' % s2).encode('utf8')
but that's tedious and error-prone.
Explicit file open
For completeness: when opening files for writing with a specific encoding (like UTF-8) in Python 2, you should use either io.open or codecs.open, because they allow you to specify the encoding (see this question), unlike the built-in open:
from codecs import open
# 'w' opens the file for writing; unicode text is encoded as UTF-8 on write
myfile = open('filename', 'w', encoding='utf-8')
or:
from io import open
# 'w' opens the file for writing; unicode text is encoded as UTF-8 on write
myfile = open('filename', 'w', encoding='utf-8')
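As a quick sanity check, the following sketch writes unicode text with io.open and reads it back using the same encoding. The function name write_and_read_back and the use of a temporary file are choices made for this example only.

```python
import io
import os
import tempfile


def write_and_read_back(text, encoding="utf-8"):
    """Write unicode text to a temporary file and return what is read back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # io.open encodes on write and decodes on read with the given encoding.
        with io.open(path, "w", encoding=encoding) as handle:
            handle.write(text)
        with io.open(path, "r", encoding=encoding) as handle:
            return handle.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    sample = u"\u03b1\u03b2\u03b3\u03b4"
    print(write_and_read_back(sample) == sample)
```

Note that in Python 2, io.open in text mode expects unicode objects when writing, so byte strings should be decoded first.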