I'm working in WinXP 5.1.2600, writing a Python application involving Chinese pinyin, which has involved me in endless Unicode problems. Switching to Python 3.0 has solved many of them. But the print() function for console output is not Unicode-aware for some odd reason. Here's a teeny program.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
print('sys.stdout encoding is "' + sys.stdout.encoding + '"')
str1 = 'lüelā'
print(str1)
The output is (with angle brackets changed to square brackets for readability):
sys.stdout encoding is "cp1252"
Traceback (most recent call last):
  File "TestPrintEncoding.py", line 22, in [module]
    print(str1)
  File "C:\Python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "C:\Python30\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0101' in position 4: character maps to [undefined]
Note that ü = '\xfc' (252) gives no problem, since it falls in the upper half of the 8-bit cp1252 code page. But ā = '\u0101' (257) is beyond 8 bits, so cp1252 has no byte for it.
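To confirm which characters the console's cp1252 codec rejects, one can try encoding each character on its own. The helper name unencodable below is hypothetical, just for illustration; it only uses str.encode and UnicodeEncodeError, and prints with ascii() so that the report itself cannot fail on a cp1252 console.

```python
def unencodable(text, encoding='cp1252'):
    """Return (index, char) pairs for characters `encoding` cannot represent."""
    bad = []
    for i, ch in enumerate(text):
        try:
            ch.encode(encoding)          # succeeds for 'l', 'ü', 'e'
        except UnicodeEncodeError:
            bad.append((i, ch))          # 'ā' (U+0101) lands here
    return bad

for i, ch in unencodable('lüelā'):
    # ascii() escapes non-ASCII, so this line is safe on any console
    print(i, ascii(ch), hex(ord(ch)))    # prints: 4 '\u0101' 0x101
```

The reported index 4 matches "position 4" in the traceback above.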
Does anyone have an idea how to change the encoding of sys.stdout to 'utf-8'? Bear in mind that Python 3.0 no longer uses the codecs module, if I understand the documentation right.
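One possible approach, sketched under assumptions rather than offered as a definitive fix: in Python 3, sys.stdout is normally an io.TextIOWrapper around a byte buffer (sys.stdout.buffer), so a new UTF-8 text layer can be wrapped around that same buffer. Python 3.7 and later also provide TextIOWrapper.reconfigure(), which does this in place; that method does not exist in 3.0, hence the fallback. The function name force_utf8 is hypothetical.

```python
import io


def force_utf8(stream):
    """Return a text stream that writes UTF-8 to `stream`'s byte buffer.

    Assumes `stream` is an io.TextIOWrapper exposing .buffer, as
    sys.stdout usually is when attached to a console or pipe.
    """
    if hasattr(stream, 'reconfigure'):
        # Python 3.7+: switch the encoding of the existing wrapper in place.
        stream.reconfigure(encoding='utf-8')
        return stream
    # Older Python 3: flush pending text, then wrap the raw byte buffer
    # in a fresh UTF-8 text layer.
    stream.flush()
    return io.TextIOWrapper(stream.buffer, encoding='utf-8',
                            line_buffering=True)
```

Usage would be sys.stdout = force_utf8(sys.stdout) near the top of the script, before the first print(). This stops the UnicodeEncodeError, but a Windows console still set to code page 1252 may display the UTF-8 bytes as garbage; the console's code page (chcp 65001) and a font with the needed glyphs may also have to change for ā to appear correctly.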
(Note that the coding specified by the "coding:" line is the coding of the source code, not of the console output. But thank you for your thoughts!)