By setting PYTHONIOENCODING in the environment, you're telling Python not to trust what your terminal/OS reports about the encoding -- you're saying that you know better, and that the terminal device actually accepts the encoding you named, not whatever the OS etc. will tell Python.
So in this case you're saying that (whatever it claims otherwise) your terminal actually accepts and properly displays bytes in latin-1.
That is probably not the case (if you don't set that environment variable, what does sys.stdout.encoding say? utf-8, I guess?), so it's not surprising that you don't get the display you want :-).
On your specific question, sys.getdefaultencoding() tells you what encoding Python will use to translate between actual text (that is, Unicode) and byte strings in situations where it has no other indication. I/O to stdin/stdout is not one of those situations, as it uses the encoding attribute of those files.
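To check which of these settings is in play on a given machine, a minimal sketch along these lines can help. The helper name io_encoding_report is made up for illustration; everything it calls is in the standard library.

```python
import os
import sys


def io_encoding_report():
    """Collect the encodings Python consults for text <-> bytes conversion."""
    return {
        # Used for implicit conversions when nothing else is specified
        # ('ascii' on Python 2, 'utf-8' on Python 3).
        "default": sys.getdefaultencoding(),
        # Used when writing text to stdout; may be None, for example on
        # Python 2 when stdout is a pipe rather than a terminal.
        "stdout": getattr(sys.stdout, "encoding", None),
        # The override, if any, taken from the environment.
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for name, value in sorted(io_encoding_report().items()):
        print("%s: %s" % (name, value))
```

Running it once with and once without PYTHONIOENCODING set shows that only the "stdout" entry follows the variable; the "default" entry does not.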
>>> b = 'ÿ'
This has nothing to do with sys.stdin/stdout -- rather, your terminal is sending, after the open quote, some "escape sequence" that boils down to proper utf-8 (my Mac's Terminal app does, for example). If this were in a .py file without a proper source-encoding preamble, it would be a syntax error -- the interactive interpreter has become a softy in 2.7.9 :-)
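For reference, here is a small sketch of what such a source-encoding preamble (PEP 263) looks like in a file. The variable names are only illustrative, and the u prefix is used so that the snippet means the same thing on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# The declaration above must sit on the first or second line of the file.

s = u'ÿ'                  # text: the single code point U+00FF
raw = s.encode('utf-8')   # the two bytes a utf-8 terminal sends for it

print(len(s), len(raw))
```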
>>> print '\xff'
� # Why this odd character? Shouldn't I get 'ÿ' always for the reason above?
You've told Python that your terminal accepts and properly displays latin-1 byte sequences. The terminal probably wants utf-8 ones and says so, but you've told Python to ignore what the terminal says about its encoding (or rather, what the OS says the terminal says :-).
So the byte of value 255 is sent as-is, and the terminal doesn't like it one bit (since the terminal doesn't actually accept latin-1!) and displays an error-marker.
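The mismatch can be reproduced without a terminal at all. The sketch below only approximates what a utf-8 terminal does with the byte (real terminals differ in which error marker they draw), using a lenient utf-8 decode as a stand-in.

```python
text = u'\xff'  # the character U+00FF

latin1_bytes = text.encode('latin-1')   # one byte: 0xFF
utf8_bytes = text.encode('utf-8')       # two bytes: 0xC3 0xBF

# A strict utf-8 decoder rejects the lone 0xFF byte...
try:
    latin1_bytes.decode('utf-8')
    decodable = True
except UnicodeDecodeError:
    decodable = False

# ...and a lenient one substitutes U+FFFD, the replacement character.
shown = latin1_bytes.decode('utf-8', 'replace')

print(decodable)            # False
print(shown == u'\ufffd')   # True
```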
Here's a typical example on my Mac (where the Terminal does actually accept 'utf-8'):
ozone:~ alex$ PYTHONIOENCODING=latin-1 python -c "print u'\xff'"
?
ozone:~ alex$ PYTHONIOENCODING=utf-8 python -c "print u'\xff'"
ÿ
ozone:~ alex$ python -c "print u'\xff'"
ÿ
Letting Python properly detect the terminal encoding on its own, or forcing it to what happens to be the right one, displays correctly.
Forcing the encoding to one the terminal does not in fact accept, unsurprisingly, does not display correctly.
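To see the actual bytes Python emits in each case, independently of what any terminal makes of them, something like the following sketch can be used. It runs a child interpreter with PYTHONIOENCODING set and captures its raw output; emitted_bytes is a hypothetical helper name, not part of any library.

```python
import os
import subprocess
import sys


def emitted_bytes(io_encoding):
    """Return the raw bytes a child Python writes for u'\\xff' under the
    given PYTHONIOENCODING, with the trailing newline removed."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(u'\\xff')"], env=env)
    return out.rstrip(b"\r\n")


if __name__ == "__main__":
    for enc in ("latin-1", "utf-8"):
        print("%s -> %r" % (enc, emitted_bytes(enc)))
```

With latin-1 the child writes the single byte 0xFF; with utf-8 it writes 0xC3 0xBF, which is what a utf-8 terminal expects.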
Should you ever attach to your machine's serial port an ancient teletype which does in fact accept latin-1 (but the OS doesn't detect that fact properly), PYTHONIOENCODING will help you properly do Python I/O on that ancient teletype. Otherwise, it's unlikely that said environment setting will be of much use to you :-).