I'm trying to understand how the PYTHONIOENCODING environment variable works with Python 2.7, so I tried the following in the interactive prompt:

antox@antox-pc ~/Scrivania $ export PYTHONIOENCODING='latin1'
antox@antox-pc ~/Scrivania $ /usr/bin/python2.7 
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.stdin.encoding
'latin1'
>>> sys.stdout.encoding
'latin1'
>>> b = 'ÿ'
>>> b      
'\xc3\xbf'   # Shouldn't I get something like '\xff' because I set PYTHONIOENCODING to latin1? It looks as if utf-8 is being used instead
>>> print '\xff'
�            # Why this odd character? Shouldn't I get 'ÿ' always for the reason above?

My questions/doubts are indicated as comments.

zer0uno

1 Answer

By setting PYTHONIOENCODING in the environment, you're telling Python not to trust what your terminal/OS reports about the encoding -- you're saying that you know better, and that the terminal device actually accepts the encoding you named, not whatever the OS etc. will tell Python.

So in this case you're saying that (whatever it claims otherwise) your terminal actually accepts and properly formats bytes in latin-1.

That is probably not the case (if you don't set that environment variable what does sys.stdout.encoding say? utf-8, I guess?) so it's not surprising that you don't get the display you want:-).

On your specific question,

sys.getdefaultencoding()

tells you what encoding Python will use to translate between actual text (that is, Unicode) and byte strings, in situations where it has no other indication (I/O to stdin/stdout is not one of those situations, as it uses the encoding attribute of those files).
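As a rough sketch of that distinction, the snippet below collects the default encoding and the stream encodings side by side. The helper name `report_encodings` is made up for illustration, and the code is written to run on both Python 2.7 and Python 3.

```python
import sys


def report_encodings():
    """Collect the encodings Python is using (hypothetical helper)."""
    return {
        # Fallback used when nothing else indicates an encoding.
        'default': sys.getdefaultencoding(),
        # Stream encodings; these are what PYTHONIOENCODING overrides.
        # They may be None when a stream is not a terminal (e.g. a pipe on 2.7).
        'stdin': getattr(sys.stdin, 'encoding', None),
        'stdout': getattr(sys.stdout, 'encoding', None),
    }


for name, value in sorted(report_encodings().items()):
    print('%s: %s' % (name, value))
```

Running it with and without PYTHONIOENCODING set should change the `stdin` and `stdout` entries but leave `default` alone (typically 'ascii' on a stock Python 2.7).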

>>> b = 'ÿ'

This has nothing to do with sys.stdin/stdout -- rather, your terminal is sending, after the open quote, the bytes that encode ÿ in utf-8 (my Mac's Terminal app does, for example), and a plain byte-string literal keeps those bytes exactly as received. If this were in a .py file without a proper source-encoding preamble, it would be a syntax error -- the interactive interpreter has become a softy in 2.7.9:-)
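A minimal sketch of the byte-level difference, assuming the terminal really does send utf-8. The variable names are illustrative, and the snippet is meant to run on both Python 2.7 and Python 3.

```python
# U+00FF (y with diaeresis) as a Unicode string; u'' works on 2.7 and 3.3+.
text = u'\xff'

# What a utf-8 terminal sends when you type that character: two bytes.
typed_on_utf8_terminal = text.encode('utf-8')

# What a genuine latin-1 terminal would send: a single byte.
typed_on_latin1_terminal = text.encode('latin-1')

print(repr(typed_on_utf8_terminal))
print(repr(typed_on_latin1_terminal))
```

The two-byte form matches the '\xc3\xbf' shown in the question, which is why the literal looks like utf-8 whatever PYTHONIOENCODING says.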

>>> print '\xff'
�            # Why this odd character? Shouldn't I get 'ÿ' always for the reason above?

You've told Python that your terminal accepts and properly displays latin-1 byte sequences. The terminal probably wants utf-8 ones and says so, but you've told Python to ignore what the terminal says about its encoding (or rather, what the OS says the terminal says:-).

So the byte of value 255 is sent as-is, and the terminal doesn't like it one bit (since the terminal doesn't actually accept latin-1!) and displays an error-marker.
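To see why an error marker is a plausible result, here is a sketch of what a utf-8 decoder makes of a lone 0xff byte. It only imitates the terminal with Python's own codec; which glyph a real terminal draws is up to that terminal.

```python
lone_byte = b'\xff'

# Strict utf-8 decoding rejects the byte: 0xff never occurs in valid utf-8.
try:
    lone_byte.decode('utf-8')
    rejected = False
except UnicodeDecodeError:
    rejected = True

# A lenient decoder substitutes U+FFFD REPLACEMENT CHARACTER instead,
# which is the usual "odd character" a terminal shows for bad input.
shown = lone_byte.decode('utf-8', 'replace')

# The very same byte is valid latin-1 and means U+00FF.
as_latin1 = lone_byte.decode('latin-1')

print(rejected, repr(shown), repr(as_latin1))
```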

Here's a typical example on my Mac (where the Terminal does actually accept 'utf-8'):

ozone:~ alex$ PYTHONIOENCODING=latin-1 python -c "print u'\xff'"
?
ozone:~ alex$ PYTHONIOENCODING=utf-8 python -c "print u'\xff'"
ÿ
ozone:~ alex$ python -c "print u'\xff'"
ÿ

Letting Python properly detect the terminal encoding on its own, or forcing it to what happens to be the right one, displays correctly.

Forcing the encoding to one the terminal does not in fact accept, unsurprisingly, does not display correctly.
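The same experiment can be repeated without relying on what the terminal draws, by capturing the raw bytes a child interpreter writes. This is only a sketch: the helper name `emitted_bytes` is made up, and it assumes `sys.executable` points at a usable interpreter (Python 2.7 or 3).

```python
import os
import subprocess
import sys


def emitted_bytes(io_encoding):
    """Return the raw bytes a child Python prints for U+00FF."""
    # Copy the current environment and force the I/O encoding in the child.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    # print(u'\xff') is valid syntax on both Python 2.7 and Python 3.
    out = subprocess.check_output(
        [sys.executable, '-c', "print(u'\\xff')"], env=env)
    # Drop the trailing newline so only the encoded character remains.
    return out.rstrip(b'\r\n')


for enc in ('latin-1', 'utf-8'):
    print('%s -> %r' % (enc, emitted_bytes(enc)))
```

Whichever byte sequence matches what the terminal actually expects is the one that displays correctly; the other shows up as an error marker or as the wrong characters.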

Should you ever attach to your machine's serial port an ancient teletype which does in fact accept latin-1 (but the OS doesn't detect that fact properly), PYTHONIOENCODING will help you properly do Python I/O on that ancient teletype. Otherwise, it's unlikely that said environment setting will be of much use to you:-).

Alex Martelli
  • So if I understand correctly, when I type a character on my keyboard it is translated into a stream of bytes according to the encoding of the OS/terminal, and this stream is then sent to Python; Python, according to PYTHONIOENCODING, reads those bytes using 'latin1', and vice versa. Is that right? – zer0uno Jan 31 '15 at 18:27
  • Right, but if the terminal+OS are actually sending sequences of bytes in utf-8, wrongly accepting them as if they were latin-1 ones instead is not going to be helpful. And vice versa for byte sequences your Python code sends to `sys.stdout`. – Alex Martelli Jan 31 '15 at 18:41
  • OK, I think it is clear. BTW, is there a command that tells me which encoding the terminal is actually using? – zer0uno Jan 31 '15 at 18:48
  • @antox, not in general nor with complete reliability, but, try the various answers to http://stackoverflow.com/questions/5306153/how-to-get-terminals-character-encoding -- and don't be surprised if they come out wrong when you attach that ancient TTY to the serial port, as your OS may well not support/detect it properly... – Alex Martelli Jan 31 '15 at 18:54