As for the first question, \x80 is interpreted as \u0080. A nice explanation can be found at Bytes in a unicode Python string.
Edit:
@Joran Besley is right, so let me rephrase it:
u'\x80' is equal to u'\u0080'.
In fact:
unicode(u'\u0080')
>>> u'\x80'
That's because Python 2 prefers the \x escape when it prints the representation of a Unicode string, as long as the code point is below 256. From 256 upwards it uses the usual \u escape:
unicode(u'\u2019')
>>> u'\u2019' # right single quotation mark (byte 0x92 in windows-1252)
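As a rough illustration of the same rule in Python 3 (an assumption on my part, since the session above is Python 2): plain str literals are already Unicode there, and the built-in ascii() escapes non-ASCII characters with \x below code point 256 and \u from 256 up. The variable names are only for illustration.

```python
# Python 3 sketch; plain str literals play the role of u'...' in Python 2.
short_form = '\x80'
long_form = '\u0080'

# Both literals denote the same single code point, U+0080.
same = short_form == long_form
code_point = ord(short_form)

# ascii() escapes non-ASCII characters: \x below 256, \u from 256 up.
low_repr = ascii('\u0080')
high_repr = ascii('\u2019')

print(same, code_point, low_repr, high_repr)
```

So the two spellings differ only in how the literal is written or displayed, not in the string they produce.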
Which byte the character is then mapped to depends on your terminal encoding. As Joran said, you are probably using Windows-1252 or something close to it, where the euro symbol is the byte 0x80. In ISO-8859-15, for example, the euro symbol is the byte 0xa4:
"\xa4".decode("iso-8859-15") == "\x80".decode('windows-1252')
>>> True
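The same comparison can be sketched in Python 3, where byte strings need an explicit b prefix before they can be decoded (the Python 2 line above relies on str being bytes). The variable names are illustrative only.

```python
# Python 3 sketch: bytes must be decoded explicitly to get text.
euro = '\u20ac'  # EURO SIGN

from_cp1252 = b'\x80'.decode('windows-1252')
from_latin9 = b'\xa4'.decode('iso-8859-15')

# Different bytes, same character once decoded with the right codec.
same_char = from_cp1252 == from_latin9 == euro

# Going the other way, the byte value depends on the target encoding.
encoded = {
    'windows-1252': euro.encode('windows-1252'),
    'iso-8859-15': euro.encode('iso-8859-15'),
    'utf-8': euro.encode('utf-8'),
}
print(same_char, encoded)
```

The euro sign is one code point (U+20AC) throughout; only its byte representation changes with the encoding.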
If you are curious about your terminal encoding, you can get it from the sys module:
import sys
sys.stdin.encoding
>>> 'UTF-8' # my terminal
sys.stdout.encoding
>>> 'UTF-8' # same as above
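A slightly more defensive version, as a sketch: the encoding attribute can be None (or missing) when a stream is piped or replaced, so the hypothetical helper stream_encoding below falls back to a placeholder. The helper name and the 'unknown' fallback are my own assumptions, not part of sys.

```python
import sys

def stream_encoding(stream, fallback='unknown'):
    """Return the stream's declared encoding, or `fallback` if it has none."""
    # Hypothetical helper: `encoding` may be None or absent on redirected streams.
    return getattr(stream, 'encoding', None) or fallback

stdin_enc = stream_encoding(sys.stdin)
stdout_enc = stream_encoding(sys.stdout)
print(stdin_enc, stdout_enc)
```

Run in an interactive terminal, this should report the same values as the two attribute lookups above.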
I hope it makes up for my mistake.