Why does this string get printed out like this?

Question

I am playing around with string formatting, and I am trying to understand the following piece of code:

mystring = "\x80" * 50
print mystring

output:

>>> 
€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€
>>>

The output is one string of Euro signs. But why is that? This is not ASCII as far as I know, and the other question I am asking myself is: why does it not print out the hex \x80? Thanks in advance.

There is a very thorough explanation about encoding of `\x80` by John Machin [here](http://stackoverflow.com/questions/2991660/python-postgresql-strange-ascii-utf8-encoding-error/2994258#2994258). — fhdrsdg, Dec 02 '14 at 10:31

score 2 · Accepted Answer · edited May 23 '17 at 10:26

As for the first question, \x80 is interpreted as \u0080. A nice explanation can be found at "Bytes in a unicode Python string".

Edit: @Joran Beasley is right, so let me rephrase it:

u'\x80' is equal to u'\u0080'.

In fact:

unicode(u'\u0080')
>>> u'\x80'

and that's because Python < 3 prefers \x as the escape representation of Unicode characters when possible, that is, as long as the code point is less than 256. Above that it uses the normal \u:

unicode(u'\u2019')
>>> u'\u2019' # right single quotation mark (byte 0x92 in windows-1252)

Which character the byte is then mapped to depends on your terminal encoding. As Joran said, you are probably using Windows-1252 or something close to it, where the euro symbol is the hex byte 0x80. In iso-8859-15, for example, the hex value is 0xa4:

"\xa4".decode("iso-8859-15") == "\x80".decode('windows-1252')
>>> True

If you are curious about your terminal encoding, you can get it from sys:

import sys
sys.stdin.encoding
>>> 'UTF-8' # my terminal
sys.stdout.encoding
>>> 'UTF-8' # same as above

I hope it makes up for my mistake.
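The \x versus \u behaviour described in this answer can also be reproduced on Python 3, where `ascii()` escapes non-ASCII characters much like `repr()` does for unicode strings on Python 2. The following is only a sketch under that assumption, and the variable names are made up for illustration.

```python
# Python 3 sketch: ascii() escapes non-ASCII characters in a way that
# mirrors Python 2's repr() of a unicode string.
low = ascii("\u0080")         # code point below 256: \x escape
high = ascii("\u2019")        # code point 256 or above: \u escape
astral = ascii("\U0001F600")  # code point above U+FFFF: \U escape

print(low)     # '\x80'
print(high)    # '\u2019'
print(astral)  # '\U0001f600'

# Both spellings denote the same one-character string.
same = "\x80" == "\u0080"
print(same)    # True
```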

`u"\x80" != "\x80"` ... just a heads up ... your edited answer improves a great deal on the initial answer +1 — Joran Beasley, Jul 02 '14 at 21:14
I decided to mark this answer as correct, because it is a more detailed explanation.. — Dirk, Dec 02 '14 at 14:09

score 1 · Answer 2 · edited May 23 '17 at 12:11

It depends on your terminal encoding. In the Windows terminal, that byte is displayed as a bunch of C-cedillas.
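To make the dependence on encoding concrete, here is a small sketch that decodes the same byte under a few codecs. It is written for Python 3, where `b"\x80"` stands in for the Python 2 `"\x80"` from the question. Treating cp437 as "the Windows terminal" code page is an assumption, since the actual console code page varies by locale.

```python
# One byte, several interpretations (Python 3 sketch).
raw = b"\x80"

decoded = {
    "cp1252": raw.decode("cp1252"),    # Windows-1252: euro sign U+20AC
    "cp437": raw.decode("cp437"),      # DOS-style console: C-cedilla U+00C7
    "latin-1": raw.decode("latin-1"),  # ISO-8859-1: control character U+0080
}

# In ISO-8859-15 the euro sign lives at a different byte, 0xA4.
euro_latin9 = b"\xa4".decode("iso-8859-15")

for codec in sorted(decoded):
    # ascii() keeps the output printable on any terminal
    print(codec, ascii(decoded[codec]))
print("iso-8859-15", ascii(euro_latin9))
```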

If you want to see the "\x80", you can print repr(mystring).

Furthermore, 0x80 = 128, which is the byte value of the euro sign here. It is not ASCII, since ASCII technically only goes up to 0x7f.

Specifically, that is how "Windows-1252" encodes the euro sign (apparently that is how almost all of the "Windows-125x" code pages encode it).
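The "almost all" can be checked by encoding the euro sign with each Windows-125x codec that ships with Python. This is a Python 3 sketch; the dictionary name `euro_bytes` is just for illustration.

```python
# Encode the euro sign with every Windows-125x code page (Python 3 sketch).
euro = "\u20ac"  # EURO SIGN

euro_bytes = {}
for number in range(1250, 1259):
    codec = "cp%d" % number
    try:
        euro_bytes[codec] = euro.encode(codec)
    except UnicodeEncodeError:
        euro_bytes[codec] = None  # this code page has no euro sign

for codec in sorted(euro_bytes):
    print(codec, euro_bytes[codec])
```

Windows-1251 (Cyrillic) is the exception: it places the euro sign at 0x88 rather than 0x80.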

This answer has lots more info: Hex representation of Euro Symbol €

Furthermore, you can convert it to unicode:

unicode_ch = "\x80".decode("Windows-1252")  #it is now decoded into unicode
print repr(unicode_ch) # \u20AC  the unicode equivalent of Euro
print unicode_ch #as long as your terminal can handle it
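Whether the terminal "can handle it" matters in the other direction too. On a UTF-8 terminal, a lone 0x80 byte is not a valid character at all, which may be why the raw string shows up as boxes or replacement characters there. A Python 3 sketch of that, with illustrative variable names:

```python
# A lone 0x80 byte is a UTF-8 continuation byte, so it cannot be decoded
# on its own (Python 3 sketch).
raw = b"\x80" * 3

try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# With errors="replace", each bad byte becomes U+FFFD, which many
# terminals draw as a box or a question mark.
replaced = raw.decode("utf-8", errors="replace")

# The euro sign itself takes three bytes in UTF-8.
euro_utf8 = "\u20ac".encode("utf-8")

print(valid_utf8)       # False
print(ascii(replaced))  # '\ufffd\ufffd\ufffd'
print(euro_utf8)        # b'\xe2\x82\xac'
```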

Could you describe this a little further? I mean, okay, when I print this in bash it's just a block of squares. I am guessing this is unicode or something like that? — Dirk, Jul 02 '14 at 17:53

score 1 · Answer 3 · answered Jul 02 '14 at 17:59

A little tinkering in IDLE produced this output.

>>> a = "\x80"
>>> a
'\x80'
>>> print a * 50
€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€
>>> print a
€
>>>

The first thing that stands out is the '\' character. This character is used for escaping characters in strings. You can learn about escaping characters in the link below.

http://en.wikipedia.org/wiki/Escape_character

Changing the string slightly tells us that escaping is occurring.

>>> print '\x8'
ValueError: invalid \x escape

What I think is happening is the escape is causing the string to be looked up in the ASCII (or similar) table.
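A short Python 3 sketch of what the escape does on its own: it turns the four typed characters into a single character with value 0x80. Which glyph is finally shown is decided later by the terminal encoding, as the other answers describe. The names below are illustrative, and on Python 3 an incomplete escape is reported as a SyntaxError at compile time rather than the ValueError shown above.

```python
# What the \x escape does, independent of any terminal (Python 3 sketch).
escaped = "\x80"    # escape processed: one character
raw_text = r"\x80"  # raw string: backslash, x, 8, 0 kept literally

length_escaped = len(escaped)  # 1
code_point = ord(escaped)      # 128, i.e. 0x80
length_raw = len(raw_text)     # 4

# \x needs exactly two hex digits; compile() lets us trigger the error
# for an incomplete escape without breaking this script.
try:
    compile(r"b'\x8'", "<demo>", "eval")
    incomplete_ok = True
except SyntaxError:
    incomplete_ok = False

print(length_escaped, hex(code_point), length_raw, incomplete_ok)
```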
