I am playing around with string formatting, and I am trying to understand the following piece of code:

mystring = "\x80" * 50
print mystring

output:

>>> 
€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€
>>>

The output is one string of Euro signs. But why is that? This is not ASCII as far as I know, and the other question I am asking myself is why it does not print out the hex \x80. Thanks in advance.

Christian Berendt
Dirk
  • There is a very thorough explanation about encoding of `\x80` by John Machin [here](http://stackoverflow.com/questions/2991660/python-postgresql-strange-ascii-utf8-encoding-error/2994258#2994258). – fhdrsdg Dec 02 '14 at 10:31

3 Answers


As for the first question, \x80 is interpreted as \u0080. A nice explanation can be found at "Bytes in a unicode Python string".

Edit: @Joran Beasley is right, so let me rephrase it:

u'\x80' is equal to u'\u0080'.

In fact:

>>> unicode(u'\u0080')
u'\x80'

and that's because Python < 3 prefers \x as the escape representation of Unicode characters when possible, that is, as long as the code point is less than 256. Beyond that it uses the normal \u:

>>> unicode(u'\u2019')  # a curved quote, present in windows-1252
u'\u2019'
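The \x versus \u choice can be reproduced with a small sketch. It is written for Python 3 (an assumption, since the snippets above are Python 2), where the built-in ascii() plays the role that repr() of a unicode object plays in Python 2:

```python
# Python 3 sketch (assumption: the thread itself uses Python 2).
# ascii() escapes every non-ASCII character, which makes the choice
# between the \x and \u escape forms visible.
low = ascii("\u0080")   # code point below 256
high = ascii("\u2019")  # code point above 255

# Both spellings denote the same one-character string.
same = "\x80" == "\u0080"

print(low, high, same)  # '\x80' '\u2019' True
```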

Which character that value is then mapped to depends on your terminal encoding. As Joran said, you are probably using Windows-1252 or something close to it, where the euro symbol is the hex byte 0x80. In ISO-8859-15, for example, the hex value is 0xa4:

>>> "\xa4".decode("iso-8859-15") == "\x80".decode('windows-1252')
True
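To make the dependence on the encoding concrete, here is a minimal sketch that decodes the same byte under a few code pages and encodes the euro sign under a few others. It assumes Python 3, where byte strings are written b"..."; the names RAW, views and euro_bytes are only illustrative:

```python
# Python 3 sketch (assumption: the thread itself uses Python 2).
RAW = b"\x80"

# One byte, three different characters depending on the code page.
views = {name: RAW.decode(name) for name in ("cp1252", "cp437", "latin-1")}

# One character, three different byte sequences depending on the encoding.
euro_bytes = {
    name: "\u20ac".encode(name)
    for name in ("cp1252", "iso-8859-15", "utf-8")
}

for name, text in views.items():
    print(name, ascii(text))
for name, raw in euro_bytes.items():
    print(name, raw)
```

Under cp437, a code page commonly used by the Windows console, the byte comes out as a C-cedilla rather than a euro sign.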

If you are curious about your terminal encoding, you can get it from sys:

>>> import sys
>>> sys.stdin.encoding   # my terminal
'UTF-8'
>>> sys.stdout.encoding  # same as above
'UTF-8'
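As a rough sketch of why the terminal encoding matters, the hypothetical helper below (preview is not a standard function) decodes raw bytes the way a terminal with a given encoding would interpret them, substituting a replacement character for invalid input. It assumes Python 3:

```python
import sys

# Python 3 sketch (assumption: the thread itself uses Python 2).
def preview(raw, encoding=None):
    """Decode raw bytes roughly as a terminal using `encoding` would show them.

    Falls back to sys.stdout.encoding, then to UTF-8, when no encoding is given.
    Invalid byte sequences become U+FFFD instead of raising an error.
    """
    enc = encoding or sys.stdout.encoding or "utf-8"
    return raw.decode(enc, errors="replace")

utf8_view = preview(b"\x80", "utf-8")     # a lone 0x80 is not valid UTF-8
cp1252_view = preview(b"\x80", "cp1252")  # 0x80 is the euro sign here

print(ascii(utf8_view), ascii(cp1252_view))  # '\ufffd' '\u20ac'
```

A lone 0x80 byte is not valid UTF-8, which is one plausible reason a UTF-8 terminal shows a placeholder glyph instead of a euro sign.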

I hope it makes up for my mistake.

Germano

It depends on your terminal encoding. In the Windows terminal, that byte is displayed as a row of C-cedillas (Ç).

If you want to see the "\x80", you can print repr(mystring).

Furthermore, 0x80 = 128, which is the value of the euro sign (not ASCII, since ASCII technically only goes up to 0x7f).

Specifically, that is how "Windows-1252" encodes the euro sign (apparently that is how almost all of the "Windows-125x" code pages encode it).

This answer has lots more info:

Hex representation of Euro Symbol €
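The remark about the "Windows-125x" family can be checked with a short loop. This is only a sketch, assuming Python 3 and its bundled cp1250 to cp1258 codecs; the names EURO and euro_by_codepage are illustrative:

```python
# Python 3 sketch (assumption: the thread itself uses Python 2).
EURO = "\u20ac"

# Encode the euro sign with each Windows-125x codec shipped with Python.
euro_by_codepage = {}
for number in range(1250, 1259):
    name = "cp%d" % number
    euro_by_codepage[name] = EURO.encode(name)

for name, raw in sorted(euro_by_codepage.items()):
    print(name, raw)
```

With Python's codecs, cp1251 is the exception: it places the euro sign at 0x88, while the other code pages in the range use 0x80.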

Furthermore, you can convert it to Unicode:

unicode_ch = "\x80".decode("Windows-1252")  # now decoded into a unicode object
print repr(unicode_ch)  # u'\u20ac', the Unicode code point of the euro sign
print unicode_ch  # shows the euro sign, as long as your terminal can handle it
Joran Beasley
  • Could you describe this a little further? When I print this in bash, it is just a block of squares. I am guessing this is Unicode or something like that? – Dirk Jul 02 '14 at 17:53

A little tinkering in IDLE produced this output.

>>> a = "\x80"
>>> a
'\x80'
>>> print a * 50
€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€€
>>> print a
€
>>> 

The first thing that stands out is the '\' character. This character is used for escaping characters in strings. You can learn about escape characters at the link below.

http://en.wikipedia.org/wiki/Escape_character

Changing the string slightly tells us that escaping is occurring.

>>> print '\x8'
ValueError: invalid \x escape

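What I think is happening is that the escape turns \x80 into a single byte with the value 0x80, and that byte is then looked up in the terminal's character table (an ASCII-compatible code page rather than ASCII itself, which stops at 0x7f).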
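A small sketch can confirm what the escape produces. It assumes Python 3, where the malformed escape is reported as a SyntaxError at compile time rather than the ValueError shown above:

```python
# Python 3 sketch (assumption: the thread itself uses Python 2).
one_byte = b"\x80"

length = len(one_byte)   # the four source characters \x80 yield a single byte
value = one_byte[0]      # its numeric value
is_ascii = value < 0x80  # ASCII only covers 0x00 through 0x7f

# A \x escape needs exactly two hex digits, so b'\x8' is rejected.
try:
    compile(r"b'\x8'", "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True

print(length, value, is_ascii, rejected)  # 1 128 False True
```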

Andrew