
I have the following Unicode text stored in a variable:

 myvariable = 'Gen\xe8ve'

What I want to do is to print myvariable and show this:

Genève

I tried this but failed:

print myvariable.decode('utf-8')

What's the right way to do it? In the end I'd like to write the string to a text file. I'm using Python 2.7.

Update: Also tried this:

In [23]: myvariable = u'Gen\xe8ve'

In [24]: print myvariable
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-24-1eb59a50889d> in <module>()
----> 1 print myvariable

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 3: ordinal not in range(128)

Update 2: I really want to print from myvariable. In the actual code, Gen\xe8ve is extracted with the xml.etree.ElementTree parser, like:

myvariable = actress.find('name').text
## The following doesn't work. 
# print u'myvariable'
pdubois
  • You need to know the encoding of your XML. If it's UTF8 and it contains an \xe8 byte, then it's simply a bad file. – RemcoGerlich May 23 '14 at 08:34
  • @RemcoGerlich: You are right. Is there a way I can use ElementTree to parse it as UTF-8? – pdubois May 23 '14 at 08:39

3 Answers


That's not Unicode text, that's a bytestring. This is Unicode text:

myvariable = u'Gen\xe8ve'
print myvariable
Ignacio Vazquez-Abrams

When you print a Unicode string directly

myvariable = u'Gen\xe8ve'
print myvariable

Python encodes it with the encoding of standard output (sys.stdout.encoding). On your system that appears to be ASCII, so the encode fails: ASCII has no character for u'\xe8'. Try specifying the encoding explicitly:

myvariable = u'Gen\xe8ve'
print myvariable.encode('utf-8')
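
Since the end goal in the question is a text file, here is a minimal sketch of that step. The XML snippet, its ISO-8859-1 declaration, and the output path are assumptions made up for illustration; the real document and its encoding may differ. It uses io.open, which behaves the same in Python 2.7 and Python 3 and encodes on write, so no per-string .encode() call is needed.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real XML. It declares Latin-1 so the parser
# knows how to decode the \xe8 byte.
xml_bytes = (b'<?xml version="1.0" encoding="ISO-8859-1"?>'
             b'<actress><name>Gen\xe8ve</name></actress>')

actress = ET.fromstring(xml_bytes)
myvariable = actress.find('name').text  # Unicode text, not a bytestring

# Hypothetical output path; any writable location works.
path = os.path.join(tempfile.gettempdir(), 'geneve_example.txt')

# io.open encodes the Unicode text as UTF-8 while writing.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(myvariable)

# Read the raw bytes back to see what actually landed in the file.
with io.open(path, 'rb') as f:
    raw = f.read()
```

Here raw holds the UTF-8 bytes of the name, with the accented letter stored as the two bytes \xc3\xa8.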
gog
  • Thanks but see my Update2 and the [real issue under XML link](http://stackoverflow.com/questions/23825149/parsing-xml-file-with-utf-8-encoding-and-bytestring-using-elemtree) – pdubois May 23 '14 at 09:15
  • @pdubois: the answer stands: `encode` everything you're printing. – gog May 23 '14 at 09:19
  • thanks a million. Is there a way to define that encoding globally without having to give `.encode('utf-8')` for every variable? – pdubois May 23 '14 at 09:21
  • @pdubois: some ways are outlined here: http://stackoverflow.com/questions/2276200/changing-default-encoding-of-python – gog May 23 '14 at 09:28
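
One way to avoid calling .encode('utf-8') on every variable is to wrap the output stream once in a writer that does the encoding. The sketch below is illustrative only: it uses an in-memory io.BytesIO as a stand-in for the real output stream so that it runs unchanged on Python 2.7 and Python 3.

```python
import codecs
import io

# Stand-in for a byte-oriented output stream, such as a file opened in
# binary mode.
byte_stream = io.BytesIO()

# The wrapper encodes every Unicode string written through it as UTF-8.
writer = codecs.getwriter('utf-8')(byte_stream)
writer.write(u'Gen\xe8ve')

encoded = byte_stream.getvalue()
```

In Python 2.7 the same wrapper is commonly applied to standard output with sys.stdout = codecs.getwriter('utf-8')(sys.stdout), assuming the terminal or the receiving program expects UTF-8. Setting the PYTHONIOENCODING environment variable to utf-8 before starting Python is another option.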

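A lone '\xe8' byte isn't valid UTF-8; the string is in some other encoding (in Latin-1, for example, \xe8 is è). In UTF-8, è is the two bytes \xc3\xa8.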

Try:

>>> x = 'Gen\xc3\xa8ve'
>>> print x.decode('utf8')

Or find out what the encoding actually is, and decode with that.
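
As a rough illustration of the second option, the sketch below assumes the bytes from the question are Latin-1 (ISO-8859-1). That is only a guess based on \xe8 being è in that encoding; the real source may use something else, such as cp1252. The code runs on both Python 2.7 and Python 3.

```python
raw = b'Gen\xe8ve'  # the bytestring from the question

# Decoding as UTF-8 fails, because \xe8 starts a multi-byte sequence that
# the following byte does not continue.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Assumption: the data is Latin-1. Decoding gives Unicode text.
text = raw.decode('latin-1')

# Re-encoding as UTF-8 gives the byte sequence used in the example above.
utf8_bytes = text.encode('utf-8')
```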

RemcoGerlich