
I am reading a text file containing a single word B\xc3\xa9zier.

I want to convert this to its decoded UTF-8 form, i.e. Bézier, and print it to the console.

My code is as follows:

foo=open("test.txt")  
for line in foo.readlines():  
    for word in line.split():  
        print(word.decode('utf-8'))
foo.close()

The output is:

B\xc3\xa9zier

However, if I do something like this:

>>> print('B\xc3\xa9zier'.decode('utf-8'))

I get the correct output:

Bézier

I am unable to figure out why this is happening.

kirelagin
  • possible duplicate of [Unicode (utf8) reading and writing to files in python](http://stackoverflow.com/questions/491921/unicode-utf8-reading-and-writing-to-files-in-python) – jamylak Jun 04 '13 at 11:22

1 Answer


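It seems as though the file contains the escape sequence as raw text (a literal backslash followed by x, c, 3 and so on) rather than the UTF-8 bytes themselves. In the interactive example, Python interprets the escapes while parsing the string literal, which is why decoding works there. For the file contents, use the Python 2 string_escape codec first to turn the escapes into bytes, then decode those bytes as UTF-8: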

with open('test.txt') as f:
    for line in f:
        for word in line.split():
            print(word.decode('string_escape').decode('utf-8'))


Bézier
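
The string_escape codec and str.decode exist only in Python 2. As a rough sketch of the same two-step idea in Python 3, the unicode_escape codec can stand in for string_escape. The helper name decode_escaped_utf8 is hypothetical, and the sketch assumes the text contains only ASCII characters and \xNN escapes; other input may raise an error or decode incorrectly.

```python
def decode_escaped_utf8(word):
    # Hypothetical helper: turns text holding literal \xNN escapes into the
    # UTF-8 text those escapes spell out. Assumes ASCII plus \xNN escapes only.
    # Step 1: interpret the escapes; each \xNN becomes one code point 0-255.
    unescaped = word.encode('latin-1').decode('unicode_escape')
    # Step 2: map those code points back to raw bytes, then decode as UTF-8.
    return unescaped.encode('latin-1').decode('utf-8')


# What the file holds: a backslash, 'x', 'c', '3', ... as separate characters.
raw = r'B\xc3\xa9zier'
print(len(raw))                  # 13
print(decode_escaped_utf8(raw))
```

The length check shows why a plain UTF-8 decode changes nothing: the file holds 13 ordinary ASCII characters, not the 7 bytes that the string literal in the interactive session produces.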
jamylak