Python encoding issue in reading from text file

Question

I am reading a text file containing the single word `B\xc3\xa9zier`.

I want to convert it to its decoded UTF-8 form, i.e. Bézier, and print it to the console.

My code is as follows:

foo = open("test.txt")
for line in foo.readlines():
    for word in line.split():
        print(word.decode('utf-8'))
foo.close()

The output is:

B\xc3\xa9zier

However, if I do this in the interpreter:

>>> print('B\xc3\xa9zier'.decode('utf-8'))

I get the correct output:

Bézier

I can't figure out why this happens.

possible duplicate of [Unicode (utf8) reading and writing to files in python](http://stackoverflow.com/questions/491921/unicode-utf8-reading-and-writing-to-files-in-python) — jamylak, Jun 04 '13 at 11:22
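The two cases differ because the interpreter parses the literal `'B\xc3\xa9zier'`, turning each `\xNN` escape into a single byte, whereas the file holds the escape text itself: a backslash, an `x`, and two hex digits per escape. A minimal sketch of the difference, assuming Python 3 (where bytes and text are separate types, unlike the Python 2 code in the question):

```python
# In source code, Python turns each \xNN escape into one byte.
parsed = b'B\xc3\xa9zier'

# A raw string keeps the backslashes, which is what the file actually contains.
from_file = r'B\xc3\xa9zier'

print(len(parsed))     # 7: B, 0xC3, 0xA9, z, i, e, r
print(len(from_file))  # 13: each escape is 4 literal characters

# Only the parsed bytes are real UTF-8; 0xC3 0xA9 is the encoding of 'é'.
decoded = parsed.decode('utf-8')
```

Decoding `from_file` "as UTF-8" cannot produce `é`, because it contains no `0xC3 0xA9` bytes at all, only ASCII characters.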

score 6 · Accepted Answer · edited Jun 04 '13 at 11:37


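It seems your file contains a raw, escaped UTF-8 string (the literal text `\xc3\xa9`, not the bytes themselves). Use the `string_escape` codec to turn the escapes back into bytes, then decode those as UTF-8: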

with open('test.txt') as f:
    for line in f:
        for word in line.split():
            print(word.decode('string_escape').decode('utf-8'))


Bézier
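The `string_escape` codec exists only in Python 2. A Python 3 sketch of the same two steps swaps in a common workaround that is not part of the original answer: the `unicode_escape` codec followed by a `latin-1` round trip. It assumes the file contains only ASCII characters; the helper names `unescape_utf8` and `read_words` are hypothetical.

```python
def unescape_utf8(word):
    """Turn text like r'B\\xc3\\xa9zier' into 'B\\u00e9zier' (Bézier).

    Assumes `word` is pure ASCII, with the UTF-8 bytes spelled out
    as \\xNN escapes.
    """
    # unicode_escape turns each \xNN escape into the code point U+00NN.
    code_points = word.encode('ascii').decode('unicode_escape')
    # latin-1 maps U+0000..U+00FF to bytes 0x00..0xFF one-to-one, which
    # recovers the raw UTF-8 bytes; those are then decoded as UTF-8.
    return code_points.encode('latin-1').decode('utf-8')


def read_words(path):
    # Read as ASCII text so the backslash escapes arrive unchanged.
    with open(path, encoding='ascii') as f:
        for line in f:
            for word in line.split():
                yield unescape_utf8(word)
```

For the question's file, `for word in read_words('test.txt'): print(word)` should print `Bézier` on a UTF-8 console. The `latin-1` step is what plays the role of Python 2's byte string: it converts the escaped code points back to bytes without altering any of their values.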


answered Jun 04 '13 at 11:11

jamylak


Thanks for the response. Could you please help me figure out what is wrong with my code? – Jun 04 '13 at 11:13

@Chauhan You should use the `string_escape` codec instead of `utf-8`, because you first need to unescape the string, not decode UTF-8. – kirelagin Jun 04 '13 at 11:13

@Chauhan Then you need `word.decode('string_escape').decode('utf-8')` – Janne Karila Jun 04 '13 at 11:20
@JanneKarila Thanks a lot, it works. Could you please explain it? – Jun 04 '13 at 11:24
@Chauhan `decode('string_escape')` decodes the backslash escapes and gives you a utf-8 encoded string. jamylak is able to print it directly because his or her terminal uses utf-8 encoding. – Janne Karila Jun 04 '13 at 11:29
You shouldn't `decode('utf8')` it right away. Do this just before printing. – kirelagin Jun 04 '13 at 11:31
@kirelagin I would recommend decoding and working with unicode as long as possible. – Janne Karila Jun 04 '13 at 11:32
@JanneKarila I'm sorry, you are right. I always get lost in those Python 2 Unicode-related things =(. – kirelagin Jun 04 '13 at 11:34
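Janne Karila's explanation can be traced step by step. In this Python 3 illustration (not code from the thread; `unicode_escape` plus `latin-1` stands in for Python 2's `string_escape`), the first step yields UTF-8 encoded bytes and the second yields text. Following the advice to work with decoded text as long as possible, the text is encoded again only at the output boundary.

```python
escaped = r'B\xc3\xa9zier'  # the text as read from the file

# Step 1: undo the backslash escapes. The result is UTF-8 encoded bytes,
# the counterpart of what decode('string_escape') returned in Python 2.
raw = escaped.encode('ascii').decode('unicode_escape').encode('latin-1')
print(type(raw).__name__, raw)  # bytes b'B\xc3\xa9zier'

# Working on bytes is error-prone: bytes.upper() only changes ASCII letters.
print(raw.upper())  # b'B\xc3\xa9ZIER' (the é is left untouched)

# Step 2: decode to text early and keep working with text.
text = raw.decode('utf-8')
print(type(text).__name__, len(text))  # str 6

# Encode only when the text leaves the program, e.g. for a byte stream.
out = text.upper().encode('utf-8')
print(out)  # b'B\xc3\x89ZIER'
```

Whether `raw` prints as `Bézier` depends on the terminal interpreting its bytes as UTF-8, which is why the answer's output looked correct on jamylak's machine; decoding to `str` removes that dependency.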
