Python string to unicode fails

Asked Feb 17 '18 at 16:08

I have a file with the following line:

raúl fernando sendic rodríguez is the leader of uruguay .

I am reading the file doing:

l = open(file_name).read().split()

And I get:

['ra\xc3\xbal', 'fernando', 'sendic', 'rodr\xc3\xadguez', 'is', 'the', 'leader', 'of', 'uruguay', '.']

I want to convert each item to a unicode string (with the u prefix at the beginning). So far I have tried:

print list(map(lambda a: unicode(a, "utf-8"), l))
print list(map(lambda a: a.decode('utf-8'), l))

And got (same for both):

[u'ra\xfal', u'fernando', u'sendic', u'rodr\xedguez', u'is', u'the', u'leader', u'of', u'uruguay', u'.']

How can I properly decode that string as unicode?

Note: This is Python 2.7.

asked Feb 17 '18 at 16:08

Amit

There's nothing wrong with those strings. They just aren't printed the way you expected. Try `print(u'ra\xfal')` and you'll get `raúl` as output. – Aran-Fey Feb 17 '18 at 16:23
You are right, and that is actually more interesting... why does printing a list of strings look different from printing just the strings? – Amit Feb 17 '18 at 16:28
`print` calls `str` on the value. A list's `str` calls `repr` on each of its elements. So `print([u'ra\xfal'])` shows the string the same way `print(repr(u'ra\xfal'))` would. – Aran-Fey Feb 17 '18 at 16:30
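
To make the point in the comments concrete, here is a minimal sketch. It assumes the file is UTF-8 encoded, as the bytes in the question suggest; the variable names (`raw`, `word`, `words`, `list_text`, `joined`) are made up for illustration. It checks that the decode already produced the right text, and that the escaped form only comes from how a list is turned into a string.

```python
# The UTF-8 bytes for the first word, as read from the file in the question.
raw = b'ra\xc3\xbal'

# Decoding gives a 4-character text string; \xfa is the code point U+00FA,
# the accented "u". The two bytes \xc3\xba collapse into that one character.
word = raw.decode('utf-8')
assert word == u'ra\xfal'
assert len(raw) == 5 and len(word) == 4

# str() of a list is built from repr() of each element. On Python 2.7,
# repr() of a unicode string escapes non-ASCII characters, which is why the
# list shows u'ra\xfal' even though the element itself is correct.
words = [word, u'fernando']
list_text = str(words)
assert list_text == '[' + ', '.join(repr(w) for w in words) + ']'

# To display the text itself, build a single string from the elements
# and print that instead of the list.
joined = u' '.join(words)
```

On Python 2.7, `print joined` (or printing each element in a loop) should then show the accented characters, provided the terminal's encoding can represent them.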
