
I have seen this question, but I still have doubts: how can I convert a variable to Unicode at runtime? Is the `unicode()` function the right tool for this? Are there other ways to convert a string at runtime?

```python
# Python 2
print(u'Cami\u00f3n')  # prints the special character correctly: Camión

name = unicode('Cami\u00f3n')
print(name)  # prints wrongly: Cami\u00f3n

name.encode('latin1')  # return value is discarded, so name is unchanged
print(name.decode('latin1'))  # prints wrongly: Cami\u00f3n

encoded_id = u'abcd\xc3\x9f'
print(encoded_id.encode('latin1').decode('utf8'))  # prints correctly
```

I have read a lot of Python Unicode questions on Stack Overflow, but I still can't understand this behaviour.
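The last case works because the string contains the two UTF-8 bytes of `ß`, which were decoded by mistake as two separate Latin-1 characters. Latin-1 maps the code points U+0000 to U+00FF one-to-one onto bytes, so encoding with it gives back the original bytes. Here is a minimal sketch of that round trip. It assumes Python 3, where `str` is already Unicode and no `u` prefix is needed:

```python
# Python 3 sketch (assumption: Python 3, where str is Unicode).
# '\xc3' and '\x9f' are the two UTF-8 bytes of 'ß', wrongly decoded as
# Latin-1, so they show up as two separate code points.
garbled = 'abcd\xc3\x9f'

# Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
# so encoding with it recovers the original bytes unchanged.
raw_bytes = garbled.encode('latin1')  # b'abcd\xc3\x9f'

# Decoding those bytes as UTF-8 gives the intended text.
fixed = raw_bytes.decode('utf8')

print(len(garbled), len(fixed))  # 6 5
print(fixed)                     # abcdß
```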

Edited by Mazdak; asked by Ulyarez
  • What are you trying to do? What data are you trying to convert? Where is it from? What does "on running time" mean? – Daniel Roseman Jun 16 '15 at 11:39
  • `\uhhhh` escape sequences only work in Python unicode literals. If you have data with such escape sequences, you may well have **JSON** data instead, which uses the same syntax. If so, use a JSON parser for that data. – Martijn Pieters Jun 16 '15 at 11:41
  • You can ask Python to interpret such sequences with a special codec, but that is *usually the wrong interpretation of your data*. Please share a sample of your actual data so we can help you with that. – Martijn Pieters Jun 16 '15 at 11:42
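If the data does turn out to be JSON, as the comments suggest, a JSON parser decodes the `\uhhhh` escapes for you. Below is a minimal Python 3 sketch that uses the standard `json` module. The `payload` value is a hypothetical example, not data from the question:

```python
import json

# Hypothetical input: text as it might arrive from an external source.
# It contains a literal backslash-u escape, not a real 'ó' character yet.
payload = '{"name": "Cami\\u00f3n"}'

# json.loads interprets \uXXXX escapes inside JSON string values.
data = json.loads(payload)
print(data['name'])  # Camión
```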

1 Answer

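This happens because, when you don't pass an encoding to the `unicode()` function, it behaves as the Python 2 documentation describes:

> unicode() will mimic the behaviour of str() except that it returns Unicode strings instead of 8-bit strings. More precisely, if object is a Unicode string or subclass it will return that Unicode string without any additional decoding applied.

So you get a Unicode copy of your byte string with the escape sequence left as it is. `\u` is not an escape sequence in a Python 2 byte-string literal, so `'Cami\u00f3n'` holds a literal backslash followed by `u00f3n`. `unicode()` decodes those ASCII bytes with the default encoding and does not interpret the escape. The repr shows the backslash doubled: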

```python
>>> name = unicode('Cami\u00f3n')
>>> print(name)
Cami\u00f3n
>>> name
u'Cami\\u00f3n'
       ^
```
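The following short Python 3 sketch shows the difference between a literal backslash escape and a real character. It assumes Python 3 and uses a raw string `r'...'` to reproduce the backslash that the Python 2 byte-string literal kept:

```python
# Python 3 sketch (assumption: Python 3). The raw string keeps the
# backslash, just as the Python 2 byte-string literal 'Cami\u00f3n' did.
escaped = r'Cami\u00f3n'
real = 'Cami\u00f3n'  # in a Python 3 str literal, \u00f3 IS an escape

print(len(escaped), len(real))  # 11 6
print(repr(escaped))            # 'Cami\\u00f3n'
print(escaped == real)          # False
```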

To fix this, pass `'unicode-escape'` as the encoding. The `\uXXXX` escape sequences are then decoded into the actual characters:

```python
>>> name = unicode('Cami\u00f3n', 'unicode-escape')
>>> name
u'Cami\xf3n'
>>> print(name)
Camión
```
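Python 3 has no `unicode()` builtin, and `str` is already Unicode, so the same decoding is usually done by going through bytes. Here is a minimal sketch. It assumes the input is pure ASCII text containing literal `\uXXXX` escapes, for example text read from a file or a database:

```python
# Python 3 sketch (assumption: the input is ASCII text with literal escapes).
escaped = 'Cami\\u00f3n'  # literal backslash escape, not a real 'ó'

# The unicode_escape codec decodes *bytes*, so encode the ASCII text first,
# then decode the bytes with the codec to interpret the \uXXXX escapes.
name = escaped.encode('ascii').decode('unicode_escape')
print(repr(name))  # 'Camión'
print(name)        # Camión
```

If the text already contains non-ASCII characters, `encode('ascii')` raises `UnicodeEncodeError`. That at least makes the problem visible instead of mis-decoding those characters without any error.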
Answered by Mazdak
  • Works like a charm. I tried to use it to print data from a database with special characters for code tests. Thanks! – Ulyarez Jun 16 '15 at 11:56
  • Note that `unicode-escape` interprets *more* than just the `\uhhhh` escapes. If there are other `\\` backslash-escapes in the text, those too will be interpreted, and may not be what you expected. – Martijn Pieters Jun 16 '15 at 12:16
  • @Ulyarez Welcome, also note about Martijn's comment! – Mazdak Jun 16 '15 at 12:24
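Here is a Python 3 sketch of the caveat in Martijn's comment. The input string is hypothetical. `decode_u_escapes` is also a hypothetical helper, not a standard-library function. It uses `re.sub` to decode only `\uXXXX` sequences and leaves every other backslash alone:

```python
import re

# Hypothetical input: a Windows-style folder name followed by a \u escape.
# The '\n' in "folder\new" is meant literally here, not as a newline.
text = r'folder\new Cami\u00f3n'

# unicode_escape decodes EVERY backslash escape, so '\n' becomes a newline.
broad = text.encode('ascii').decode('unicode_escape')


def decode_u_escapes(s):
    """Hypothetical helper: decode only \\uXXXX escapes, leave others as-is."""
    return re.sub(r'\\u([0-9a-fA-F]{4})',
                  lambda m: chr(int(m.group(1), 16)),
                  s)


targeted = decode_u_escapes(text)

print(repr(broad))     # 'folder\new Camión'
print(repr(targeted))  # 'folder\\new Camión'
```

This helper does not combine surrogate pairs such as `\ud83d\ude00` into one character. It would also decode a `\u` that follows an escaped backslash. If your data needs either case handled, a real parser is the safer choice (for example `json`, if the data is JSON).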