python decoding string issues

Question

I get the following string from a database:

'23:45 \xe2\x80\x93 23:59'

and the output should look like

'23:45 - 23:59'

How can I decode this? I tried UTF-8 decoding, but no luck:

>>> x.decode("utf-8")
u'23:45 \u2013 23:59'

Thank you

score 7 · Accepted Answer · answered Nov 03 '11 at 16:27

This is completely correct. The interactive Python interpreter displays the repr() of the string. If you want to see it as a proper string, print it:

>>> print '23:45 \xe2\x80\x93 23:59'
23:45 – 23:59
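The same distinction can be sketched in Python 3, where byte strings and text are separate types. This is only an illustration, not part of the original answer: the variable names are hypothetical, `raw` stands in for the value read from the database, and `ascii()` is used to get the escaped, repr-style view that the Python 2 interpreter shows.

```python
# Python 3 sketch; `raw` stands in for the value read from the database.
raw = b'23:45 \xe2\x80\x93 23:59'

# bytes -> str: the three UTF-8 bytes become a single en dash (U+2013)
text = raw.decode('utf-8')

# Escaped, repr-style view of the same string (what the interpreter echoes in Python 2)
escaped = ascii(text)

print(escaped)               # '23:45 \u2013 23:59'
print(text)                  # 23:45 – 23:59
print(len(raw), len(text))   # 15 13
```

The decoded value is the same in both views; only the way it is displayed differs.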

ThiefMaster

Hi ThiefMaster, but how do I get '-' instead of \u2013? Is the only option to use the re package? – daydreamer Nov 03 '11 at 16:30
The same way: `print u'23:45 \u2013 23:59'` also outputs `23:45 – 23:59`. – glglgl Nov 03 '11 at 16:32
I want to store this in a variable, and when I do x = x.decode("utf-8"), the output shows 'quarter_hour': '23:45 \xe2\x80\x93 23:59' rather than 'quarter_hour': '23:45 - 23:59' – daydreamer Nov 03 '11 at 16:40

score 1 · Answer 2 · answered Feb 04 '14 at 12:42

from unidecode import unidecode

a = u"NOV\u2013DEC 2011"  # \u2013 is the en dash
b = unidecode(a)

# output --> NOV-DEC 2011 (with an ASCII hyphen)

You need to install unidecode first, and import it. I've tried it and it runs well. Hope it helps!

rassel pratomo

score 1 · Answer 3 · answered Nov 03 '11 at 17:43

The UTF-8 encoding of an "en dash" (U+2013, http://www.fileformat.info/info/unicode/char/2013/index.htm) is the three bytes 0xE2 0x80 0x93 (e28093); as a Python unicode literal it is u"\u2013". It sounds like you want to replace the en dash with an ASCII hyphen-minus (0x2d) before storing it in the variable. That's OK, but the variable will no longer contain the same character that is stored in the database, any more than if you replaced a Ü ( http://www.fileformat.info/info/unicode/char/dc/index.htm ) with an ASCII U, or replaced a zero (0x30) with a capital O (0x4f).
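If that lossy replacement is what you want, it could look roughly like the sketch below. It is written for Python 3 and the names `raw`, `EN_DASH` and `plain` are hypothetical; in Python 2 the equivalent would be `x.decode('utf-8').replace(u'\u2013', u'-')`.

```python
# Python 3 sketch; `raw` stands in for the value read from the database.
raw = b'23:45 \xe2\x80\x93 23:59'

EN_DASH = '\u2013'

# Decode the bytes first, then swap each en dash for an ASCII hyphen-minus (0x2d)
text = raw.decode('utf-8')
plain = text.replace(EN_DASH, '-')

print(plain)                   # 23:45 - 23:59
print(plain.encode('ascii'))   # b'23:45 - 23:59'
```

Because only the en dash is replaced, any other non-ASCII character in the data would still make the final `encode('ascii')` call raise `UnicodeEncodeError`.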

edited Nov 03 '11 at 17:50

answered Nov 03 '11 at 17:43

Dave

See also http://stackoverflow.com/questions/816285/where-is-pythons-best-ascii-for-this-unicode-database, the last answer of which says: "Unidecode looks like a complete solution. It converts fancy quotes to ascii quotes, accented latin characters to unaccented and even attempts transliteration to deal with characters that don't have ASCII equivalents." – Dave Nov 03 '11 at 18:27
