I have text like this:

&#8216;The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,&#8217;wroteforumuser Ensorceled.

I understand that #8216 is an ASCII character. How can I convert it to normal characters without using .replace, which is cumbersome?

user2784753
  • &#8216; is a numeric entity. – Sep 28 '13 at 15:36
  • Is [this](http://stackoverflow.com/questions/730299/replace-html-entities-with-the-corresponding-utf-8-characters-in-python-2-6) relevant? – rlms Sep 28 '13 at 15:37
  • Nope, that was asked in '08. Please verify. – user2784753 Sep 28 '13 at 15:38
  • @user2784753: And what difference does *that* make? The answer still applies. – Martijn Pieters Sep 28 '13 at 15:44
  • That particular numeric entity is not an ASCII character. ASCII only covers codes up to 127; above that is ISO 8859-1 (aka Latin 1) up to 255 and then ISO 10646 (pretty much Unicode). – Yann Vernier Sep 28 '13 at 16:19
  • @YannVernier: ASCII and Latin-1 are part of Unicode *too*. But that character is not ASCII, indeed. – Martijn Pieters Sep 28 '13 at 16:23
  • Yes, but only the early parts. I apologize if I made that unclear. What I wanted to point out is that ASCII isn't sufficient for this character (the opening single quote, with &#8217; being the closing single quote); this may make the question of converting it to "normal" characters rather complicated. Unicode is sufficient, so Python produces a unicode object (prefixed with u) in your answer. – Yann Vernier Sep 28 '13 at 16:32

1 Answer

You have an HTML escape there. Use the HTMLParser.HTMLParser() class to unescape these:

from HTMLParser import HTMLParser

parser = HTMLParser()
unescaped = parser.unescape(escaped)

Demo:

>>> from HTMLParser import HTMLParser
>>> parser = HTMLParser()
>>> escaped = '&#8216;The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,&#8217;wroteforumuser Ensorceled.'
>>> parser.unescape(escaped)
u'\u2018The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,\u2019wroteforumuser Ensorceled.'
>>> print parser.unescape(escaped)
‘The zoom animations everywhere on the new iOS 7 are literally making me nauseous and giving me a headache,’wroteforumuser Ensorceled.

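In Python 3, the HTMLParser module has been renamed to html.parser; adjust the import accordingly:

from html.parser import HTMLParser

Note that HTMLParser.unescape() was deprecated in Python 3.4 and removed in Python 3.9; on those versions, use the html.unescape() function instead.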
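
On current Python 3 releases, the same conversion can be sketched with the standard library's html.unescape() function (available since Python 3.4), which needs no parser instance. This is a minimal example using the string from the question; the variable names are only illustrative:

```python
import html

# The escaped text from the question, with its numeric character references.
escaped = (
    "&#8216;The zoom animations everywhere on the new iOS 7 are literally "
    "making me nauseous and giving me a headache,&#8217;wroteforumuser Ensorceled."
)

# html.unescape() replaces numeric (&#8216;) and named (&amp;) character
# references with the corresponding Unicode characters.
unescaped = html.unescape(escaped)
print(unescaped)
```

In Python 3 the result is an ordinary str, so there is no u prefix in its repr.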
Martijn Pieters