I have a string like below:

THE SMASH-HIT, CRITICALLY ACCLAIMED SERIES RETURNS! Now that you&#39;ve read the first two bestselling collections of SAGA , you&#39;re all caught up and ready to jump on the ongoing train with Chapter Thirteen, beginning an all-new monthly sci-fi/fantasy adventure, as Hazel and her parents head to the planet Quietus in search of cult romance novelist D. Oswald Heist.

As can be seen, the apostrophes ( ' ) are being represented as an ASCII code:

&#39

How would you suggest I decode this string so the apostrophes display correctly?

Other ASCII codes are appearing as well:

&quot;
&amp;
Tim Bueno
  • Why do you want to encode the string? What are you trying to do with it? – Floris Aug 13 '13 at 23:01
  • Those are HTML character references. ASCII has nothing to do with your problem. – user2357112 Aug 13 '13 at 23:10
  • I am displaying the string on a website made with Flask. So in a browser. http://www.longboxed.com/issue/JUN130454D – Tim Bueno Aug 13 '13 at 23:15
  • The first possible dup is 3.x-specific; the second is 2.x-specific. The accepted answers are the same, except for the fact that 2.x's `HTMLParser` was renamed `html.parser` in 3.x. – abarnert Aug 13 '13 at 23:18

1 Answer

Those are called HTML entities. The easiest way to unescape them is to use HTMLParser from the standard library (Python 2):

>>> s = "THE SMASH-HIT, CRITICALLY ACCLAIMED SERIES RETURNS! Now that you&#39;ve read the first two bestselling collections of SAGA , you&#39;re all caught up and ready to jump on the ongoing train with Chapter Thirteen, beginning an all-new monthly sci-fi/fantasy adventure, as Hazel and her parents head to the planet Quietus in search of cult romance novelist D. Oswald Heist."
>>> import HTMLParser
>>> HTMLParser.HTMLParser().unescape(s)
u"THE SMASH-HIT, CRITICALLY ACCLAIMED SERIES RETURNS! Now that you've read the first two bestselling collections of SAGA , you're all caught up and ready to jump on the ongoing train with Chapter Thirteen, beginning an all-new monthly sci-fi/fantasy adventure, as Hazel and her parents head to the planet Quietus in search of cult romance novelist D. Oswald Heist."
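
On Python 3.4 and later, the standard library also offers `html.unescape`, which should give the same result without instantiating a parser. A minimal sketch, assuming Python 3; the sample string is a shortened, illustrative stand-in for the one in the question, with `&amp;` and `&quot;` added to cover the other references mentioned:

```python
import html

# Illustrative sample (not the full string from the question); it mixes a
# numeric reference (&#39;) with named references (&amp;, &quot;).
escaped = "Now that you&#39;ve read SAGA, you&#39;re all caught up &amp; ready for &quot;Chapter Thirteen&quot;."

# html.unescape replaces numeric and named character references
# with the characters they stand for.
unescaped = html.unescape(escaped)
print(unescaped)
```

If the text is going into a web page, it may be worth checking where it was escaped in the first place, since unescaping and then letting the template engine escape it again is usually what you want for display.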

Also see:

alecxe
  • Instead of linking to a duplicate question and summarizing its answer, just close this question as a dup. – abarnert Aug 13 '13 at 23:18
  • This is exactly what I needed. I see now that I didn't really understand my problem. Yes, it was a dupe, but I think he deserves the credit anyway. – Tim Bueno Aug 13 '13 at 23:23
  • Agreed, just wanted to clarify that those things are called HTML entities and give a live example. – alecxe Aug 13 '13 at 23:23