2

Possible Duplicate:
Decode HTML entities in Python string?

I have parsed some HTML text, but some punctuation marks, such as the apostrophe, are replaced by ’. How can I convert them back to `'`?

P.S. I am using Python with feedparser.

Thanks

bdhar
  • You should take a look there http://stackoverflow.com/questions/2360598/how-do-i-unescape-html-entities-in-a-string-in-python-3-1 – Gilles Quénot Nov 08 '11 at 21:51
  • `feedparser` parses `’` perfectly fine for me. What are you parsing, and how are you using `feedparser` to parse it? – ekhumoro Nov 09 '11 at 00:17

2 Answers

1

The Python wiki lists several ways of doing this. Here is one:

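import htmllib  # Python 2 only; this module was removed in Python 3

def unescape(s):
    # Run the text through the parser and collect the decoded character data
    p = htmllib.HTMLParser(None)
    p.save_bgn()         # start capturing character data
    p.feed(s)
    return p.save_end()  # stop capturing and return what was collected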

See http://wiki.python.org/moin/EscapingHtml
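
On Python 3, where `htmllib` is no longer available, the standard library's `html.unescape` covers the same ground. A minimal sketch, using a made-up sample string rather than anything from the question's feed:

```python
import html

# Sample input is an assumption for illustration only
raw_text = "It&#8217;s a test &amp; it works"

# html.unescape converts named and numeric character references to text
decoded = html.unescape(raw_text)
print(decoded)
```

Note that the numeric reference decodes to the typographic apostrophe (U+2019), not the ASCII one.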

Daniel Nouri
0

This worked for me:

import HTMLParser  # Python 2 module name

hparser = HTMLParser.HTMLParser()
# raw_text is the string that still contains HTML entities
new_text = hparser.unescape(raw_text)
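
If the goal is a plain ASCII apostrophe rather than the typographic ’ that the entity decodes to, one option on Python 3 is to unescape first and then map the curly quotes explicitly. This is only a sketch: `QUOTE_MAP` and `to_ascii_quotes` are hypothetical names, and which characters to replace is an assumption about what the feed contains.

```python
import html

# Hypothetical mapping from typographic quotes to their ASCII counterparts
QUOTE_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (typographic apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})

def to_ascii_quotes(raw_text):
    """Decode HTML entities, then replace curly quotes with ASCII ones."""
    return html.unescape(raw_text).translate(QUOTE_MAP)

print(to_ascii_quotes("It&#8217;s &ldquo;fine&rdquo;"))
```

Keeping the mapping in a separate table makes it easy to add or drop characters once it is clear which ones actually appear in the parsed text.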
bdhar