Possible Duplicate:
Decode HTML entities in Python string?
I have parsed some HTML text, but some punctuation marks, such as the apostrophe, have been replaced by HTML entities (shown here decoded as ’). How can I convert them back to the actual characters?
P.S: I am using Python/Feedparser
Thanks
The PSF Wiki has some ways of doing it. Here is one way:
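    import htmllib  # Python 2 only; removed in Python 3

    def unescape(s):
        # Feed the string through the parser and collect the decoded text.
        p = htmllib.HTMLParser(None)
        p.save_bgn()
        p.feed(s)
        return p.save_end()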
This helped me:

    import HTMLParser  # Python 2 module name

    hparser = HTMLParser.HTMLParser()
    new_text = hparser.unescape(raw_text)
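
For anyone on Python 3, where `htmllib` has been removed and `HTMLParser` has been renamed to `html.parser`, here is a minimal sketch using the standard library's `html.unescape` (available since Python 3.4). The helper name `decode_entities` and the sample string are only illustrative assumptions, not something from the answers above:

```python
import html


def decode_entities(text):
    """Return text with HTML character references decoded.

    Handles both numeric references (&#8217;) and named ones (&amp;).
    """
    return html.unescape(text)


# Hypothetical sample input, similar to what a feed entry might contain.
sample = "It&#8217;s a test &amp; more"
print(decode_entities(sample))
```

Text that contains no entities passes through unchanged, so it should be safe to apply this to every field you pull out of the feed.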