Replacing HTML entities with ASCII characters using Python

Question

Possible Duplicate:
Decode HTML entities in Python string?

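I have parsed some HTML text, but some punctuation marks, such as the apostrophe, come through as HTML character references (for example, the reference for ’). How can I convert them back to the plain character '?

P.S. I am using Python with feedparser.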

Thanks

You should take a look there http://stackoverflow.com/questions/2360598/how-do-i-unescape-html-entities-in-a-string-in-python-3-1 — Gilles Quénot, Nov 08 '11 at 21:51
`feedparser` parses `’` perfectly fine for me. What are you parsing, and how are you using `feedparser` to parse it? — ekhumoro, Nov 09 '11 at 00:17

Daniel Nouri · Answer 1 · 2011-11-08T22:04:04.050

1

The PSF Wiki lists several ways of doing this. Here is one:

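import htmllib  # Python 2 only; removed in Python 3

def unescape(s):
    # Feed the string through the parser and capture the decoded text.
    p = htmllib.HTMLParser(None)  # None: no formatter is needed
    p.save_bgn()                  # start buffering character data
    p.feed(s)                     # parse s, decoding entities such as &amp;
    return p.save_end()           # stop buffering and return the collected text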
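
For readers on Python 3, where `htmllib` is no longer available, the same `unescape` helper can be sketched with the standard library's `html.unescape` (added in Python 3.4). This is only an illustration of the idea, not part of the wiki recipe; the sample string is made up.

```python
import html


def unescape(s):
    """Return s with HTML character references replaced by their characters."""
    # html.unescape handles named (&amp;), decimal (&#8217;) and
    # hexadecimal (&#x2019;) references.
    return html.unescape(s)


# Hypothetical sample input, similar to what a feed might contain.
example = unescape("It&#8217;s &lt;b&gt; &amp; more")
# example now holds the text with a real apostrophe (U+2019), "<b>" and "&".
```

Unlike the parser-based version, this does not strip tags or touch whitespace; it only decodes the references.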

edited Nov 08 '11 at 22:04

answered Nov 08 '11 at 21:54

Daniel Nouri

score 0 · Answer 2 · answered Nov 10 '11 at 21:03


This worked for me:

import HTMLParser  # Python 2 module name

# raw_text is the string that contains the HTML entities
hparser = HTMLParser.HTMLParser()
new_text = hparser.unescape(raw_text)
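
One caveat: unescaping yields the typographic apostrophe ’ (U+2019), not the ASCII '. If plain ASCII punctuation is wanted, as the question title suggests, a translation table can be applied after unescaping. Below is a minimal Python 3 sketch; the function name `to_ascii_punctuation` and the mapping are illustrative assumptions and would need extending for other characters.

```python
import html

# Hypothetical mapping: typographic punctuation -> closest ASCII equivalent.
PUNCTUATION_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_ascii_punctuation(raw_text):
    """Decode HTML character references, then fold common punctuation to ASCII."""
    text = html.unescape(raw_text)
    return text.translate(PUNCTUATION_TO_ASCII)


# Hypothetical sample input.
cleaned = to_ascii_punctuation("It&#8217;s &ldquo;done&rdquo;&hellip;")
```

Characters outside the table (accented letters, for instance) are left untouched, so the result is only guaranteed to be ASCII if the input contains nothing else.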


bdhar
