How to escape HTML with characters like – in Python?

Alex
- See this previous Stack Overflow question: http://stackoverflow.com/questions/913933/decoding-html-encoded-strings-in-python – las3rjock Aug 08 '09 at 15:48
- Not to be confused with "Escaping FROM Pythons!" – NoMoreZealots Aug 08 '09 at 15:49
- @Alex, please clarify: do you have a Python Unicode string and want to produce the escaped HTML, or vice versa, do you have the HTML containing escapes and want to produce a Python Unicode string? – Alex Martelli Aug 08 '09 at 16:07
- @Alex, I'd like to produce a Python Unicode string from an ASCII HTML string containing escapes, and the other way round as well. – Alex Aug 09 '09 at 11:59
2 Answers
If you have a unicode string as input, you can use the xmlcharrefreplace error handler:
py> u"<p>\N{EN DASH}</p>".encode("ascii", "xmlcharrefreplace")
'<p>&#8211;</p>'
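The asker's follow-up comment mentions wanting both directions. The session above is Python 2; the following is a minimal sketch of the round trip on Python 3, assuming the standard-library `html` module is available (Python 3.4+). The helper names `encode_entities` and `decode_entities` are made up for illustration and are not part of any library.

```python
import html


def encode_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference.

    Hypothetical helper: it wraps the same "xmlcharrefreplace" error handler
    used in the answer, then decodes the resulting bytes back to str.
    Markup characters such as < and > are left untouched.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def decode_entities(markup: str) -> str:
    """Turn character references (numeric or named) back into Unicode.

    Hypothetical helper around html.unescape from the standard library.
    """
    return html.unescape(markup)


escaped = encode_entities("<p>\N{EN DASH}</p>")
print(escaped)  # <p>&#8211;</p>

restored = decode_entities(escaped)
print(restored == "<p>\N{EN DASH}</p>")  # True
```

Note that `html.unescape` also converts references such as `&lt;` and `&amp;`, so it is not an exact inverse of the encoding step if the input already contains escaped markup.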

Martin v. Löwis