
Possible Duplicate:
Replace html entities with the corresponding utf-8 characters in Python 2.6
What's the easiest way to escape HTML in Python?

Is there a way to easily convert a string to an HTML string, e.g. with characters like < and > replaced by &lt; and &gt;, or will I have to write my own conversion routine?

alessandro
  • See http://docs.python.org/library/htmllib.html#module-htmlentitydefs – Ashwini Chaudhary Jun 12 '12 at 09:22
  • @TimPietzcker: oops... title doesn't really help ;-) – vartec Jun 12 '12 at 09:27
  • I think what you need is called "HTML escaping", which is why you didn't find the answer by yourself. [Here is a Stack Overflow answer.](http://stackoverflow.com/questions/1061697/whats-the-easiest-way-to-escape-html-in-python) – anonymous Jun 12 '12 at 09:24

1 Answer


If you're only concerned about critical special characters like &, < and >:

>>> import cgi
>>> cgi.escape("<hello&goodbye>")
'&lt;hello&amp;goodbye&gt;'
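`cgi.escape` was deprecated in Python 3.2 and removed in Python 3.8, so on current Python 3 the equivalent is `html.escape`. Here is a minimal sketch. Note that `html.escape` also escapes quotes by default, whereas `cgi.escape` only did so when you passed `quote=True`:

```python
import html

# Same result as cgi.escape for &, < and >
print(html.escape("<hello&goodbye>"))              # &lt;hello&amp;goodbye&gt;

# quote=True (the default) also escapes " and '; quote=False leaves them alone,
# which matches cgi.escape's default behaviour
print(html.escape('say "hi" & bye'))               # say &quot;hi&quot; &amp; bye
print(html.escape('say "hi" & bye', quote=False))  # say "hi" &amp; bye
```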

For non-ASCII characters (this assumes Python 3, where string literals are Unicode):

>>> "Übeltäter".encode("ascii", "xmlcharrefreplace")
b'&#220;belt&#228;ter'
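In Python 3, `str.encode` returns `bytes`. If you want to drop the result into a text template, one option is to decode it straight back with the ASCII codec. This cannot fail, because every character is ASCII after the replacement:

```python
text = "Übeltäter"

# Replace each non-ASCII character with a numeric character reference,
# then turn the pure-ASCII bytes back into a str
escaped = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(escaped)  # &#220;belt&#228;ter
```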

Of course, if necessary, you can combine the two:

>>> cgi.escape("<Übeltäter>").encode("ascii", "xmlcharrefreplace")
b'&lt;&#220;belt&#228;ter&gt;'
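Wrapped up as a function, the combination might look like the sketch below. `to_html` is a hypothetical name, and `html.escape` stands in for the deprecated `cgi.escape`. The order matters: escape first, then encode. If you encode first, the `&` in each `&#220;` reference would itself be turned into `&amp;`.

```python
import html


def to_html(text: str) -> str:
    """Hypothetical helper: escape markup characters, then non-ASCII ones."""
    # 1. Escape &, <, > (and quotes) while the text is still plain Unicode
    escaped = html.escape(text)
    # 2. Turn every remaining non-ASCII character into &#NNN; and return a str
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_html("<Übeltäter>"))  # &lt;&#220;belt&#228;ter&gt;
```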
Tim Pietzcker
  • `>>> "Übeltäter".encode("ascii", "xmlcharrefreplace")` results in `UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)` – brandones Jun 01 '17 at 19:49
  • `cgi.escape()` is now deprecated. Use `html.escape()` instead - check [this answer](https://stackoverflow.com/a/5072031/738017) – Vito Gentile Sep 27 '21 at 14:38
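The `UnicodeDecodeError` reported in the comments comes from Python 2. There, `"Übeltäter"` in a UTF-8 source file is a byte string, and calling `.encode()` on it first tries to decode it implicitly as ASCII, which fails on the byte 0xc3. The fix in Python 2 is a Unicode literal (`u"Übeltäter"`) or an explicit `.decode("utf-8")`. The same rule holds for UTF-8 bytes in Python 3, sketched below: decode to text before escaping.

```python
raw = "Übeltäter".encode("utf-8")  # UTF-8 bytes, like a Python 2 str literal

# bytes has no .encode() in Python 3; decode to text first, then escape
escaped = raw.decode("utf-8").encode("ascii", "xmlcharrefreplace")
print(escaped)  # b'&#220;belt&#228;ter'
```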