I'm using Python 3 and I wonder whether there is a module or built-in function that converts every character of a text to an HTML entity (including letters and digits), because I don't want to build a translation map for this.


Solved: following @justhalf's suggestion, I wrote this function:

def htmlEntities( string ):
    return ''.join(['&#{0};'.format(ord(char)) for char in string])
Cœur
Mircea
  • Have you done your search in Google to find this? https://wiki.python.org/moin/EscapingHtml – justhalf Sep 04 '13 at 09:16
  • @justhalf: The solution on the Wiki page leaves ASCII codepoints alone; it only gives you entity escapes for non-ASCII characters. The OP (for some unfathomable reason) wants **all** codepoints escaped. – Martijn Pieters Sep 04 '13 at 09:19

1 Answer

If you really want to escape all characters, there is no built-in function for that, but you can replace each character with a numeric character reference built from its ordinal:

''.join('&#%d;' % ord(x) for x in string)
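For completeness, here is a minimal sketch of the same idea wrapped in a function, with a round-trip check through `html.unescape()` from the standard library. The name `html_entities` is only illustrative and not part of any module.

```python
import html


def html_entities(text):
    """Replace every character with a decimal numeric character reference.

    Illustrative helper name; nothing in the stdlib is called this.
    """
    return ''.join('&#{};'.format(ord(char)) for char in text)


encoded = html_entities('Hi!')
print(encoded)  # &#72;&#105;&#33;

# html.unescape() decodes numeric references back to the original text.
assert html.unescape(encoded) == 'Hi!'
```

Because every character goes through `ord()`, this should treat letters, digits, and non-ASCII characters alike.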
justhalf
  • sorry, I didn't realize the escaped html gets converted by SO, haha – justhalf Sep 04 '13 at 09:21
  • 1
    btw. `map(lambda` is deprecated, you ought to use `['&%d;' % ord(x) for x in string]` – vartec Sep 04 '13 at 10:12
  • Doesn't work for me in python3.6. Use `html.escape()` – Claude Jul 10 '17 at 10:00
  • `HTMLParser.escape()` doesn't exist in any Python version; it has thrown `AttributeError: 'HTMLParser' object has no attribute 'escape'` from version 3.0 onwards. The `html.escape` function never escaped anything other than `&`, `<`, `>`, `"`, and `'`. Not sure where you got this answer from, but it could never have worked. – Martijn Pieters Oct 16 '19 at 21:36
  • Now, some *other library* may well have added that method to `html.parser.HTMLParser`, but that's then not part of the stdlib. – Martijn Pieters Oct 16 '19 at 21:37
  • @MartijnPieters Thank you for noticing that `html.parser.HTMLParser.escape` was not in the *documentation* in any Python version. I am sure that it did work before, and a quick search also reveals a lot of [other](https://stackoverflow.com/questions/2360598/how-do-i-unescape-html-entities-in-a-string-in-python-3-1) [examples](http://stackz.ru/en/275174/how-do-i-perform-html-decodingencoding-using-pythondjango) using this method. This is apparently a private function in Python 3.3 or before, as shown [in this issue](https://bugs.python.org/issue2927). I'll update my answer with this extra info. – justhalf Oct 17 '19 at 09:02
  • 2
    @justhalf: You are pointing to `html.parser.HTMLParser.unescape`, which indeed exists. I'm talking about the `escape()` method. `unescape()` moves from HTML entities to Unicode codepoints, and can be done much more easily with `html.unescape()` too. You claim that there was an `escape()` method that would move from Unicode codepoint to HTML entities; this doesn't exist. **It never existed**. There is the `html.escape()` function but that doesn't convert non-ascii codepoints to supported HTML5 entities. – Martijn Pieters Oct 17 '19 at 09:56
  • @MartijnPieters I just double-checked, and you're right. I'll remove that. It must have been my assumption that the complement function of `unescape` must exist. Sorry for my incorrect answer, and really appreciate your comment! – justhalf Oct 17 '19 at 10:15
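To illustrate the point made in the comments, this short sketch shows what the standard library's `html.escape()` and `html.unescape()` do in Python 3. The sample string is made up for the demonstration.

```python
import html

# Made-up sample containing markup characters and a non-ASCII letter.
sample = '<a href="x">café & co</a>'

escaped = html.escape(sample)
print(escaped)  # &lt;a href=&quot;x&quot;&gt;café &amp; co&lt;/a&gt;

# Only &, <, > and quotes are escaped; the non-ASCII 'é' is left alone.
assert 'é' in escaped

# html.unescape() goes the other way and also understands named entities.
assert html.unescape(escaped) == sample
assert html.unescape('&eacute;') == 'é'
```

So `html.escape()` is the right tool for making text safe to embed in markup, but not for turning every codepoint into an entity.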