Replace all ASCII symbols (other than alphabets) with HTML numbers in Python

Question

I need to replace all ASCII symbols other than alphabets with their HTML numbers (http://www.ascii.cl/htmlcodes.htm). Based on this post (Convert HTML entities to Unicode and vice versa), I tried the code below, but `*` (and probably many other characters) is still not converted.

What could be the solution? Is a series of simple replacements the only option?

>>> from BeautifulSoup import BeautifulStoneSoup as bs
>>> import cgi
>>> cgi.escape("<*>").encode('ascii', 'xmlcharrefreplace')
'&lt;*&gt;'

Why would `*` get replaced? It's not special in this context. There is no [html entity](http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references) for `*`. — Carsten, Apr 18 '15 at 22:47
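To illustrate the point in the comment above, here is a minimal sketch, assuming Python 3 (where `html.escape` takes the place of `cgi.escape`). It shows that escaping only touches the HTML-special characters, and that the `xmlcharrefreplace` error handler only rewrites characters that cannot be encoded as ASCII. The variable names are just for illustration.

```python
import html

# html.escape only converts the HTML-special characters (&, <, >, and quotes).
escaped = html.escape("<*>")
print(escaped)  # &lt;*&gt;

# 'xmlcharrefreplace' only kicks in for characters that ASCII cannot encode.
# "\u00e9" (e with acute accent) is replaced; "*" is plain ASCII and is left alone.
encoded = "\u00e9*".encode("ascii", "xmlcharrefreplace")
print(encoded)  # b'&#233;*'
```

Since `*` is a valid ASCII character and not special in HTML, neither step has any reason to change it.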

score 1 · Accepted Answer · answered Apr 18 '15 at 23:14

Your question is a bit vague. I will assume that by "alphabets" you mean all characters from a-z and their uppercase variants. Then you can achieve the desired result using a regular expression:

>>> import re
>>> f = lambda s: re.sub(r'([^a-zA-Z])', lambda x: '&#{};'.format(ord(x.group(0))), s)
>>> f("<hi>")
'&#60;hi&#62;'
>>> f("<*>")
'&#60;&#42;&#62;'
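The same idea can be written as a named function, which may be easier to reuse and test. This is only a sketch of the approach above: the name `to_numeric_refs` is made up for illustration, and it keeps the same assumption that only a-z and A-Z should stay untouched.

```python
import re

# Matches any single character that is not an ASCII letter.
_NON_LETTER = re.compile(r"[^a-zA-Z]")


def to_numeric_refs(text):
    """Replace every character outside a-z/A-Z with its decimal HTML number."""
    return _NON_LETTER.sub(lambda m: "&#{};".format(ord(m.group(0))), text)


print(to_numeric_refs("<*>"))   # &#60;&#42;&#62;
print(to_numeric_refs("a b1"))  # a&#32;b&#49;
```

Note that with this pattern digits and whitespace are converted as well, as the second call shows. If those should be kept, the character class would need to be widened (for example `[^a-zA-Z0-9 ]`).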

Please note that, without knowing your specific application, this looks like an odd thing to do. There might be a better way to solve the real underlying problem.
