
I am new to Python but have been working with PHP for a while. I am looking for a method to convert all characters (except [0-9A-Za-z ]) to "HTML Decimal Entities". I have been searching around and haven't found a suitable method yet. I am looking for a carbon copy of this PHP method in Python.

The closest methods I have found in Python are these, but they don't exclude [0-9A-Za-z ]: "Python3 Convert all characters to HTML Entities" and "How can I escape *all* characters into their corresponding html entity names and numbers in Python?"

Just like the PHP method, I want a function that can convert every character (current and future) excluding [0-9A-Za-z ] to "HTML Decimal Entities" and where the UTF-8 character encoding is assumed.

E.g. "abcABC123 &%¤#" would become "abcABC123 &#38;&#37;&#164;&#35;"

  • What exactly are "HTML Decimal Entities"? Did you just mean HTML Entities? – Wesley Smith Nov 03 '20 at 04:19
  • I mean the entity number. Take a look at this: https://www.freeformatter.com/html-entities.html e.g. an "&" becomes "&#38;" and a "#" becomes "&#35;". So basically a string like "abc123 &%¤#" would become "abc123 &#38;&#37;&#164;&#35;" – Allan Vester Nov 03 '20 at 04:34

1 Answer


I came up with this as one way to do it.

import re

def html_entity_encode_all(string):
    return ''.join(['&#{0};'.format(ord(char)) if re.search("[^0-9A-Za-z ]", char) else char for char in string])

print(html_entity_encode_all('abcABC123 &%¤#'))

Output: abcABC123 &#38;&#37;&#164;&#35;

However, I don't know whether there is a better way to do it, or a faster way to process the string.
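
One possible alternative, as a sketch only: let re.sub scan the string once with a precompiled pattern and a replacement callback, instead of calling re.search on every character. The names _NOT_PLAIN and html_entity_encode_all_sub are made up for this example, and I have not benchmarked it, so whether it is actually faster for a given input is an assumption to verify.

```python
import re

# Hypothetical name: matches any single character outside 0-9, A-Z, a-z and space.
_NOT_PLAIN = re.compile(r"[^0-9A-Za-z ]")


def html_entity_encode_all_sub(string):
    """Replace every character outside [0-9A-Za-z ] with its decimal entity."""
    # The callback receives each match and builds "&#<code point>;" from it.
    return _NOT_PLAIN.sub(lambda m: "&#{0};".format(ord(m.group())), string)


print(html_entity_encode_all_sub("abcABC123 &%¤#"))
# prints: abcABC123 &#38;&#37;&#164;&#35;
```

Since ord() works on code points of a Python 3 str, this should also cover characters outside the Basic Multilingual Plane, as long as the input is already decoded text rather than UTF-8 bytes.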