I need to build a Python encoder so that I can reformat strings like this:
import codecs
codecs.encode("Random UTF-8 String ☑⚠⚡", 'name_of_my_encoder')
The reason I'm asking Stack Overflow at all is that the encoded strings need to pass this validation function. This is a hard constraint with no flexibility; it's due to how the strings have to be stored.
from string import ascii_letters
from string import digits

# Only ASCII letters, digits, and the underscore are allowed.
valid_characters = set(ascii_letters + digits + '_')

def validation_function(characters):
    for char in characters:
        if char not in valid_characters:
            raise Exception
Making an encoder seemed easy enough, but I'm not sure whether the way this encoder works makes it harder to build a decoder. Here's the encoder I've written:
from codecs import encode
from string import ascii_letters
from string import digits

ALPHANUMERIC_SET = set(ascii_letters + digits)

def underscore_encode(chars_in):
    chars_out = list()
    for char in chars_in:
        if char not in ALPHANUMERIC_SET:
            # Escape as the hex of the character's UTF-8 bytes, wrapped in underscores.
            chars_out.append('_{}_'.format(encode(char.encode(), 'hex').decode('ascii')))
        else:
            chars_out.append(char)
    return ''.join(chars_out)
I've only included this encoder as an example; there's probably a better way to do this.
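To make the decoding concern concrete, here is a rough sketch of one possible decoder for this scheme. It assumes every underscore in the encoded text is an escape delimiter, which should hold because a literal `_` is itself escaped (as `_5f_`). The name `underscore_decode` and the regex are my own guesses at an approach, and the encoder is repeated in compact form (using `binascii.hexlify`) only so the snippet runs on its own.

```python
import re
from binascii import hexlify
from string import ascii_letters, digits

ALPHANUMERIC_SET = set(ascii_letters + digits)

# An escape is lowercase hex digits wrapped in a pair of underscores.
ESCAPE_PATTERN = re.compile(r'_([0-9a-f]+)_')


def underscore_encode(chars_in):
    # Same scheme as the encoder above, written compactly.
    return ''.join(
        char if char in ALPHANUMERIC_SET
        else '_{}_'.format(hexlify(char.encode('utf-8')).decode('ascii'))
        for char in chars_in
    )


def underscore_decode(chars_in):
    def replace(match):
        # Convert the hex payload back to bytes, then to text.
        return bytes.fromhex(match.group(1)).decode('utf-8')
    return ESCAPE_PATTERN.sub(replace, chars_in)


sample = "Random UTF-8 String ☑⚠⚡"
assert underscore_decode(underscore_encode(sample)) == sample
```

Because `re.sub` scans left to right without overlapping matches, the first underscore it meets is always an opening delimiter, so adjacent escapes such as `_20__5f_` should be split correctly. Input that was not produced by the encoder (an unpaired underscore, or hex that is not valid UTF-8) is not handled here.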
Edit 1 - Someone has wisely suggested simply using base32 on the entire string, which I can definitely use. However, I would prefer something that is 'somewhat readable', so an escaping system like https://en.wikipedia.org/wiki/Quoted-printable or https://en.wikipedia.org/wiki/Percent-encoding would be preferred.
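For comparison, this is roughly how I picture the base32 fallback. It is only a sketch: standard base32 pads its output with `=`, which the validation function would reject, so this version strips the padding on encode and restores it on decode. The function names `base32_encode` and `base32_decode` are made up for illustration.

```python
import base64


def base32_encode(text):
    # Base32 output uses A-Z and 2-7, plus '=' padding that must be removed.
    raw = base64.b32encode(text.encode('utf-8')).decode('ascii')
    return raw.rstrip('=')


def base32_decode(encoded):
    # Padded base32 length is always a multiple of 8, so restore the padding.
    padding = '=' * (-len(encoded) % 8)
    return base64.b32decode(encoded + padding).decode('utf-8')


sample = "Random UTF-8 String ☑⚠⚡"
encoded = base32_encode(sample)
assert encoded.isalnum()
assert base32_decode(encoded) == sample
```

The result passes the character check but is not readable at all, which is why an escaping scheme is still my preference.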
Edit 2 - Proposed solutions must work on Python 3.4 or newer. Working in Python 2.7 as well would be nice, but it is not required. I've added the python-3.x tag to help clarify this.