I think the other answers are better than this one, but purely mathematically there is an obvious way of doing this: interpret the message as an integer written in another base, using your characters as the digit symbols.
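def frombase(s, sym):
    # Read the string s as a number whose digit symbols are listed in sym.
    b = len(sym)
    n = 0
    bl = 1  # place value of the current digit: b**0, b**1, ...
    for a in reversed(s):
        n += sym.index(a) * bl
        bl *= b
    return n

def tobase(n, sym):
    # Write the non-negative integer n using the digit symbols in sym.
    b = len(sym)
    s = ''
    while n > 0:
        kl = n % b
        n //= b
        s += sym[kl]
    # Digits were collected least significant first; 0 maps to the zero symbol.
    return s[::-1] if s else sym[0]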
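As a quick sanity check, the same idea can be tried on bases where the answer is already known. The sketch below restates the two functions in a compact but equivalent form so that it runs on its own; the `DECIMAL` and `BINARY` alphabets are my own illustrative choices, not part of the question.

```python
def frombase(s, sym):
    # Horner's rule: gives the same result as summing digit * base**position.
    n = 0
    for a in s:
        n = n * len(sym) + sym.index(a)
    return n

def tobase(n, sym):
    s = ''
    while n > 0:
        n, r = divmod(n, len(sym))
        s += sym[r]
    return s[::-1] if s else sym[0]

DECIMAL = '0123456789'  # illustrative alphabet: ordinary base 10
BINARY = '01'           # illustrative alphabet: ordinary base 2

assert frombase('1234', DECIMAL) == 1234
assert frombase('101', BINARY) == 5
assert tobase(5, BINARY) == '101'
assert tobase(0, DECIMAL) == '0'
```

Any sequence of distinct symbols works as `sym`, since only the position of each symbol in the sequence matters.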
And then, for your specific case:
symbols = [
    ' ', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
    'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
    'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D',
    'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
    'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
    'Y', 'Z', ',', '.', '?', '!', '-', ':', ';',
    '_', '"', "'", '#', '$', '%', '&', '/', '(', ')',
    '=', '+', '*', '<', '>', '~'
]
encodeword = lambda w: frombase(w, symbols)
decodeword = lambda n: tobase(n, symbols)
Note that the first symbol (" ") plays the role of the digit zero, so any leading spaces are dropped when decoding, in the same way that 0001 = 1.
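The lost leading spaces can be seen directly, and one possible way around them is to put a nonzero marker symbol in front of the message before encoding. This is only a sketch: `SYMBOLS` rebuilds the same 86-symbol alphabet from the `string` module, and `MARKER`, `encode_exact` and `decode_exact` are hypothetical names I made up for illustration.

```python
import string

# Same 86 symbols, in the same order, as the list above.
SYMBOLS = (' ' + string.digits + string.ascii_lowercase
           + string.ascii_uppercase + ',.?!-:;_"\'#$%&/()=+*<>~')

def frombase(s, sym):
    n = 0
    for a in s:
        n = n * len(sym) + sym.index(a)
    return n

def tobase(n, sym):
    s = ''
    while n > 0:
        n, r = divmod(n, len(sym))
        s += sym[r]
    return s[::-1] if s else sym[0]

# Leading spaces are leading zeros, so they vanish on the way back.
assert tobase(frombase('  hi', SYMBOLS), SYMBOLS) == 'hi'

# Hypothetical workaround: prepend a nonzero marker, strip it after decoding.
MARKER = SYMBOLS[1]  # '0'; any symbol other than SYMBOLS[0] would do

def encode_exact(w):
    return frombase(MARKER + w, SYMBOLS)

def decode_exact(n):
    return tobase(n, SYMBOLS)[1:]

assert decode_exact(encode_exact('  hi')) == '  hi'
assert decode_exact(encode_exact('')) == ''
```

The price is one extra digit in base 86, and numbers produced this way can only be decoded with `decode_exact`.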
If you really want to represent all possible symbols, you can write the message as a sequence of ord values (integers) separated by a ',' symbol. Then you encode that string in base 11, i.e. the ten decimal digits plus the added ',' symbol:
symbols = [',', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']  # ',' is zero; encodeword/decodeword read this new global
txt2int = lambda w: encodeword(','.join(str(ord(x)) for x in w))
int2txt = lambda n: ''.join(chr(int(x)) for x in decodeword(n).split(','))
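A self-contained round trip for this variant might look like the sketch below. It uses a separate `DIGITS` alphabet instead of reassigning `symbols`, and it special-cases the empty string, which would otherwise decode to a lone ',' and fail in `int('')`; both choices are my own additions, not part of the snippet above.

```python
DIGITS = ',0123456789'  # ',' plays the role of zero

def frombase(s, sym):
    n = 0
    for a in s:
        n = n * len(sym) + sym.index(a)
    return n

def tobase(n, sym):
    s = ''
    while n > 0:
        n, r = divmod(n, len(sym))
        s += sym[r]
    return s[::-1] if s else sym[0]

def txt2int(w):
    # e.g. 'Hi' -> '72,105' -> that string read as a base-11 number
    return frombase(','.join(str(ord(x)) for x in w), DIGITS)

def int2txt(n):
    if n == 0:  # assumption: treat 0 as the empty message
        return ''
    return ''.join(chr(int(x)) for x in tobase(n, DIGITS).split(','))

msg = 'Grüße, 世界!'
assert int2txt(txt2int(msg)) == msg
assert int2txt(txt2int('')) == ''
```

The leading-zero problem does not occur here, because the joined string always starts with a decimal digit and never with ','.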
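Regarding the size of the returned integer: the number of digits grows linearly with the message, i.e. it is O(len(w)). With encodeword, each character contributes one base-86 digit (about 1.93 decimal digits), so e.g. 'Hi there!' encodes to a number with 9 digits in base 86, which is 18 decimal digits. txt2int is also linear, but larger by a constant factor, because every character expands to the digits of its ord value plus a comma.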
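The growth can be checked by counting digits directly. This is a rough sketch that rebuilds the 86-symbol alphabet from the `string` module; `max_decimal_digits` is a hypothetical helper introduced only for this illustration.

```python
import string

# Same 86 symbols, in the same order, as the list above.
SYMBOLS = (' ' + string.digits + string.ascii_lowercase
           + string.ascii_uppercase + ',.?!-:;_"\'#$%&/()=+*<>~')

def frombase(s, sym):
    n = 0
    for a in s:
        n = n * len(sym) + sym.index(a)
    return n

def max_decimal_digits(length, base):
    # The largest number with `length` digits in `base` is base**length - 1.
    return len(str(base ** length - 1))

w = 'Hi there!'
n = frombase(w, SYMBOLS)

# The decimal length never exceeds the bound for a message of this length.
assert len(str(n)) <= max_decimal_digits(len(w), len(SYMBOLS))
print(len(w), len(str(n)))  # 9 18
```

Doubling the message length roughly doubles the number of decimal digits, which is what the linear bound says.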