
Let's say you have a string:

mystring = "Welcome to the InterStar cafe, serving you since 2412!"

I am looking for a way to convert that string into a number, for example:

encoded_string = number_encode(mystring)

print(encoded_string)

08713091353153848093820430298

...that can be converted back to the original string:

decoded_string = number_decode(encoded_string)

print(decoded_string)

"Welcome to the InterStar cafe, serving you since 2412!"

It doesn't have to be cryptographically secure, but it does have to produce the same number for the same string regardless of what computer it's running on.

lespaul
  • What do you want to do? Transmit the string over the network? If so, why not use base64 encoding? It's simple with `base64`. – dcg Mar 28 '19 at 22:24
  • Seems like you'd want to [encode the string as bytes](https://stackoverflow.com/questions/7585435/best-way-to-convert-string-to-bytes-in-python-3) and then [decode those as an int](https://stackoverflow.com/questions/25259947/convert-variable-sized-byte-array-to-a-integer-long), then reverse the process to get the string back out. But it doesn't really make sense to do that IMO - are you sure this isn't an [XY problem](http://xyproblem.info/)? – Random Davis Mar 28 '19 at 22:24
  • Knowing why you want to do this would really help in coming up with the best way. @RandomDavis’s suggestion works for pretty much all strings (except for ones that start/end with `'\0'` characters depending on which you pick), but you can also get smaller numbers if your character set is more restricted. Or maybe you want fixed-size numbers. (Or to compress the string first?) – Ry- Mar 28 '19 at 22:27
  • Depending on the output you want (a one-to-one character mapping? a digit string longer than the character string?), you can encode with `ord()` and decode with `chr()`, replace according to a custom dict, or use any number of other options – G. Anderson Mar 28 '19 at 22:28
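G. Anderson's `ord()`/`chr()` suggestion can be sketched with fixed-width digit groups. This is a minimal sketch, assuming a width of 7 decimal digits per character, which is enough for every Unicode code point (the largest, 0x10FFFF, is 1114111); the names `ord_encode`, `ord_decode` and `WIDTH` are hypothetical. The result is kept as a digit string rather than an int, so the leading zeros of the first group survive.

```python
# Hypothetical helpers: map each character to its code point, zero-padded
# to a fixed width so the digit string can be split back apart unambiguously.
WIDTH = 7  # assumption: 7 digits covers every code point up to 0x10FFFF (1114111)

def ord_encode(s: str) -> str:
    """Return a digit string with exactly WIDTH digits per character."""
    return ''.join(f'{ord(ch):0{WIDTH}d}' for ch in s)

def ord_decode(digits: str) -> str:
    """Split the digit string into WIDTH-sized chunks and map each back with chr()."""
    return ''.join(chr(int(digits[i:i + WIDTH])) for i in range(0, len(digits), WIDTH))

mystring = "Welcome to the InterStar cafe, serving you since 2412!"
encoded = ord_encode(mystring)
print(encoded[:14])                  # 00000870000101 ('W' is 87, 'e' is 101)
print(ord_decode(encoded) == mystring)  # True
```

Fixed width keeps decoding trivial but costs 7 digits per character even for plain ASCII; a smaller width only works if you can guarantee a restricted character set, as Ry- points out.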

3 Answers


Encode it to bytes in a fixed encoding, then convert the bytes to an int with int.from_bytes. The reverse operation is to call .to_bytes on the resulting int, then decode back to str:

mystring = "Welcome to the InterStar cafe, serving you since 2412!"
mybytes = mystring.encode('utf-8')
myint = int.from_bytes(mybytes, 'little')
print(myint)
recoveredbytes = myint.to_bytes((myint.bit_length() + 7) // 8, 'little')
recoveredstring = recoveredbytes.decode('utf-8')
print(recoveredstring)
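One caveat is worth seeing in action: `bit_length()` does not count high-order zero bytes, so `to_bytes` sized from it cannot restore them. This quick check (the sample strings are made up for illustration) shows NUL characters disappearing from the end with 'little' byte order and from the front with 'big':

```python
# Trailing NULs become high-order zero bytes in little-endian order.
s = "abc\0\0"
n = int.from_bytes(s.encode('utf-8'), 'little')
back = n.to_bytes((n.bit_length() + 7) // 8, 'little').decode('utf-8')
print(repr(back))   # 'abc': the two trailing NULs are gone

# Leading NULs become high-order zero bytes in big-endian order.
s2 = "\0\0abc"
n2 = int.from_bytes(s2.encode('utf-8'), 'big')
back2 = n2.to_bytes((n2.bit_length() + 7) // 8, 'big').decode('utf-8')
print(repr(back2))  # 'abc': the two leading NULs are gone
```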


This has one flaw: if the string ends in NUL characters ('\0'/'\x00'), you'll lose them (switching to 'big' byte order would lose them from the front instead). If that's a problem, you can append a '\x01' pad byte explicitly and strip it on the decode side, so there are no trailing zero bytes to lose:

mystring = "Welcome to the InterStar cafe, serving you since 2412!"
mybytes = mystring.encode('utf-8') + b'\x01'  # Pad with 1 to preserve trailing zeroes
myint = int.from_bytes(mybytes, 'little')
print(myint)
recoveredbytes = myint.to_bytes((myint.bit_length() + 7) // 8, 'little')
recoveredstring = recoveredbytes[:-1].decode('utf-8') # Strip pad before decoding
print(recoveredstring)
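The padded version can be wrapped in the two functions the question asks for. This is a minimal sketch: the names `number_encode`/`number_decode` come from the question, and the UTF-8 encoding, 'little' byte order and '\x01' pad are carried over from the code above. Because both the text encoding and the byte order are spelled out explicitly rather than taken from the machine, the same string gives the same integer on any platform.

```python
def number_encode(s: str) -> int:
    """Encode s as UTF-8, append a 0x01 pad byte, and read the bytes as a little-endian int."""
    return int.from_bytes(s.encode('utf-8') + b'\x01', 'little')

def number_decode(n: int) -> str:
    """Reverse number_encode: rebuild the bytes, drop the pad byte, decode as UTF-8."""
    # The pad is the most significant byte and is nonzero, so bit_length() counts it.
    raw = n.to_bytes((n.bit_length() + 7) // 8, 'little')
    return raw[:-1].decode('utf-8')

mystring = "Welcome to the InterStar cafe, serving you since 2412!"
encoded_string = number_encode(mystring)
print(encoded_string)
print(number_decode(encoded_string))
```

The empty string also round-trips: it encodes to 1 (just the pad byte) and decodes back to "".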
ShadowRanger

If you simply want to make a string unreadable to a human, you could use base64, via base64.b64encode(s, altchars=None) and base64.b64decode(s, altchars=None, validate=False).

Take into account that b64encode requires a bytes-like object, so write your string literals with a b prefix, as in b"I am a bytes-like string":

>>> import base64
>>> coded = base64.b64encode(b"Welcome to the InterStar cafe, serving you since 2412!")
>>> print(coded)
b'V2VsY29tZSB0byB0aGUgSW50ZXJTdGFyIGNhZmUsIHNlcnZpbmcgeW91IHNpbmNlIDI0MTIh'
>>> print(base64.b64decode(coded))
b'Welcome to the InterStar cafe, serving you since 2412!'

If you already have your strings, you can convert them with str.encode('utf-8'):

>>> myString = "Welcome to the InterStar cafe, serving you since 2412!"
>>> bString = myString.encode('utf-8')
>>> print(bString)
b'Welcome to the InterStar cafe, serving you since 2412!'
>>> print(bString.decode())
Welcome to the InterStar cafe, serving you since 2412!
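Putting those two steps together, a round trip from a str to a printable base64 str and back might look like this. It is a sketch; decoding the base64 bytes as ASCII is one reasonable choice, since base64 output is always ASCII.

```python
import base64

mystring = "Welcome to the InterStar cafe, serving you since 2412!"

# str -> UTF-8 bytes -> base64 bytes -> ASCII str
coded = base64.b64encode(mystring.encode('utf-8')).decode('ascii')

# ASCII str -> base64 bytes -> UTF-8 bytes -> str
decoded = base64.b64decode(coded).decode('utf-8')

print(coded)
print(decoded == mystring)  # True
```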

If you really need to convert the string into a number, use @ShadowRanger's answer instead.

Ender Look
  • You're assuming "encoding" is the only goal. They may have some API that requires a true integer (some toy RSA encrypter or the like), and this won't help with that. – ShadowRanger Mar 28 '19 at 22:33

I think the other answers are better than this one, but purely mathematically there is an obvious way of doing this: interpret the message as an integer written in a different base, whose digits are the symbols you allow:

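def frombase(s, sym):
    # Read s as a numeral in base len(sym); sym[i] is the digit with value i.
    b = len(sym)
    n = 0
    bl = 1  # place value of the current digit, starting at b**0
    for a in reversed(s):
        n += sym.index(a) * bl
        bl *= b
    return n

def tobase(n, sym):
    # Write n in base len(sym), using sym as the digit symbols.
    b = len(sym)
    s = ''
    while n > 0:
        kl = n % b  # least significant digit first
        n //= b
        s += sym[kl]
    return s[::-1] if s else sym[0]  # most significant first; 0 maps to sym[0]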
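These two functions are ordinary positional base conversion with a custom digit alphabet. As a sanity check (the hex, binary and 'xyz' alphabets are just illustrative), with the usual digits they agree with Python's built-in conversions:

```python
def frombase(s, sym):
    # Read s as a numeral in base len(sym); sym[i] is the digit with value i.
    b = len(sym)
    n = 0
    bl = 1
    for a in reversed(s):
        n += sym.index(a) * bl
        bl *= b
    return n

def tobase(n, sym):
    # Write n in base len(sym), using sym as the digit symbols.
    b = len(sym)
    s = ''
    while n > 0:
        kl = n % b
        n //= b
        s += sym[kl]
    return s[::-1] if s else sym[0]

print(frombase('ff', '0123456789abcdef'))     # 255
print(tobase(255, '01'))                      # 11111111
print(frombase('101', '01') == int('101', 2))  # True
```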

and then, for your specific case:

symbols = [
    ' ', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
    'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
    'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D',
    'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
    'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
    'Y', 'Z', ',', '.', '?', '!', '-', ':', ';',
    '_', '"', "'", '#', '$', '%', '&', '/', '(', ')',
    '=', '+', '*', '<', '>', '~'
]
encodeword = lambda w: frombase(w, symbols)
decodeword = lambda n: tobase(n, symbols)

Note that leading spaces are lost: ' ' is the first symbol and therefore plays the role of the digit zero, and leading zeros vanish just as 0001 = 1.
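The effect is easy to demonstrate. In this sketch the 86-symbol list is rebuilt compactly from `string` constants, in the same order as the list above, and `encodeword`/`decodeword` are written as plain functions instead of lambdas:

```python
import string

def frombase(s, sym):
    # Read s as a numeral in base len(sym); sym[i] is the digit with value i.
    b = len(sym)
    n = 0
    bl = 1
    for a in reversed(s):
        n += sym.index(a) * bl
        bl *= b
    return n

def tobase(n, sym):
    # Write n in base len(sym), using sym as the digit symbols.
    b = len(sym)
    s = ''
    while n > 0:
        n, kl = divmod(n, b)
        s += sym[kl]
    return s[::-1] if s else sym[0]

# Same 86 symbols, in the same order as the answer's list.
symbols = ([' '] + list(string.digits) + list(string.ascii_lowercase)
           + list(string.ascii_uppercase) + list(",.?!-:;_\"'#$%&/()=+*<>~"))

def encodeword(w):
    return frombase(w, symbols)

def decodeword(n):
    return tobase(n, symbols)

print(repr(decodeword(encodeword('  hi'))))  # 'hi': leading spaces are lost
print(repr(decodeword(encodeword('hi  '))))  # 'hi  ': trailing spaces survive
```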

If you really want to represent all possible characters, you can write the string as a sequence of their ord values (integers) separated by commas, then encode that in base 11, using the ten digits plus ',' as the symbols:

symbols = [',', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9'] # , is zero
txt2int = lambda w: encodeword(','.join(str(ord(x)) for x in w))
int2txt = lambda n: ''.join(chr(int(x)) for x in decodeword(n).split(','))
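In the answer's code, the new `symbols` list takes effect only because `encodeword` and `decodeword` look up the global `symbols` at call time, so rebinding it also changes those two lambdas. This sketch passes the 11-symbol alphabet explicitly instead (the name `DIGITS` is hypothetical). Since every ord value starts with a digit symbol, never with the zero symbol ',', nothing is lost at the front, even for NUL characters; the one input it cannot round-trip is the empty string, because `int2txt(0)` tries `int('')` and raises ValueError.

```python
def frombase(s, sym):
    # Horner's rule: equivalent to the answer's frombase.
    b = len(sym)
    n = 0
    for a in s:
        n = n * b + sym.index(a)
    return n

def tobase(n, sym):
    # Write n in base len(sym), using sym as the digit symbols.
    b = len(sym)
    s = ''
    while n > 0:
        n, kl = divmod(n, b)
        s += sym[kl]
    return s[::-1] if s else sym[0]

# ',' is the zero symbol and also the separator between ord values.
DIGITS = [',', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

def txt2int(w):
    # 'Hi' -> '72,105' -> read as a base-11 numeral
    return frombase(','.join(str(ord(x)) for x in w), DIGITS)

def int2txt(n):
    # base-11 numeral -> '72,105' -> 'Hi'
    return ''.join(chr(int(x)) for x in tobase(n, DIGITS).split(','))

for text in ["Welcome to the InterStar cafe, serving you since 2412!", "\0tail\0", "naïve ☕"]:
    print(int2txt(txt2int(text)) == text)  # True for each
```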

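Regarding the size of the returned integer: its number of digits grows linearly with the input length, about log10(86) ≈ 1.93 decimal digits per character for encodeword, so e.g. 'Hi there!' (9 characters) encodes to an 18-digit number. txt2int produces longer numbers, since every character expands to its 2- or 3-digit ord value (for ASCII) plus a separator.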
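The digit count is easy to measure directly. This sketch counts the decimal digits produced for 'Hi there!' with the 86-symbol alphabet (rebuilt from `string` constants in the same order as the list above, with an equivalent Horner-style `frombase`):

```python
import math
import string

def frombase(s, sym):
    # Horner's rule: equivalent to the answer's frombase.
    b = len(sym)
    n = 0
    for a in s:
        n = n * b + sym.index(a)
    return n

# Same 86 symbols, in the same order as the answer's list.
symbols = ([' '] + list(string.digits) + list(string.ascii_lowercase)
           + list(string.ascii_uppercase) + list(",.?!-:;_\"'#$%&/()=+*<>~"))

w = 'Hi there!'
n = frombase(w, symbols)
print(len(w), len(str(n)))                  # 9 18
print(round(math.log10(len(symbols)), 2))   # 1.93 decimal digits per character
```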

kuco 23