11

Quick question. I'm trying to find or write an encoder in Python to shorten a string of numbers by using upper and lower case letters. The numeric strings look something like this:

20120425161608678259146181504021022591461815040210220120425161608667

The length is always the same.

My initial thought was to write some simple encoder to utilize upper and lower case letters and numbers to shorten this string into something that looks more like this:

a26Dkd38JK

That was completely arbitrary, just trying to be as clear as possible. I'm certain that there is a really slick way to do this, probably already built in. Maybe this is an embarrassing question to even be asking.

Also, I need to be able to take the shortened string and convert it back to the longer numeric value. Should I write something and post the code, or is this a one line built in function of Python that I should already know about?

Thanks!

user000001
Ryan Martin

4 Answers

10

This is a pretty good compression:

import base64

def num_to_alpha(num):
    num = hex(num)[2:].rstrip("L")

    if len(num) % 2:
        num = "0" + num

    return base64.b64encode(num.decode('hex'))

It first turns the integer into a bytestring and then base64 encodes it. Here's the decoder:

def alpha_to_num(alpha):
    num_bytes = base64.b64decode(alpha)
    return int(num_bytes.encode('hex'), 16)

Example:

>>> num_to_alpha(20120425161608678259146181504021022591461815040210220120425161608667)
'vw4LUVm4Ea3fMnoTkHzNOlP6Z7eUAkHNdZjN2w=='
>>> alpha_to_num('vw4LUVm4Ea3fMnoTkHzNOlP6Z7eUAkHNdZjN2w==')
20120425161608678259146181504021022591461815040210220120425161608667
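The functions above rely on Python 2 string methods (`str.decode('hex')`, `str.encode('hex')`). A minimal sketch of the same idea for Python 3 follows; it is not part of the original answer. It assumes the input arrives as a digit string of fixed width (`WIDTH = 68` is taken from the example in the question), and the names `digits_to_alpha` and `alpha_to_digits` are hypothetical. It uses the URL-safe base64 alphabet and pads the decoded result back to the fixed width, since converting to an integer would otherwise drop leading zeros.

```python
import base64

WIDTH = 68  # assumption: fixed length of the digit strings, as in the question


def digits_to_alpha(digits):
    """Encode a string of decimal digits as URL-safe base64 text."""
    num = int(digits)
    # Smallest number of bytes that can hold the integer (at least one byte).
    n_bytes = max(1, (num.bit_length() + 7) // 8)
    raw = num.to_bytes(n_bytes, "big")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def alpha_to_digits(alpha, width=WIDTH):
    """Decode back to the digit string, restoring any leading zeros."""
    raw = base64.urlsafe_b64decode(alpha)
    return str(int.from_bytes(raw, "big")).zfill(width)


s = "20120425161608678259146181504021022591461815040210220120425161608667"
encoded = digits_to_alpha(s)
print(encoded, len(encoded))
print(alpha_to_digits(encoded) == s)
```

If the trailing `=` padding is unwanted, it has to be stripped after encoding and re-added before decoding, because `base64` expects padded input.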
orlp
  • This looks like it works great. Exactly what I was looking for, thanks. RE: int vs. string: Passing a string to this function actually doesn't work. It does need to be an int. Good work! – Ryan Martin Apr 26 '12 at 01:46
  • This requires arbitrary-precision integers, which fortunately python has. – ninjagecko Apr 26 '12 at 02:22
  • @ninjagecko: I don't see why this should require arbitrary-precision integers. It works with them - sure - but there is no part of the given functions that relies on arbitrary precision integers. – orlp Apr 26 '12 at 03:20
  • @nightcracker: Yes there is. The fact that you take an integer as input is reason enough to require arbitrary-precision integers. You can test this yourself by trying to do this in another language such as JavaScript without arbitrary-precision integers. Sufficiently large inputs will be meaningless. This is not to say this makes the answer unreasonable in other languages; the answer will work for any input which does not overflow. Not usually a concern, but the OP was using a string of digits which would overflow in most non-Python languages. – ninjagecko Apr 26 '12 at 11:11
  • @ninjagecko: ah now I see what you mean. – orlp Apr 26 '12 at 23:46
10

Here are two custom functions (not based on base64) that produce shorter output:

chrs = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
l = len(chrs)

def int_to_cust(i):
    result = ''
    while i:
        result = chrs[i % l] + result
        i = i // l
    if not result:
        result = chrs[0]
    return result

def cust_to_int(s):
    result = 0
    for char in s:
        result = result * l + chrs.find(char)
    return result

And the results are:

>>> int_to_cust(20120425161608678259146181504021022591461815040210220120425161608667)
'9F9mFGkji7k6QFRACqLwuonnoj9SqPrs3G3fRx'
>>> cust_to_int('9F9mFGkji7k6QFRACqLwuonnoj9SqPrs3G3fRx')
20120425161608678259146181504021022591461815040210220120425161608667L

You can shorten the generated string further by adding more characters to the chrs variable.
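To illustrate the effect of a larger alphabet, here is a rough sketch that takes the alphabet as a parameter. It is an illustration and not part of the original answer: the names `encode`, `decode`, `BASE62` and `BASE94` are hypothetical, and the 94-character alphabet (letters, digits and ASCII punctuation) is just one possible choice that may not be safe in URLs or filenames.

```python
import string


def encode(num, alphabet):
    """Write a non-negative integer in base len(alphabet)."""
    base = len(alphabet)
    if num == 0:
        return alphabet[0]
    out = []
    while num:
        num, rem = divmod(num, base)
        out.append(alphabet[rem])
    # Digits were produced least significant first, so reverse them.
    return "".join(reversed(out))


def decode(text, alphabet):
    """Inverse of encode(); raises KeyError on characters outside the alphabet."""
    base = len(alphabet)
    index = {ch: i for i, ch in enumerate(alphabet)}
    num = 0
    for ch in text:
        num = num * base + index[ch]
    return num


# Same ordering as chrs above: digits, lowercase, uppercase.
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase
# Hypothetical wider alphabet: adds the 32 ASCII punctuation characters.
BASE94 = BASE62 + string.punctuation

n = 20120425161608678259146181504021022591461815040210220120425161608667
short62 = encode(n, BASE62)
short94 = encode(n, BASE94)
print(len(short62), len(short94))
```

The gain is logarithmic: each character carries log2(len(alphabet)) bits, so going from 62 to 94 characters saves only a few characters on a number of this size.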

Tadeck
  • I guess you're doing the base 64 encoding yourself instead of using the lib. – Paul Hoang Apr 26 '12 at 02:52
  • 4
    @PaulHoang: I guess your guess is wrong. I presented functions that work in a similar manner, but 1) without the need for padding (try removing the `=`s from nightcracker's answer), and 2) with the ability to define your own characters for representing the converted value. The way it works is very similar to base64, but it is not base64. There is probably a library that does something like this, but I did not find which one it is. – Tadeck Apr 26 '12 at 12:38
  • I really like this solution. In playing around with the various suggested solutions, I like this the best because I can restrict it to just letters and numbers and have control over adding more characters in the future. Nicely done. – Ryan Martin Apr 26 '12 at 17:11
2

Do it with a class:

VALID_CHRS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
BASE = len(VALID_CHRS)
# Map each character to its digit value (its position in VALID_CHRS).
MAP_CHRS = {char: index for index, char in enumerate(VALID_CHRS)}


class TinyNum:
    """Compact number representation in alphanumeric characters."""

    def __init__(self, n):
        result = ''
        while n:
            result = VALID_CHRS[n % BASE] + result
            n //= BASE
        if not result:
            result = VALID_CHRS[0]
        self.num = result

    def to_int(self):
        """Return the number as an int."""
        result = 0
        for char in self.num:
            result = result * BASE + MAP_CHRS[char]
        return result

Sample usage:

>>> n = 4590823745
>>> tn = TinyNum(n)
>>> print(n)
4590823745
>>> print(tn.num)
50GCYh
>>> print(tn.to_int())
4590823745

(Based on Tadeck's answer.)

ChaimG
0
>>> s="20120425161608678259146181504021022591461815040210220120425161608667"
>>> import base64, zlib
>>> base64.b64encode(zlib.compress(s))
'eJxly8ENACAMA7GVclGblv0X4434WrKFVW5CtJl1HyosrZKRf3hL5gLVZA2b'
>>> zlib.decompress(base64.b64decode(_))
'20120425161608678259146181504021022591461815040210220120425161608667'

So zlib isn't very smart at compressing strings of digits :(
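A rough way to see why, sketched for Python 3 (the session above is Python 2, and this comparison is an added illustration, not part of the original answer): each decimal digit carries about log2(10) ≈ 3.32 bits, so 68 digits need roughly 226 bits. Packing the integer into bytes gets close to that bound, while zlib adds header and checksum overhead on top of an input too short to compress well.

```python
import base64
import math
import zlib

s = "20120425161608678259146181504021022591461815040210220120425161608667"

# Information-theoretic lower bound for an arbitrary 68-digit string, in bytes.
min_bytes = math.ceil(len(s) * math.log2(10) / 8)

# Approach from this answer: compress the ASCII digits, then base64 encode.
via_zlib = base64.b64encode(zlib.compress(s.encode("ascii")))

# Alternative: treat the digits as one integer and pack it into bytes.
num = int(s)
packed = num.to_bytes((num.bit_length() + 7) // 8, "big")
via_int = base64.b64encode(packed)

print(min_bytes, len(packed), len(via_zlib), len(via_int))
```

The exact zlib length can vary with the library version and compression level, so the sketch prints the lengths instead of assuming them.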

John La Rooy