
I am trying to convert an alphanumeric string with a maximum length of 40 characters to as small an integer as possible, so that we can easily save it to and retrieve it from a database. I am not aware of any existing Python method for this, or of a simple algorithm we could use. To be specific, my strings will contain only the characters 0-9 and a-g. Please suggest how we can uniquely convert from string to int and back. I am using Python 2.7 on CentOS 6.5.

RAFIQ
  • How can you convert characters into an int? What would "a0b3" be as an integer? Are you expected to make some conversion, e.g. by using the ASCII values for characters a-g? – Tim Biegeleisen Mar 13 '15 at 07:15
  • Would you like to pick the numbers out of that string or "serialize" the whole strings as an integer? The latter surely won't work - certainly not with 40 byte long strings... Plus, strings are probably the second-most common datatype written to databases - why bother with converting it? – sebastian Mar 13 '15 at 07:15
  • @TimBiegeleisen Not necessarily; anything is fine as long as we can easily revert the conversion. I hope this is clear: we need a unique mapping. – RAFIQ Mar 13 '15 at 07:18
  • You might wish to look at `struct.pack` – cdarke Mar 13 '15 at 07:38

3 Answers


This is not that difficult:

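def str2int(s, chars):
    # The first character of s becomes the least significant digit.
    i = 0
    for c in reversed(s):
        i *= len(chars)
        i += chars.index(c)
    return i

def int2str(i, chars):
    # Emits the least significant digit first, matching str2int.
    # Trailing chars[0] characters (high-order zeros) are not restored.
    s = ""
    while i:
        s += chars[i % len(chars)]
        i //= len(chars)
    return s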

Example:

>>> chars = "".join(str(n) for n in range(10)) + "abcdefg"
>>> str2int("0235abg02", chars)
14354195089
>>> int2str(_, chars)
'0235abg02'

Basically, if your strings are drawn from an alphabet of n distinct characters, you interpret each string as a number written in base n.
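One caveat with the functions above: a string that ends in `chars[0]` (here "0") does not survive the round trip, because those characters are high-order zeros. A minimal sketch of one way around this, assuming variable-length strings must be recovered exactly, is to start from a sentinel digit 1 so the length is stored in the integer itself. The names `CHARS`, `encode` and `decode` are hypothetical and not part of the answer.

```python
# Hypothetical helper names; the alphabet is the one from the question (0-9, a-g).
CHARS = "0123456789abcdefg"
BASE = len(CHARS)

def encode(s):
    """Map a string over CHARS to a positive integer, preserving its length."""
    n = 1  # sentinel leading digit, so zeros that follow it are not lost
    for c in s:
        n = n * BASE + CHARS.index(c)
    return n

def decode(n):
    """Invert encode(): peel off digits until only the sentinel 1 remains."""
    if n < 1:
        raise ValueError("not a value produced by encode()")
    out = []
    while n > 1:
        n, d = divmod(n, BASE)
        out.append(CHARS[d])
    return "".join(reversed(out))
```

The price of the sentinel is at most one extra base-17 digit in the stored integer. If every string has the same known length, padding the decoded result is enough and the sentinel is unnecessary.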

orlp
  • Great. The size of the integer seems to increase rapidly with string length, so any idea what the largest integer could be and when it occurs? When the string is mostly (or entirely) letters, digits, or something else? Do you really suggest such a conversion over saving the string itself? – RAFIQ Mar 13 '15 at 07:35
  • @RAFIQ An encoding of arbitrary strings of length _n_ with a charset of size _c_ needs c^n distinct values, so the largest integer is c^n - 1. This is an information theory lower bound and cannot be improved upon. The maximum occurs in my above example when you encode a string of only the last character, for example "gggg". I wouldn't suggest this conversion unless you've determined that doing so would decrease hardware storage costs significantly for your project. – orlp Mar 13 '15 at 07:41
  • `s += chars[i % len(chars)]` -> `s = chars[i % len(chars)] + s` – nvd Jun 30 '19 at 15:27
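To put numbers on the size bound discussed in the comments above, here is a small sketch that applies it to the question's parameters (17 characters, at most 40 of them). The variable names are illustrative only.

```python
BASE = 17      # characters 0-9 and a-g, from the question
MAX_LEN = 40   # maximum string length, from the question

# Number of distinct strings of exactly MAX_LEN characters.
fixed = BASE ** MAX_LEN
# Number of distinct strings of any length from 0 to MAX_LEN.
variable = sum(BASE ** k for k in range(MAX_LEN + 1))

# Bits needed to give every such string its own integer.
bits_fixed = (fixed - 1).bit_length()        # 164
bits_variable = (variable - 1).bit_length()  # 164
bytes_needed = (bits_variable + 7) // 8      # 21
```

So the worst case needs about 164 bits, or 21 bytes, against 40 bytes for the raw string. That is well beyond a 64-bit integer column, so the value would likely have to be stored as a binary or arbitrary-precision numeric type.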

There are 17 symbols in your input, so you can treat it as a base-17 number:

>>> int('aga0',17)
53924

For the reverse conversion, there are lots of solutions over here.
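As one possible sketch of that reverse conversion (the function name `to_base17` and its `width` parameter are hypothetical, not from the answer or the standard library), repeated `divmod` by 17 yields the digits from least to most significant:

```python
DIGITS = "0123456789abcdefg"  # same digit order that int(s, 17) uses

def to_base17(n, width=0):
    """Return n written in base 17, left-padded with '0' to `width` characters."""
    if n < 0:
        raise ValueError("n must be non-negative")
    out = []
    while n:
        n, d = divmod(n, 17)
        out.append(DIGITS[d])
    return "".join(reversed(out)).rjust(width, "0")
```

With this, `to_base17(53924)` gives back `'aga0'`. Since `int()` ignores leading zeros, a string such as "00a" only round-trips if its original length is known and passed as `width`.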

georg
  • This breaks down if the characters needed to encode are not a direct extension of [0-9a-g], for example [0-9a-gz], and does not provide a way back. – orlp Mar 13 '15 at 07:29

Improving on the above answers:

# The location of a character in the string matters.
chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

charsLen = len(chars)

def numberToStr(num):
  s = ""
  while num:
    s = chars[num % charsLen] + s
    num //= charsLen

  return s # Or e.g. "s.zfill(10)"

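The reverse conversion accepts strings with leading 0s (numberToStr does not reproduce them unless its result is padded, e.g. with zfill):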

def strToNumber(numStr):
  num = 0
  for i, c in enumerate(reversed(numStr)):
    num += chars.index(c) * (charsLen ** i)

  return num
nvd