3

Inspired by the "encoding scheme" of the answer to this question, I implemented my own encoding algorithm in Python.

Here is what it looks like:

import random
from math import pow
from string import ascii_letters, digits

# RFC 2396 unreserved URI characters
unreserved = '-_.!~*\'()'
characters = ascii_letters + digits + unreserved
size = len(characters)
seq = range(0,size)

# Seed random generator with same randomly generated number
random.seed(914576904)
random.shuffle(seq)

dictionary = dict(zip(seq, characters))
reverse_dictionary = dict((v,k) for k,v in dictionary.iteritems())

def encode(n):
    d = []
    n = n
    while n > 0:
        qr = divmod(n, size)
        n = qr[0]
        d.append(qr[1])
    chars = ''
    for i in d:
        chars += dictionary[i]
    return chars

def decode(str):
    d = []
    for c in str:
        d.append(reverse_dictionary[c])
    value = 0
    for i in range(0, len(d)):
        value += d[i] * pow(size, i)
    return value

The issue I'm running into is encoding and decoding very large integers. For example, this is how a large number is currently encoded and decoded:

s = encode(88291326719355847026813766449910520462)
# print s -> "3_r(AUqqMvPRkf~JXaWj8"
i = decode(s)
# print i -> "8.82913267194e+37"
# print long(i) -> "88291326719355843047833376688611262464"

The highest 16 places match up perfectly, but after those the number deviates from its original.

I assume this is a problem with the precision of extremely large integers when dividing in Python. Is there any way to circumvent this problem? Or is there another issue that I'm not aware of?

Community
  • 1
  • 1
mburke13
  • 350
  • 2
  • 6
  • 22
  • 2
    Show `dictionary`, `reverse_dictionary` and `size`, please. – Karl Knechtel Mar 17 '12 at 17:04
  • 1
    "Seed random generator with same randomly generated number". That's a good one :D Reminds me of http://xkcd.com/221/ – Niklas B. Mar 17 '12 at 17:16
  • Haha, I figured a comment would be better than just (seemingly) arbitrarily seeding the random generator. Thanks for the fast and correct answer! – mburke13 Mar 17 '12 at 17:52

1 Answers1

4

The problem lies within this line:

value += d[i] * pow(size, i)

It seems like you're using math.pow here instead of the built-in pow method. It returns a floating point number, so you lose accuracy for your large numbers. You should use the built-in pow or the ** operator or, even better, keep the current power of the base in an integer variable:

def decode(s):
    d = [reverse_dictionary[c] for c in s]
    result, power = 0, 1
    for x in d:
        result += x * power
        power *= size
    return result

It gives me the following result now:

print decode(encode(88291326719355847026813766449910520462))
# => 88291326719355847026813766449910520462
Niklas B.
  • 92,950
  • 18
  • 194
  • 224
  • 1
    I'm not sure I agree. For positive integer inputs, I think pow returns an integer or long output.. OH! The OP must be using math.pow, either because of a *-import or because of a direct import. – DSM Mar 17 '12 at 17:16
  • pow != math.pow, though. The OP must have clobbered the default three-argument pow. – DSM Mar 17 '12 at 17:19
  • @DSM: Seems like it. Otherwise I can't reproduce the problem. – Niklas B. Mar 17 '12 at 17:20
  • I have only recently started working with Python so I was unaware of the built-in pow function and assumed math.pow was the default. Will keep this in mind, guess I learn more than one thing from this question! – mburke13 Mar 17 '12 at 20:15
  • @Matt: So you did a `from math import pow`? You should at least add such highly relevant code to the question the next time :) – Niklas B. Mar 17 '12 at 20:17