0

I found Marcel Jackwerth's response to How to code a URL shortener? to be a good answer for the problem, however my question is how it'll look in PHP? Here's Marcel's answer:


You need a Bijective Function f (there must be no x1 != x2, that will make f(x1) = f(x2); and for every y you will find a x so that f(x)=y). This is necessary so that you can find a inverse function g('abc') = 123 for your f(123)='abc' function.

I would continue your "convert number to string" approach (however you will realize that your proposed algorithm fails if your id is a prime and greater than 52).

How to convert the id to a shortened url:

  • Think of an alphabet you want to use. In your case that's [a-zA-Z0-9]. It contains 62 letters.
  • Take the auto-generated unique numerical key (auto-incremented id): for example 125 (a decimal number)
  • Now you have to convert the 125 (base 10) to X (base 62). This will then be {2}{1} (2×62+1=125).
  • Now map the symbols {2} and {1} to your alphabet. Say {0} = 'a', {25} = 'z' and so on. We will have {2} = 'c' and {1} = 'b'. So '/cb' will be your shortened url.

How to resolve a shortened url abc to the initial id:

  • If you want to do this in reverse, it's not quite diffcult. 'e9a' will be resolved to "4th,61st,0th letter in alphabet" = {4}{61}{0}, which is 4×62×62 + 61×62 + 0 = 19158. You will then just have to find your database-record with id 19158.
Community
  • 1
  • 1
ZombieDragon
  • 131
  • 1
  • 2
  • 12

4 Answers4

1
function convert($src, $srcAlphabet, $dstAlphabet) {
    $srcBase = strlen($srcAlphabet);
    $dstBase = strlen($dstAlphabet);

    $wet = $src;
    $val = 0;
    $mlt = 1;

    while ($l = strlen($wet)) {
        $digit = $wet[$l - 1];
        $val += $mlt * strpos($srcAlphabet, $digit);
        $wet = substr($wet, 0, $l - 1);
        $mlt *= $srcBase;
    }

    $wet = $val;
    $dst = '';

    while ($wet >= $dstBase) {
        $digitVal = $wet % $dstBase;
        $digit = $dstAlphabet[$digitVal];
        $dst = $digit . $dst;
        $wet /= $dstBase;
    }

    $digit = $dstAlphabet[$wet];
    $dst = $digit . $dst;

    return $dst;
}

// prints cb
print convert('125', '0123456789', 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789');

// prints 19158
print convert('e9a', 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', '0123456789');
rik
  • 8,592
  • 1
  • 26
  • 21
0

The main problem with Marcel's solution is that it uses a zero digit as a placeholder. By converting between bases, inevitably the numeral chosen to represent 0 can't appear at the front of the converted number.

For example, if you convert base 10 integers to base 4 using "ABCD" using the provided mechanism, there is no way to obtain output that starts with the letter "A", since that represents a zero in the new base and won't prefix the number. You might expect 5 to be "AA", but instead, it is "BA". There is no way to coerce that algorithm into producing "AA", because it would be like writing "00" in decimal, which has the same value as "0".

Here's an alternate solution in PHP that uses the entire gamut:

function encode($n, $alphabet = 'ABCD') {
    $output = '';

    if($n == 0) {
        $output = $alphabet[0];
    }
    else {
        $digits = floor(log($n, strlen($alphabet))) + 1;

        for($z = 0; $z < $digits; $z++) {
            $digit = $n % 4;
            $output = $alphabet[$digit] . $output;
            $n = floor($n / 4) - 1;
        }
    }
    return $output;
}

function decode($code, $alphabet = 'ABCD') {
    $n = 0;
    $code = str_split($code);
    $unit = 1;
    while($letter = array_pop($code)) {
        $n += (strpos($alphabet, $letter) + 1) * $unit;
        $unit = $unit * strlen($alphabet);
    }

    return $n - 1;
}

echo encode(25); // should output "ABB"
echo decode('ABB'); // should output 25

Change/pass the second parameter to a list of characters to use instead of the short 4-character dictionary of "ABCD".

ringmaster
  • 2,879
  • 3
  • 28
  • 31
0

I like this PHP function which allows you to customise the alphabet (and remove confusing 0/O's etc.)

// From http://snipplr.com/view/22246/base62-encode--decode/
private function base_encode($val, $base=62, $chars='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ') {
    $str = '';
    do {
        $i = fmod($val, $base);
        $str = $chars[$i] . $str;
        $val = ($val - $i) / $base;
    } while($val > 0);
    return $str;
}

Follow the URL to find the reverse 'decode' function too.

Simon East
  • 55,742
  • 17
  • 139
  • 133
-1

all you need to do is convert between different base systems base 10 to base 62

https://github.com/infinitas/infinitas/blob/dev/core/short_urls/models/short_url.php

dogmatic69
  • 7,574
  • 4
  • 31
  • 49