5

I'm coding an url shortener function for a project in which I'm learning php, here is the code (btw I suppose that global here is not a good thing to do :P):

$alphabet = array(1 => "a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z",
                "A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z",
                "0","1","2","3","4","5","6","7","8","9","_","-");

function shorten($id){
    global $alphabet;
    $shortenedId = "";
    while($id>0){
        $remainder = $id % 64;
        $id = $id / 64;     
        $shortenedId = $alphabet[$remainder].$shortenedId;
    }
    return $shortenedId;
}

The code is taken from this Wikipedia article and adapted to php. My problem is that when I pass a multiple of 64 to the function I get a wrong (for my purpose) result, for instance 128 returns b which is not correct, it should have been aaa, but that's too long for a 3-digit number.

Also I'm starting to think that there's something wrong in this code, if I pass 1'000'000'000'000 as $id I get nItOq... I feel it's wrong because a url shortening service like bit.ly returns a 6 number id if I use it, and I don't think that this algorithm is better than theirs.

So, two questions:

  • do you spot any bug in the above code?
  • how to manage 64-multiple ids? Do I have to just ignore them and pass to the next one?
Alberto Zaccagni
  • 30,779
  • 11
  • 72
  • 106
  • You don't need to use global there (I don't see any reason for that?). In fact, it's not been recommended to use global variables in PHP for years (at least since first release of PHP5). Use dependency injection instead. – Richard Knop Jul 08 '10 at 09:00
  • @Richard Knop: Without that the variable `$alphabet` was not accessible. – Alberto Zaccagni Jul 08 '10 at 09:33
  • Oh well, I see now, the scope has changed because you're inside a function. But why not just pass the $alphabet as second parameter to the function? Globals are not recommended in PHP, really. – Richard Knop Jul 08 '10 at 10:19
  • Mh, ok, I think I'll just put it in the function as nathan did. – Alberto Zaccagni Jul 08 '10 at 21:16
  • At the time that this question was asked, we did not have a [code review](https://codereview.stackexchange.com/) site. We do now, if anyone wants their code reviewed – Mawg says reinstate Monica Oct 16 '17 at 17:09

8 Answers8

14

Just a couple of little tweaks needed, the main two were to make the the alphabet zero indexed rather than one-indexed, and to subtract the remainder from the id before dividing

function shorten($id)
{
    $alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-';
    $shortenedId = '';
    while($id>0) {
        $remainder = $id % 64;
        $id = ($id-$remainder) / 64;     
        $shortenedId = $alphabet{$remainder} . $shortenedId;
    };
    return $shortenedId;
}

and here's a further modified version which... well I just like

function shorten($id, $alphabet='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-')
{
    $base = strlen($alphabet);
    $short = '';
    while($id) {
        $id = ($id-($r=$id%$base))/$base;     
        $short = $alphabet{$r} . $short;
    };
    return $short;
}

EDIT: sorted concatenation to be the same as the OPs

nathan
  • 5,402
  • 1
  • 22
  • 18
  • Is doing this `$shortenedId .= $alphabet{$remainder};` the same as doing `$shortenedId = $alphabet[$remainder].$shortenedId;`? In the second I'm sure that the new digit is appended *before*, is this also the case for the code you've written? – Alberto Zaccagni Jul 08 '10 at 07:54
  • 1
    @Montecristo $shortenedId .= $alphabet{$remainder}; is the same thing as $shortenedId = $shortenedId . $alphabet{$remainder}; – Richard Knop Jul 08 '10 at 08:57
  • @Richard Knop, @Montecristo : thanks for the catch I've edited the functions to concatenate as per the original – nathan Jul 08 '10 at 10:44
  • 1
    Is it possible to unshort? – dkapa Oct 06 '14 at 11:10
  • 1
    Yes, it is possible to unshort, with the answer that is here http://stackoverflow.com/a/32020520/728236 – Brian Leishman Sep 03 '15 at 12:36
5

In case you're looking for the opposite function to take a base64 number and convert to base10, here's some PHP based off of the JavaScript in this answer: How to convert base64 to base10 in PHP?

function lengthen($id) {
    $alphabet='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-';

    $number=0;
    foreach(str_split($id) as $letter) {
        $number=($number*64) + strpos($alphabet,$letter);
    }
    return $number;
}
Community
  • 1
  • 1
rgbflawed
  • 1,957
  • 1
  • 22
  • 28
2

How about this:

function shorten_int($id){
    $hex = base_convert(id, 10, 16);
    $base64 = base64_encode(pack('H*', $hex));
    //$base64 = str_replace("/", "_", $base64); // remove unsafe url chars
    //$base64 = str_replace("+", "-", $base64);
    //$base64 = rtrim($base64, '='); // Remove the padding "=="
    $replacePairs = array('/' => '_',
                          '+' => '-',
                          '=' => '');
    $base64 = strtr($base64, $replacePairs); // optimisation
    return $base64;
}
malhal
  • 26,330
  • 7
  • 115
  • 133
1

By the way, check out the base_convert() function (http://php.net/manual/en/function.base-convert.php):

echo base_convert(1000000000, 10, 36);

36 is the longest base it can convert to, though. But in the comments section I found this:

function dec2any( $num, $base, $index=false ) {
    if (! $base ) {
        $base = strlen( $index );
    } else if (! $index ) {
        $index = substr( "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" ,0 ,$base );
    }
    $out = "";
    for ( $t = floor( log10( $num ) / log10( $base ) ); $t >= 0; $t-- ) {
        $a = floor( $num / pow( $base, $t ) );
        $out = $out . substr( $index, $a, 1 );
        $num = $num - ( $a * pow( $base, $t ) );
    }
    return $out;
}

echo dec2any(1000000000, 64, "_-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");

Maybe it will help?

Richard Knop
  • 81,041
  • 149
  • 392
  • 552
  • Thanks for posting this, yes I did find it, though I failed at understanding it... so I used another approach, I dislike having code I don't understand in my projects. :D – Alberto Zaccagni Jul 08 '10 at 21:15
1

These two functions are very convenient, thanks to @malhal:

function shorten_int($id)
{
    $id=dechex($id);
    $id=strlen($id)%2===0?hex2bin($id):hex2bin('0'.$id);
    $id=base64_encode($id);
    $id=strtr($id, array('/'=>'_', '+'=>'-', '='=>''));
    return $id;
}

function unshorten_int($id)
{
    $id=strtr($id, array('-'=>'+', '_'=>'/'));
    $id=base64_decode($id);
    $id=bin2hex($id);
    return base_convert($id, 16, 10);
}

echo shorten_int(43121111)."\n";
echo unshorten_int(shorten_int(43121111))."\n";
diyism
  • 12,477
  • 5
  • 46
  • 46
1

Paul Greg created some PHP code that converts from Base-10 to another base. This can be tested and the code downloaded here:

http://www.pgregg.com/projects/php/base_conversion/base_conversion.php

I'm using this approach to convert the database row id's to Base-64. Once these numbers have been shortened they can be used in the URL. [details]

Gabe Sumner
  • 4,978
  • 6
  • 33
  • 43
0

This is a variation of Nathans code to handle large integers greater than PHP_INT_MAX.

This uses the BC Maths Functions that should be built-in on Windows servers, but this needs to be enabled as an optional extension on Unix servers. This solution also requires a couple of custom BC functions to handle floor and round functions that I copied from the post by Alix Axel.

function shorten($value, $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-') {
    $base = strlen($alphabet);
    $result = '';
    while ($value) {
        $mod = bcmod($value, $base);
        $value = bcfloor(bcdiv($value, $base));
        $result = $alphabet[$mod] . $result;
    }
    return $result;
  }

function lengthen($value, $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-') {
    $base= strlen($alphabet);
    $result = '';
    for($i = 0, $limit = strlen($value); $i < $limit; $i++) {
        $result = bcadd(bcmul($base, $result), strpos($alphabet, $value[$i]));
    }
    return $result;
}

function bcceil($number) {
    if (strpos($number, '.') !== false) {
        if (preg_match("~\.[0]+$~", $number)) return bcround($number, 0);
        if ($number[0] != '-') return bcadd($number, 1, 0);
        return bcsub($number, 0, 0);
    }
    return $number;
}

function bcfloor($number) {
    if (strpos($number, '.') !== false) {
        if (preg_match("~\.[0]+$~", $number)) return bcround($number, 0);
        if ($number[0] != '-') return bcadd($number, 0, 0);
        return bcsub($number, 1, 0);
    }
    return $number;
}

function bcround($number, $precision = 0) {
    if (strpos($number, '.') !== false) {
        if ($number[0] != '-') return bcadd($number, '0.' . str_repeat('0', $precision) . '5', $precision);
        return bcsub($number, '0.' . str_repeat('0', $precision) . '5', $precision);
    }
    return $number;
}

Examples running PHP 5.6 on Windows (32 bit)

foreach ([0, 1, 9, 10, 115617, bcsub(PHP_INT_MAX, 1), PHP_INT_MAX, bcadd(PHP_INT_MAX, 1234567890)] as $value) {
    $short = shorten($value);
    $reversed = lengthen($short);
    print shorten($value) . " ($value)<br>";
    if ("$value" !== $reversed) {
        print 'ERROR REVERSING VALUE<br>';
    }
}

Outputs

0 (0)
1 (1)
9 (9)
a (10)
sex (115617)
1----_ (2147483646)
1----- (2147483647)
39Bwbh (3382051537)

If the ID is public, avoid using vowels in the string (115617 is shortened to sex for example). This would be the base 54 version that should provide safe words.

$alphabet = '0123456789bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ_-';
HK boy
  • 1,398
  • 11
  • 17
  • 25
-1

You can use the pack.

$int = 1129717211140920362;

$byte = pack('J*', $int);    
echo base64_encode($byte); //= D62P0WqzFCo=

It will result in D62P0WqzFCo=, it is correct, because the $int is an int64 and uses 64 bits. The Base64 uses 6 bits for each character, so they need ~11 characters.

To decode use:

$base64 = 'D62P0WqzFCo=';

$byte = base64_decode($base64);
echo unpack('J*',  $byte)[1]; //= 1129717211140920362

It will return 1129717211140920362. ;)


It was based in the answer on the Stackoverflow in Portuguese.

Inkeliz
  • 892
  • 1
  • 13
  • 25
  • @RafaelLima, why it's wrong? The "string" is a `pack`, which convert the integer (base10) to a string form. The `base64` isn't enconding as string directly, it's actually encode as int64. Suppose you have a `1` as string you end up with `MQ==` instead of `AAAAAAAAAAE=`, which is the int64 in big ending order, so it takes a number and convert it correctly, in int64. – Inkeliz Jul 04 '18 at 23:04