
I am trying to find a way to encode a database ID into a short URL, e.g. 1 should become "Ys47R", and then decode "Ys47R" back to 1 so I can run a database search using the INT value. The code must be unique for each database ID, and the sequence should not be easily guessable (not 1 = "Ys47R", 2 = "Ys47S"). It should be something along the lines of YouTube's or bit.ly's URLs. I have read hundreds of different sources using md5, base32, base64 and `bcpow`, but have come up empty.

This blog post looked promising, but once I added padding and a passkey, short IDs such as 1 became "SDDDG", 2 became "SDDDH" and 3 became "SDDDI". It is not very random.

With base32 the output used only lowercase letters and digits, and base64 produced characters such as == at the end.

I then tried this:

function getRandomString($db, $length = 7) {
    // $db is not used in this snippet; the uniqueness check against the database happens separately.
    $validCharacters = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    $validCharNumber = strlen($validCharacters);
    $result = "";

    for ($i = 0; $i < $length; $i++) {
        $index = mt_rand(0, $validCharNumber - 1);
        $result .= $validCharacters[$index];
    }

    return $result;
}

This worked, but it meant I had to run a database query every time to make sure the string did not already exist in the database (i.e. that there was no collision).

Is there a way to create short IDs of at least 4 characters, using the charset [a-z][A-Z][0-9], that can be encoded from and decoded back to the auto-increment ID in the database, where each number is unique? I can't get my head around advanced techniques using base32 or base64.

Or am I overthinking this, and is there an easier way to do it? Would it be best to use the random string function above and query the database to check for uniqueness every time?

noahnu
user3882771
  • What's the purpose? If the encoded numbers are not to be predictable, then yes, you have to randomly generate them, check for existence and store them alongside your table's actual `id` keys. – mario Jul 28 '14 at 01:51
  • The purpose was to turn URLs into bit.ly-type URLs, so I can use the $_GET parameter to obtain the ID, convert it back to the original int and check against the database which page they are referring to, so I can pull up the details. After a lot of reading I saw two options: 1) creating a random ID and checking the DB, which also increases the chance of collisions as more pages are created; or 2) as others have mentioned, creating it on the fly like YouTube and bit.ly using base64, so no DB check is needed when creating it and every number is unique up to billions. – user3882771 Jul 28 '14 at 02:06
  • The topic is clear. But *why* do you want to convert the page IDs? Is it meant as a pseudo-security feature? It's basically just obfuscation. (And for that base32 would suffice, by adding a base value to raise the output length, plus some bit shuffling.) – mario Jul 28 '14 at 02:37
  • That's right, it is supposed to be more for obfuscation, plus good-looking URLs. For example, users will be sending report links to clients, hence good-looking short URLs. At the same time, while the reports are not confidential, I do not want someone to start typing ID numbers 1, 2, 3, 4 in the URL to see everyone else's work one by one. – user3882771 Jul 28 '14 at 02:59
  • Can you do the shuffling with base64 minus the == signs and decode it back? Here is a link I found; while it mentions this, it doesn't say how they used base64 to create the IDs, but it is a similar concept of using $_GET for unique short URLs: http://stackoverflow.com/questions/2308579/when-using-a-unique-alphanumeric-string-for-a-short-url-is-it-better-to-store-t – user3882771 Jul 28 '14 at 03:02
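A minimal sketch of the "base64 minus the == signs" idea from the last comment, assuming 64-bit PHP and IDs between 0 and 2^32 - 1. The helper names `base64IdEncode`/`base64IdDecode` are hypothetical. The ID is packed into 4 bytes, encoded with the URL-safe variant (`-` and `_` instead of `+` and `/`), and the padding is stripped and restored on decode. Note that the output can contain `-` and `_`, so it is not strictly [a-zA-Z0-9], and since it is only an encoding, consecutive IDs still give predictable strings.

```php
<?php
// Hypothetical helpers: reversible base64 encoding of an integer ID without "==" padding.
// Assumes 0 <= $id < 2^32 (packed as a 4-byte big-endian unsigned integer).
function base64IdEncode(int $id): string
{
    $bytes = pack('N', $id);                     // 4 raw bytes, big-endian
    $b64 = base64_encode($bytes);                // e.g. "AAAAAQ==" for 1
    return rtrim(strtr($b64, '+/', '-_'), '=');  // URL-safe alphabet, padding removed
}

function base64IdDecode(string $code): int
{
    $b64 = strtr($code, '-_', '+/');
    // Restore the padding so the length is a multiple of 4 again.
    $b64 = str_pad($b64, (int) (ceil(strlen($b64) / 4) * 4), '=', STR_PAD_RIGHT);
    $bytes = base64_decode($b64, true);          // strict mode rejects invalid characters
    if ($bytes === false || strlen($bytes) !== 4) {
        throw new InvalidArgumentException("Invalid code: $code");
    }
    return unpack('N', $bytes)[1];
}

echo base64IdEncode(1), "\n";         // AAAAAQ
echo base64IdDecode('AAAAAQ'), "\n";  // 1
```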

2 Answers


You could use the function from the comments on the `base_convert()` manual page: http://php.net/manual/en/function.base-convert.php#106546

$initial = '11111111';
$dic = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
var_dump($converted = convBase($initial, '0123456789', $dic)); 
// string(4) "KCvt"
var_dump(convBase($converted, $dic, '0123456789')); 
// string(8) "11111111"

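// Converts $numberInput, written in the digit alphabet $fromBaseInput, into the alphabet $toBaseInput.
// Requires the bcmath extension (bcadd, bcmul, bcpow, bcmod, bcdiv), so arbitrarily large numbers work.
function convBase($numberInput, $fromBaseInput, $toBaseInput)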
{
    if ($fromBaseInput==$toBaseInput) return $numberInput;
    $fromBase = str_split($fromBaseInput,1);
    $toBase = str_split($toBaseInput,1);
    $number = str_split($numberInput,1);
    $fromLen=strlen($fromBaseInput);
    $toLen=strlen($toBaseInput);
    $numberLen=strlen($numberInput);
    $retval='';
    if ($toBaseInput == '0123456789')
    {
        $retval=0;
        for ($i = 1;$i <= $numberLen; $i++)
            $retval = bcadd($retval, bcmul(array_search($number[$i-1], $fromBase),bcpow($fromLen,$numberLen-$i)));
        return $retval;
    }
    if ($fromBaseInput != '0123456789')
        $base10=convBase($numberInput, $fromBaseInput, '0123456789');
    else
        $base10 = $numberInput;
    if ($base10<strlen($toBaseInput))
        return $toBase[$base10];
    while($base10 != '0')
    {
        $retval = $toBase[bcmod($base10,$toLen)].$retval;
        $base10 = bcdiv($base10,$toLen,0);
    }
    return $retval;
}
sectus
  • I did see this, and that method brings me back to my issue with short IDs such as 1. If you type 1, the generated ID is also 1, and 10 becomes "a". So neither of the two issues I mentioned is solved: the ID does not look like "Ys47R", and it is not unpredictable, since everything is in sequence and easily guessable. – user3882771 Jul 28 '14 at 02:12
  • @user3882771, if you really need to start from `Ys47R`, just add the number 893269206 before encoding and subtract it after decoding. – sectus Jul 28 '14 at 03:21
  • Thanks, I tried that and it seems to achieve the desired results. – user3882771 Jul 28 '14 at 19:20
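A self-contained sketch of the offset trick from these comments, assuming 64-bit PHP integers (so bcmath is not needed for IDs of this size) and the same `0-9a-zA-Z` alphabet as `$dic` above. `encodeId`, `decodeId`, `ID_ALPHABET` and `ID_OFFSET` are hypothetical names; 893269206 is the offset from the comment, which makes 1 encode to "Ys47R". The offset only fixes the length and starting point: 2 encodes to "Ys47S", so consecutive IDs remain guessable.

```php
<?php
// Hypothetical helpers: base-62 encoding with a fixed offset, as suggested in the comments.
// Assumes 64-bit integers and non-negative IDs.
const ID_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
const ID_OFFSET = 893269206;

function encodeId(int $id): string
{
    $alphabet = ID_ALPHABET;
    $base = strlen($alphabet);
    $n = $id + ID_OFFSET;
    $out = '';
    do {
        $out = $alphabet[$n % $base] . $out;  // prepend the least significant digit
        $n = intdiv($n, $base);
    } while ($n > 0);
    return $out;
}

function decodeId(string $code): int
{
    if ($code === '') {
        throw new InvalidArgumentException('Empty code');
    }
    $alphabet = ID_ALPHABET;
    $base = strlen($alphabet);
    $n = 0;
    foreach (str_split($code) as $char) {
        $pos = strpos($alphabet, $char);
        if ($pos === false) {
            throw new InvalidArgumentException("Invalid character: $char");
        }
        $n = $n * $base + $pos;
    }
    return $n - ID_OFFSET;
}
```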

If you want some symmetric obfuscation, then base_convert() is often sufficient.

base_convert($id, 10, 36);

This returns strings like 1i0g, and base_convert($str, 36, 10) converts them back.
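For illustration, a round trip with PHP's built-in `base_convert()`; the value 70000 is chosen here because it encodes to exactly the string quoted above. `base_convert()` works through floating point internally and may lose precision on large numbers, so this is only safe for IDs well within float precision.

```php
<?php
// Round trip through base 36 with PHP's built-in base_convert().
$id = 70000;
$code = base_convert((string) $id, 10, 36);  // "1i0g"
$back = (int) base_convert($code, 36, 10);   // 70000
echo $code, ' ', $back, "\n";                // 1i0g 70000
```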

Before and after that base conversion you can add:

  • To get a minimum string length, I'd suggest just adding 70000 to your $id, and on the receiving end subtracting it again.

  • A minor multiplication, $id *= 3, would add some "holes" in the generated alphanumeric ID range, yet not exhaust the available string space (divide by 3 again when decoding).

  • For some appearance of arbitrariness, a bit of nibble moving:

    // Swap adjacent 4-bit groups; 32-bit masks assume 64-bit PHP and 0 <= $id < 2^32.
    $id = ($id & 0x0F0F0F0F) << 4
        | ($id & 0xF0F0F0F0) >> 4;
    

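    Because the swap is its own inverse for IDs in the range the masks cover, applying the same line again on the receiving end gets back the original ID, so this works both for generating your obfuscated ID strings and for decoding them.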

    Just to be crystal clear: this is not encryption of any sort. It just shifts the numeric jumps between consecutive numbers, so the IDs look slightly more arbitrary.
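Putting the three bullets together, a minimal sketch assuming 64-bit PHP and IDs small enough that `($id + 70000) * 3` stays below 2^32, the range in which the 32-bit nibble swap undoes itself. The function and constant names are hypothetical; decoding runs the steps in reverse order, and a decoded value that is not divisible by 3 falls into one of the "holes" and is rejected. This is still obfuscation, not encryption.

```php
<?php
// Hypothetical helpers combining the suggestions: offset, multiply, nibble swap, base 36.
// Assumes 64-bit PHP and 0 <= ($id + 70000) * 3 < 2^32.
const OBF_OFFSET = 70000;
const OBF_FACTOR = 3;

function swapNibbles(int $n): int
{
    // Swap each pair of adjacent 4-bit groups; applying it twice restores $n.
    return (($n & 0x0F0F0F0F) << 4) | (($n & 0xF0F0F0F0) >> 4);
}

function obfuscateId(int $id): string
{
    $n = swapNibbles(($id + OBF_OFFSET) * OBF_FACTOR);
    return base_convert((string) $n, 10, 36);
}

function deobfuscateId(string $code): ?int
{
    $n = swapNibbles((int) base_convert($code, 36, 10));
    if ($n % OBF_FACTOR !== 0) {
        return null;  // a "hole": not a value obfuscateId() could have produced
    }
    return intdiv($n, OBF_FACTOR) - OBF_OFFSET;
}
```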

You still may not like the answer, but generating random IDs in your database is the only approach that really hinders ID guessing.
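A minimal sketch of the random-ID approach, assuming the short code is stored in its own column with a UNIQUE index. Rather than querying first and inserting afterwards (two requests can race between those steps), it attempts the insert and retries with a fresh code on a duplicate. The `$tryInsert` callback is a hypothetical stand-in for your own database code; `random_int()` requires PHP 7+.

```php
<?php
// Hypothetical sketch: random short codes with retry on collision.
// Assumes PHP 7+ (random_int) and a UNIQUE index on the code column.
function randomCode(int $length = 7): string
{
    $chars = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $max = strlen($chars) - 1;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code .= $chars[random_int(0, $max)];  // cryptographically secure choice
    }
    return $code;
}

/**
 * @param callable $tryInsert Hypothetical callback: stores the row with the given code and
 *                            returns false if the code already exists (unique violation).
 */
function insertWithRandomCode(callable $tryInsert, int $length = 7, int $maxAttempts = 10): string
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $code = randomCode($length);
        if ($tryInsert($code)) {
            return $code;
        }
    }
    throw new RuntimeException("No free code after $maxAttempts attempts; consider a longer length.");
}
```

With PDO, `$tryInsert` could catch the `PDOException` thrown for the duplicate key (SQLSTATE class 23, integrity constraint violation) and return false.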

mario
  • Would it be faster or better to do a database check up rather than encoding and decoding on the fly? – user3882771 Jul 28 '14 at 19:20
  • Use a profiler, not guesses. Neither is significant if implemented correctly. – mario Jul 28 '14 at 20:52
  • Yes, I can google that for you: [Simplest way to profile a PHP script](http://stackoverflow.com/a/20672191) – mario Jul 29 '14 at 14:31
  • Thanks. I just realized that if I create a random ID, save it to the database and check whether it exists, I will also need a detection method to identify when the combinations are running out, so I can increase the length of the random ID to accommodate more combinations. That seems like a lot of messing about to achieve something basic. I wonder how the likes of bit.ly and YouTube do it. Do they trawl millions of records to see if an ID exists? And when they are reaching the limits, do they increase the length? – user3882771 Jul 29 '14 at 14:40
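On the question of when to increase the length: with 62 characters there are 62^n possible codes of length n (62^4 = 14,776,336), so the length can be chosen up front from the expected row count instead of detecting exhaustion at runtime. A sketch, assuming a hypothetical rule of thumb of keeping the code space at least 1000 times larger than the number of rows, so that each random attempt hits an existing code less than 0.1% of the time.

```php
<?php
// Hypothetical helper: shortest code length whose space is at least $headroom times $rows.
// With 62 characters, length n gives 62^n codes (e.g. 62^4 = 14,776,336).
function minCodeLength(int $rows, int $alphabetSize = 62, int $headroom = 1000, int $minLength = 4): int
{
    $length = $minLength;
    $space = $alphabetSize ** $length;
    // Grow the length until the code space is large enough.
    while ($space < $rows * $headroom) {
        $length++;
        $space *= $alphabetSize;
    }
    return $length;
}
```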