0

Is it possible to scale the following functions taken from https://stackoverflow.com/a/9848014/2704706 to encode/decode numbers into an 11 character string?

function lfsr($x) {
    return ($x >> 1) ^ (($x&1) ? 0xe10000 : 0);
}
function to_4($x) {
    for($i=0;$i<24;$i++)
        $x = lfsr($x);
    $str = pack("CCC", $x >> 16, ($x >> 8) & 0xff, $x & 0xff);
    return base64_encode($str);
}

function rev_lfsr($x) {
    $bit = $x & 0x800000;
    $x = $x ^ ($bit ? 0xe10000 : 0);
    return ($x << 1) + ($bit ? 1 : 0);
}
function from_4($str) {
    $str = base64_decode($str);
    $x = unpack("C*", $str);
    $x = $x[1]*65536 + $x[2] * 256 + $x[3];
    for($i=0;$i<24;$i++)
        $x = rev_lfsr($x);
    return $x;
}

for($i=0; $i<256; $i++) {
    $enc = to_4($i);
    echo $enc . " " . from_4($enc) . "\n";
}

My end goal would be to use these methods to form URLs with encoded IDs in a similar fashion to the video IDs included in the v $_GET variable in YouTube's URLs, i.e. RArlg6HeZZM in http://www.youtube.com/watch?v=RArlg6HeZZM.

Thanks ahead of time.

Community
  • 1
  • 1
Barry Beerman
  • 303
  • 3
  • 9
  • 1
    Using a hash algorithm is good for "reducing" a string to something small; but you would want to use a lookup to "go the other way". – Floris Aug 21 '13 at 19:07
  • I'd actually like to use an encoding algorithm similar to the one I included above instead of a hashing algorithm so that I can decode IDs back to their original numerical representation. – Barry Beerman Aug 21 '13 at 23:41
  • You could just use any encryption function available in PHP, first converting the sequence number into an appropriately long character string. 11 characters of base-64 is presumably 64 bits, or eight bytes, which is a convenient encryption block. – rici Aug 22 '13 at 01:20
  • Ok, that sounds great. Would you be able to show me how to change the functions I included in my original question from building a 4 character string to building an 11 character string? My understanding of bit reordering and xor'ing are seriously lacking. – Barry Beerman Aug 22 '13 at 02:22
  • I don't really know much PHP. But I'd guess you could do something like this: `$c11 = base64_encode(mcrypt_encrypt(MCRYPT_BLOWFISH, 'AbCdEfG', pack("V2", 0, $seqno), MCRYPT_MODE_ECB));` To get the seqno back, you'd reverse the whole thing: $array = unpack("V2num", mcrypt_decrypt(MCRYPT_BLOWFISH, 'AbCdEfG', base64_decode($c11), MCRYPT_MODE_ECB)); $seqno = $array['num2'];`. That could be completely wrong. – rici Aug 22 '13 at 04:02
  • It looks like that works! A couple questions though: 1) $c11 always has an = on the end of it. Is this because using an 11 character string causes the last group to only contain 16 bits so base64_encode() adds padding? 2) It looks like the highest value that can be encoded is 2,147,483,647. Is that a limitation of this encoding method? Is there a way to support larger ids without giving up the 11 character limitation? – Barry Beerman Aug 22 '13 at 15:38

1 Answers1

0

I don't know what youtube is doing, but I would do this: randomly generate a string of a fixed length, attempt to insert it into a table where its the primary key, and if you get a primary key exception error, generate a new string and try to insert again. Keep doing that until you finally get a successful insertion (i.e. a random string that is not in the table already). I actually do use this in one app.

Edit:--

Or you can add this as a unique field rather than the primary key:

 alter table tbl add randomstring  varchar(11) unique;

Then you can still have an auto_increment number as primary key.

developerwjk
  • 8,619
  • 2
  • 17
  • 33
  • 1
    Thanks for the suggestion. I'd prefer to encode an auto-incremented numerical primary key instead of using randomly generated strings as a primary key though because they allow for faster DB look-ups. – Barry Beerman Aug 22 '13 at 00:55
  • See my edit. Just add this field as UNIQUE rather than primary key. – developerwjk Aug 23 '13 at 16:23
  • My concern is that someone passes the random string aKieE9-Oadw in a $_GET variable to my app. Instead of querying the DB using the primary key, I'll have to perform a query using the random string's column which will be must less efficient. – Barry Beerman Aug 23 '13 at 17:40