I'm wondering how Instapaper (a bookmarklet that saves text) might generate the URLs for its bookmarklet.

Mine has a script src similar to www.instapaper.com/j/AnJHrfoDTRia.

These URLs need two properties: they must never collide, and they must not be easily guessable (so other people can't save to your account).

I know a simple approach would be to MD5 their email address (presumably already checked for uniqueness at signup), but then I'd end up with a very long string. This isn't a huge issue, but I'm wondering what techniques there are for shorter GUIDs that won't collide too often (this is obviously the tradeoff, but the 12 characters above seem pretty short to me).

Alex Mcp
  • If you already have entries in your database with (auto-incrementing) integer IDs, just use [`(new Id())->encode($id)`](https://github.com/delight-im/PHP-IDs) to get IDs like theirs. This is entirely collision-free, yet obfuscated and short. – caw Nov 20 '19 at 15:49

4 Answers

You can get a shorter string by treating the MD5 hash as a base-16 number (digits 0-9a-f) and converting it to a larger base, for example base 36.

<?php
// Requires the GMP extension: parse $num in base $base_a, return it as a string in base $base_b
function gmp_convert($num, $base_a, $base_b) {
    return gmp_strval(gmp_init($num, $base_a), $base_b);
}

$hash = md5("hello");
$hash2 = gmp_convert($hash,16,36);
echo "$hash <br>"; //5d41402abc4b2a76b9719d911017c592 
echo $hash2; //5ir3t0ozoelrnauhrwyu1xfgy

The link you mention seems to use all the letters, both upper- and lowercase.
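Converting to base 62 (0-9, A-Z, a-z) gets closer to that style. `gmp_strval()` accepts bases up to 62, but as an illustration, here is a sketch that does the same conversion in pure PHP by long division over the raw MD5 bytes, so no extension is needed. `bytes_to_base62()` is a hypothetical helper, and the alphabet order is chosen to match GMP's. A 128-bit MD5 still needs up to 22 base-62 characters, so reaching 12 would mean truncating (and accepting more collisions).

```php
<?php
// Hypothetical helper: convert a binary string (e.g. raw MD5 output) to base 62
// by repeated long division of its big-endian base-256 digits. No GMP/BCMath needed.
function bytes_to_base62(string $bytes): string
{
    // Same digit order GMP uses for bases above 36: 0-9, then A-Z, then a-z
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $digits = array_values(unpack('C*', $bytes));
    $out = '';
    while (count($digits) > 0) {
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $d) {
            $acc = $remainder * 256 + $d;
            $q = intdiv($acc, 62);
            $remainder = $acc % 62;
            if ($q > 0 || count($quotient) > 0) { // skip leading zero digits
                $quotient[] = $q;
            }
        }
        $out = $alphabet[$remainder] . $out; // remainder is the next lowest digit
        $digits = $quotient;
    }
    return $out === '' ? '0' : $out;
}

echo bytes_to_base62(md5('hello', true)), "\n";
```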

user570783

Base64-encode a cryptographically strong set of random bytes.

<?php
// get 72 pseudorandom bits in a base64 string of 12 characters

$pr_bits = '';

// Unix/Linux platform?
$fp = @fopen('/dev/urandom','rb');
if ($fp !== FALSE) {
    $pr_bits .= @fread($fp,9);
    @fclose($fp);
}

// MS-Windows platform?
if (@class_exists('COM')) {
    // http://msdn.microsoft.com/en-us/library/aa388176(VS.85).aspx
    try {
        $CAPI_Util = new COM('CAPICOM.Utilities.1');
        $pr_bits .= $CAPI_Util->GetRandom(9,0);

        // if we ask for binary data PHP munges it, so we
        // request base64 return value.  We squeeze out the
        // redundancy and useless ==CRLF by hashing...
        if ($pr_bits) { $pr_bits = substr(md5($pr_bits,TRUE), 0, 9); }
    } catch (Exception $ex) {
        // echo 'Exception: ' . $ex->getMessage();
    }
}

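// Fail loudly instead of returning an empty or short token
// if neither random source was available.
if (strlen($pr_bits) < 9) {
    throw new RuntimeException('No secure random source available');
}
$uid = base64_encode(substr($pr_bits, 0, 9));
?>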

This will give you 72 bits of the purest Colombian in 12 characters. This set contains roughly 4.7 × 10^21 (2^72) numbers, which means the chance of any collision among 1 million users is about 1 in 10 billion.
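The collision estimate above can be checked with the standard birthday approximation, here for n = 10^6 tokens drawn uniformly from N = 2^72 values:

$$
N = 2^{72} \approx 4.72 \times 10^{21}, \qquad
P(\text{collision}) \approx 1 - e^{-n(n-1)/(2N)} \approx \frac{n^2}{2N}
= \frac{10^{12}}{9.44 \times 10^{21}} \approx 1.06 \times 10^{-10}
$$

That is roughly 1 in 9.4 billion.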

This is a very slight modification of this Stack Overflow answer for generating crypto awesomeness: Secure random number generation in PHP.
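On PHP 7 and later, the platform-specific branches above can likely be replaced by `random_bytes()`, which reads from the operating system's CSPRNG. Standard base64 can also emit `+` and `/`, which have special meaning in URLs. The sketch below assumes PHP 7+ and uses a hypothetical `make_token()` helper that swaps those two characters for `-` and `_` (the RFC 4648 base64url alphabet). Nine bytes encode to exactly 12 characters with no `=` padding.

```php
<?php
// Sketch assuming PHP 7+; make_token() is a hypothetical helper name.
function make_token(int $bytes = 9): string
{
    // 9 random bytes = 72 bits from the OS CSPRNG
    $raw = random_bytes($bytes);
    // base64url: replace '+' and '/' with '-' and '_', drop any '=' padding
    return rtrim(strtr(base64_encode($raw), '+/', '-_'), '=');
}

echo make_token(), "\n"; // a 12-character, URL-safe token
```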

Waylon Flinn

<?php

$length = 12;

$chars = array_merge(range(0, 9), range('a', 'z'), range('A', 'Z'));

$hash = '';

for ($i = 0; $i < $length; $i++) {
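    // random_int() (PHP 7+) uses a CSPRNG; array_rand() does not, and the token must be unguessable
    $hash .= $chars[random_int(0, count($chars) - 1)];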
}

var_dump($hash);

This gives 3226266762397899821056 (62^12) unique combinations vs 281474976710656 (16^12) for the first 12 hex characters of an MD5 hash, about 11 million times more.

For just 4 characters (!!!) there are 14776336 unique combinations, which may be enough for you.
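To relate these counts to collisions, here is a sketch of the birthday-bound estimate for n tokens of a given length and alphabet size. `collision_probability()` is a hypothetical helper, and the approximation assumes tokens are drawn uniformly and independently, as with a CSPRNG.

```php
<?php
// Hypothetical helper: approximate chance that at least two of $n random tokens
// collide, for $length characters drawn uniformly from $alphabetSize symbols.
function collision_probability(int $n, int $alphabetSize, int $length): float
{
    $space = pow($alphabetSize, $length); // becomes a float when it overflows int (e.g. 62^12)
    // p ≈ 1 - exp(-n(n-1) / (2 * space)); -expm1() keeps precision when p is tiny
    return -expm1(-($n * ($n - 1)) / (2 * $space));
}

printf("%.2e\n", collision_probability(1000, 62, 4));     // 4 chars, 1,000 users
printf("%.2e\n", collision_probability(1000000, 62, 12)); // 12 chars, 1 million users
```

Under these assumptions, 1,000 random 4-character tokens carry roughly a 3% chance of at least one collision, so checking new tokens against existing ones is still worthwhile at short lengths.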

zerkms

MD5 the username. Take the first X characters of the resulting MD5 hash. Check to see if there is already a url token with that value in the DB. If so, take the first X+1 characters and try that (and so on). If not, then you have your token for that user. Store the token in the DB and look it up there from now on - don't try to re-create the token from the username each time or whatnot.

You could probably start with X=7 and do fine (no more than 1-2 tries for the vast majority of token generations).

Also, you may want to add something else into the hash calculation (say, their or a random number) just to make it harder to predict a given user's token.
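A minimal sketch of the prefix-lengthening scheme described above, assuming an in-memory array stands in for the database table of issued tokens. `assign_token()` and `$issued` are hypothetical names. Random bytes (via `random_bytes()`, PHP 7+) are mixed into the hash, as the last paragraph suggests, so the token cannot be recomputed from the username alone.

```php
<?php
// Sketch: take the first X hex chars of an MD5, lengthen by one char on collision.
// $issued is a hypothetical stand-in for the DB table (token => username).
function assign_token(string $username, array &$issued, int $startLength = 7): string
{
    // Extra random input makes the token unpredictable from the username
    $hash = md5($username . random_bytes(16));
    for ($len = $startLength; $len <= strlen($hash); $len++) {
        $candidate = substr($hash, 0, $len);
        if (!isset($issued[$candidate])) {
            $issued[$candidate] = $username; // store it; look it up later, never re-derive it
            return $candidate;
        }
    }
    throw new RuntimeException('Every prefix of the hash is already taken');
}

$issued = [];
echo assign_token('alice', $issued), "\n";
echo assign_token('bob', $issued), "\n";
```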

Amber
    Is there any reason to use the `md5` of the `name` (which is **predictable**) if the token will sometimes need `+1 char` anyway? Why not just `md5(microtime(1))` or `md5(uniqid())`? – zerkms Jan 11 '11 at 04:14
  • @zerkms: All perfectly acceptable options. You could even just `md5(rand())`. – Amber Jan 11 '11 at 04:23
    Also, to increase capacity it can be a good idea not to use md5 at all, but to generate random characters from 0-9a-zA-Z. For 12 characters that gives 3226266762397899821056 unique combinations vs 281474976710656 for md5 hex (about **11 million times** more). – zerkms Jan 11 '11 at 04:30
  • Though once you get past the millions/billions mark, it doesn't always matter. – Amber Jan 11 '11 at 04:40
  • @Amber: but it allows a **much** shorter hash: for just 5 chars, 1048576 (obviously not enough) vs 916132832. Even for 4 chars: 65536 vs 14776336; the second satisfies our needs, while the first is terrible. – zerkms Jan 11 '11 at 04:52
  • Right, but a difference of 3-4 characters either way isn't really a huge issue. – Amber Jan 11 '11 at 05:14