I'm pretty sure YouTube is just encoding integer IDs in a base-X system. There are so many of them, and they are created so fast, that they only seem random.
The code would look something like:
<?php
$base_str = '0123456789abcdefghijklmnopqrstuvwxyz-_';
$base = strlen($base_str);
// generate a number if no input
if (!isset($argv[1])) {
    $number = rand(1000, 1000000);
} else {
    $number = intval($argv[1]);
}
printf("Input: %d\n", $number);
printf("Base: %d\n", $base);
// will hold the base-X encoded representation of the number
$repr = '';
// peel off the least significant digit until nothing is left
for ($i = $number; $i > 0; ) {
    $remainder = $i % $base;
    $digit_repr = substr($base_str, $remainder, 1);
    $repr = $digit_repr . $repr;
    printf("Rem: %2d Repr: %s Cur: %16d Progress: %s\n", $remainder, $digit_repr, $i, $repr);
    // integer division (intdiv() needs PHP 7+)
    $i = intdiv($i, $base);
}
// the loop body never runs for 0, so represent it with the first digit
if ($number === 0) {
    $repr = $base_str[0];
}
Example output:
Input: 2000000
Base: 38
Rem: 22 Repr: m Cur: 2000000 Progress: m
Rem: 1 Repr: 1 Cur: 52631 Progress: 1m
Rem: 17 Repr: h Cur: 1385 Progress: h1m
Rem: 36 Repr: - Cur: 36 Progress: -h1m
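To get a feel for how short these IDs stay as the numbers grow, here is a rough sketch that counts the characters a number needs and how many distinct IDs fit into each length. The helper name id_length() is made up for this illustration, and the base of 38 is simply the length of $base_str above.

```php
<?php
// Count how many base-$base digits a non-negative integer needs
// (hypothetical helper, not part of the scripts in this answer).
function id_length(int $number, int $base): int {
    $length = 1;
    while ($number >= $base) {
        $number = intdiv($number, $base);
        $length++;
    }
    return $length;
}

$base = 38;
printf("2000000 needs %d characters\n", id_length(2000000, $base));

// Number of distinct IDs that fit in each length: base ** length
for ($len = 1; $len <= 6; $len++) {
    printf("%d characters: %d IDs\n", $len, $base ** $len);
}
```

Four characters already cover 38^4 = 2085136 values, which is why 2000000 still fits in -h1m.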
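If you want to introduce a little more "randomness" into how the IDs look, you can always scramble $base_str. Just keep in mind that you can only scramble it once, before you start encoding IDs: the encoder and decoder have to share the same fixed ordering, or existing IDs will decode to different numbers.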
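A minimal sketch of the scrambling idea, assuming PHP 7 or newer. The ordering in $scrambled is one I picked by hand for illustration, and encode_id() / decode_id() are hypothetical helper names, not anything YouTube is known to use.

```php
<?php
// Hypothetical scrambled alphabet: the same 38 characters as $base_str,
// reordered once by hand and then kept fixed forever.
$scrambled = 'q7w_e3r-t9y1u5i0o2p4a6s8dfghjklzxcvbnm';

// Encode a non-negative integer using the given alphabet.
function encode_id(int $number, string $alphabet): string {
    if ($number < 0) {
        throw new InvalidArgumentException('Number must be non-negative');
    }
    $base = strlen($alphabet);
    $repr = '';
    do {
        // prepend the digit for the current remainder
        $repr = $alphabet[$number % $base] . $repr;
        $number = intdiv($number, $base);
    } while ($number > 0);
    return $repr;
}

// Decode a string produced by encode_id() with the same alphabet.
function decode_id(string $repr, string $alphabet): int {
    $base = strlen($alphabet);
    $number = 0;
    foreach (str_split($repr) as $char) {
        $value = strpos($alphabet, $char);
        if ($value === false) {
            throw new InvalidArgumentException("Invalid character: $char");
        }
        $number = $number * $base + $value;
    }
    return $number;
}

$id = encode_id(2000000, $scrambled);
printf("%s -> %d\n", $id, decode_id($id, $scrambled));
```

With this particular ordering, 2000000 comes out as n27s instead of -h1m. Storing the scrambled string as a constant, rather than shuffling at runtime, is what keeps old IDs decodable.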
Decoding
I guess that's important, right?
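<?php
$base_str = '0123456789abcdefghijklmnopqrstuvwxyz-_';
$base = strlen($base_str);
if (!isset($argv[1])) {
    $input = '-h1m';
} else {
    $input = $argv[1];
}
printf("Input: %s\n", $input);
printf("Base: %d\n", $base);
$repr = str_split($input);
$number = 0;
// read digits left to right, shifting the running total up one place each time
for ($i = 0; $i < count($repr); $i++) {
    $value = strpos($base_str, $repr[$i]);
    // strpos() returns false for characters outside the alphabet;
    // without this check they would silently count as 0
    if ($value === false) {
        fprintf(STDERR, "Invalid character: %s\n", $repr[$i]);
        exit(1);
    }
    $number = $number * $base + $value;
    printf("Char: %s Value: %2d Cur: %12d\n", $repr[$i], $value, $number);
}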
Example output:
Input: -h1m
Base: 38
Char: - Value: 36 Cur: 36
Char: h Value: 17 Cur: 1385
Char: 1 Value: 1 Cur: 52631
Char: m Value: 22 Cur: 2000000