A couple months ago, we were using UUIDs to generate random string IDs that needed to be unique across the board. I then changed the algorithm in order to save some data and index space in our database. I tested a few ways to generate unique string IDs, and I decided to use this function:
function generateToken($length) {
$characters = '0123456789abcdefghijklmnopqrstuvwxyz';
$max = strlen($characters) - 1;
$token = '';
for ($i = 0; $i < $length; $i++) {
$token .= $characters[mt_rand(0, $max)];
}
return $token;
}
I'm using this function to generate IDs that are 20 characters long using digits and letters, or you could say these IDs are numbers in base 36. The probability of any 2 IDs colliding should be 1/36^20, but due to the birthday paradox, it can be expected for a collision to occur after about 36^10 records - that's 3.6 quadrillion records. Yet, just a few hours ago a collision occurred, when there were only 5.3 million existing records in the database. Am I extremely unlucky, or my ID-generating function is flawed with respect to randomness? I know mt_rand() isn't truly random, but it is random enough, isn't it?
I would've written a loop that checks if the generated ID is unique and generates a new one if it isn't, but I thought that the chance of getting a collision was so small that the performance cost of such a loop wasn't worth it. I will now include such a loop in the code, but I'm still interested in perfecting the ID generation function if it is indeed flawed.