
I have a unique ID generator in my project, which I based on Scott's answer here. Is it possible to get duplicates with it?

$bytes = random_bytes(15);
$int = time() * random_int(100,999);
echo substr('OPT_' . bin2hex($bytes) . $int , 0, 50);
// Output: OPT_b2aad7ca373f363e2bcfbf5ab3e8ce553027068680

Edit: I've since switched to this, in case anyone is wondering:

public function generateUniqueKey(){
    return strtoupper('OPT' . sprintf('%04x%04x-%04x-%04x-%04x-%04x%04x%04x',
        mt_rand(0, 0xffff), mt_rand(0, 0xffff),
        mt_rand(0, 0xffff),
        mt_rand(0, 0x0fff) | 0x4000,
        mt_rand(0, 0x3fff) | 0x8000,
        mt_rand(0, 0xffff), mt_rand(0, 0xffff), mt_rand(0, 0xffff)
    ) );
}
// Output: OPTEF49BAAB-A14C-484E-B475-EA8CD02DBF1F
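One caveat with the version above: mt_rand() is not a cryptographically secure generator. A minimal sketch of the same UUID v4 layout built on random_bytes() might look like the following. The function name generateUniqueKeySecure and its $prefix parameter are made up for illustration and are not part of the original code.

```php
<?php
// Sketch only: a version 4 UUID built from random_bytes() instead of mt_rand().
// generateUniqueKeySecure() is a hypothetical name chosen for this example.
function generateUniqueKeySecure(string $prefix = 'OPT'): string
{
    $bytes = random_bytes(16);

    // Set the version field (high nibble of byte 6) to 4.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the variant field (top two bits of byte 8) to binary 10.
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    $hex = bin2hex($bytes);

    // Same 8-4-4-4-12 grouping as the mt_rand() version.
    return strtoupper($prefix . vsprintf('%s-%s-%s-%s-%s', [
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12),
    ]));
}

echo generateUniqueKeySecure(), PHP_EOL;
```

The output has the same shape as before (OPT followed by an upper-case UUID), so it should be a drop-in replacement wherever the format matters.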

Check out Rob's answer, he explained it perfectly.

  • What is the goal you are trying to achieve there? Where do you need this string? Why do you not use any of the other mentioned string generators? – Progman Jul 17 '21 at 20:34
  • Well I'm just trying to create a unique ID man, I send this to an API where they save it to their database. And I am already, Scott's answer was the most upvoted one so I just used it, I've just added time and OPT string to it –  Jul 17 '21 at 21:10
  • It is definitely possible to get collisions with this, however they would be very infrequent. You can experiment with generating values in a loop and looking for collisions. When you lower the number of random bytes and the floor/ceiling for the random int, you will see collisions frequently. – Rob Ruchte Jul 17 '21 at 22:17
  • @RobRuchte Thank you for the answer, what do you suggest that I should do in this case? Just ditch the time and random int and increase the byte length? –  Jul 17 '21 at 23:04
  • If you can, keep used IDs on your side, and check generated IDs against the list, regen if you get a collision. For the generator, it's fine. Depending on the particulars of your application, just using the built in uniqid function with "more entropy" may serve you better. The way it uses time makes collisions less likely over time than this code under most circumstances. If you've got a lot of different servers generating IDs, uniqid is not a good fit, but if you only have one instance, uniqid is probably better. Here's why: https://heap.space/xref/PHP-8.0/ext/standard/uniqid.c?r=2b5de6f8 – Rob Ruchte Jul 18 '21 at 02:10
  • Thanks, this was my first intention but I run this function with an AJAX call and there's user interaction so I really can't make people wait for this query to happen, I just went with UUID, seems to be much better. Thanks for the help –  Jul 18 '21 at 04:41

1 Answer


This generator can have collisions, although it is extremely unlikely.

If we reduce the byte length and the range of the random multiplier, we can get collisions consistently after a fairly low number of iterations. The values you're using make collisions vastly less likely, but this shows that they are technically possible. Since you're multiplying the time in seconds by a random integer between 100 and 999, the numeric suffix can take only 900 different values within any given second, so it adds little on top of the random bytes: two tokens generated in the same second share a suffix about once in 900 tries.

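<?php

// Test settings: deliberately small so that collisions show up quickly
$byteLength  = 3;
$randFloor   = 1;
$randCeiling = 5;

// Generated values are stored as array keys, so each lookup is O(1)
// instead of an in_array() scan over everything generated so far
$cache = [];

$i = 0;
while (true)
{
    // Same construction as in the question, with the smaller test settings
    $bytes = random_bytes($byteLength);
    $int   = time() * random_int($randFloor, $randCeiling);
    $val   = substr('OPT_' . bin2hex($bytes) . $int, 0, 50);

    $i++;
    if (isset($cache[$val]))
    {
        echo 'Found collision '.$val.' after '.$i.' iterations'.PHP_EOL;
        break;
    }

    // Progress report for long runs
    if ($i % 100000 == 0)
    {
        echo $i.' with no collisions...'.PHP_EOL;
    }

    $cache[$val] = true;
}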

A few runs:

Found collision OPT_e266e53253149272 after 7251 iterations
Found collision OPT_c6f67e8132873195 after 3572 iterations
Found collision OPT_8156061626574644 after 14993 iterations
Found collision OPT_ccddd76506298584 after 1606 iterations
Found collision OPT_cb06cc1626574650 after 16274 iterations
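
The iteration counts above are roughly what the birthday approximation predicts. The sketch below estimates the expected number of draws before a collision and the collision probability for a given number of tokens. It assumes every token is drawn uniformly and within the same second, so that time() is constant. The function names are invented for this illustration.

```php
<?php
// Birthday approximation: probability of at least one collision among
// $n values drawn uniformly from a space of $space possible values.
// expm1() keeps precision when the probability is tiny.
function collisionProbability(float $n, float $space): float
{
    return -expm1(-$n * ($n - 1) / (2 * $space));
}

// Approximate expected number of draws before the first collision:
// sqrt(pi * N / 2) for a space of N values.
function expectedDrawsToCollision(float $space): float
{
    return sqrt(M_PI * $space / 2);
}

// Test settings from the answer: 3 random bytes (2^24 values) and a
// multiplier in 1..5, assuming time() does not change during the run.
$testSpace = (2 ** 24) * 5;
printf("test settings: about %d draws expected before a collision\n",
    expectedDrawsToCollision($testSpace));

// Settings from the question: 15 random bytes (2^120 values) and 900
// possible multipliers, again for tokens generated within one second.
$realSpace = (2.0 ** 120) * 900;
printf("question settings, 1e9 tokens: p is about %.3e\n",
    collisionProbability(1e9, $realSpace));
```

With the test settings the estimate is on the order of ten thousand draws, in the same range as the runs above. With the settings from the question the probability stays negligible even for a billion tokens.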

PHP's built-in uniqid function with "more entropy" does not have this problem, because its result is unique (per thread anyway) for each microsecond. You can see how it works in the PHP source (ext/standard/uniqid.c). Note that this can still be (and may be more of) a problem if you have multiple servers generating IDs, or are generating a LOT of IDs at the same time on a system with a bunch of threads/cores. (Time and clock changes can induce more risk of collisions, as can solar flares or stray neutrinos, no purchase necessary, void where prohibited, call your doctor if the condition lasts longer than four hours...)

uniqid('OPT_', true);

Output:

OPT_60f3965b050f39.25070751
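
To see where the per-microsecond uniqueness comes from, the sample output can be taken apart. The sketch below assumes the layout used by the PHP 8.0 implementation (8 hex digits of seconds, 5 hex digits of microseconds, then the "more entropy" suffix). That layout is an implementation detail rather than a documented contract, and decodeUniqid is a hypothetical helper name.

```php
<?php
// Split a uniqid($prefix, true) value into its parts.
// Assumed layout (read off the PHP 8.0 source, not a documented guarantee):
//   8 hex digits  -> Unix timestamp in seconds
//   5 hex digits  -> microseconds within that second
//   the remainder -> the "more entropy" suffix
function decodeUniqid(string $id, string $prefix = ''): array
{
    $body = substr($id, strlen($prefix));

    return [
        'seconds'      => hexdec(substr($body, 0, 8)),
        'microseconds' => hexdec(substr($body, 8, 5)),
        'entropy'      => substr($body, 13),
    ];
}

$parts = decodeUniqid('OPT_60f3965b050f39.25070751', 'OPT_');

echo gmdate('Y-m-d H:i:s', $parts['seconds']), ' UTC, ',
     $parts['microseconds'], ' microseconds, entropy suffix ',
     $parts['entropy'], PHP_EOL;
```

Everything before the entropy suffix is derived from the clock, which is why two processes that call uniqid in the same microsecond depend entirely on the suffix to stay distinct.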

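You could get the benefits of uniqid and increase the randomness by using random_bytes to create the prefix. To stay within your maximum length, reduce the number of random bytes rather than trimming characters off the end, because the end of the string is the part that uniqid contributes.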

$bytes  = random_bytes(11);
$prefix = 'OPT_' . bin2hex($bytes);
$val    = uniqid($prefix, true);

Output:

OPT_5f889773483a0610de61c560f3a57c05dfa5.41225914
OPT_657940118a0e9663c0f86060f3a57c05e825.40200037
OPT_67df31a0252325324e311860f3a57c05f071.69465782
OPT_009f72e62d70b083360e0e60f3a57c05f8e0.85617746
OPT_4d1ca0d26a24bceb4c740460f3a57c0601c4.95161219
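
The length arithmetic can be wrapped in a small helper that picks the number of random bytes from the maximum length. This is only a sketch: generatePrefixedId and its parameters are made up for illustration, and it assumes that uniqid with "more entropy" normally contributes 23 characters (13 hex digits plus a 10-character suffix), as in the outputs above.

```php
<?php
// Hypothetical helper: build an ID of at most $maxLength characters by
// choosing the number of random bytes, instead of truncating the result.
function generatePrefixedId(string $prefix = 'OPT_', int $maxLength = 50): string
{
    // Assumption: uniqid('', true) normally yields 23 characters
    // (13 hex digits of time plus a 10-character entropy suffix).
    $uniqidLength = 23;

    // Each random byte becomes two hex characters.
    $randomBytes = intdiv($maxLength - strlen($prefix) - $uniqidLength, 2);
    if ($randomBytes < 1) {
        throw new InvalidArgumentException('Maximum length is too small for this prefix');
    }

    return uniqid($prefix . bin2hex(random_bytes($randomBytes)), true);
}

echo generatePrefixedId(), PHP_EOL;
```

With the defaults this works out to 11 random bytes and a 49-character result, matching the outputs shown above.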

What I usually do for things like this is keep track of generated tokens, and check against the list when generating a new one. This reduces the risk of collisions to pretty much zero as long as your storage has strong consistency. If you can do this on your side, you can be confident that you won't send collisions to the API you're calling.

$validToken = false;
while (!$validToken)
{
    $token = self::generateToken();

    $count = $db->query('SELECT COUNT(*) FROM mytable WHERE token=?', $token);

    $validToken = ($count == 0);
}

// Do something with token
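
A variation on the loop above, swapped in here as an alternative rather than what the answer describes, is to let the database enforce uniqueness: insert the token into a column with a unique constraint and retry if the insert is rejected. This avoids the gap between checking and storing. The sketch below assumes PDO with an in-memory SQLite database, and the reserveToken function and its generator callback are hypothetical. The table name mytable and the token column follow the query above.

```php
<?php
// Insert a freshly generated token, retrying if the unique constraint
// rejects it. $generate is any callable that returns a candidate token.
function reserveToken(PDO $db, callable $generate, int $maxAttempts = 5): string
{
    $insert = $db->prepare('INSERT INTO mytable (token) VALUES (?)');

    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $token = $generate();
        try {
            $insert->execute([$token]);
            return $token;
        } catch (PDOException $e) {
            // SQLSTATE class 23 means an integrity constraint violation,
            // i.e. the token already exists. Anything else is rethrown.
            if (substr((string) $e->getCode(), 0, 2) !== '23') {
                throw $e;
            }
        }
    }

    throw new RuntimeException('Could not generate a unique token');
}

// Assumed setup for the example: in-memory SQLite with a unique token column.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE mytable (token TEXT PRIMARY KEY)');

$generate = fn(): string => 'OPT_' . bin2hex(random_bytes(16));
echo reserveToken($db, $generate), PHP_EOL;
```

The attempt limit is there so that a broken generator fails loudly instead of looping forever.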
Rob Ruchte