77

This question isn't really a problem looking for a solution, it's more just a matter of simple curiosity. The PHP uniqid function has a more entropy flag, to make the output "more unique". This got me wondering, just how likely is it for this function to produce the same result more than once when more_entropy is true, versus when it isn't. In other words, how unique is uniqid when more_entropy is enabled, versus when it is disabled? Are there any drawbacks to having more_entropy enabled all the time?

Dinah
  • 52,922
  • 30
  • 133
  • 149
GordonM
  • 31,179
  • 15
  • 87
  • 129
  • 4
    If you want something that's *always* unique, you'd need to implement a [GUID](http://en.wikipedia.org/wiki/Globally_unique_identifier). Pretty much anything else will eventually collide since there's only so much entropy in the function. For example, `uniqid` with `more_entropy` set gives **only** about 92 bits of entropy (23 hexbits). To understand why that's not good for uniqueness, see [The Birthday Problem](http://en.wikipedia.org/wiki/Birthday_problem)... – ircmaxell Nov 01 '10 at 15:16
  • @ircmaxell thanks for pointing out The Birthday Problem, it's quite interesting. It should be definitely mentioned in the answer. – Petr Peller Oct 26 '11 at 15:11
  • 2
    uniqid() isn't a hash function so The Birthday Problem doesn't apply to it. It does have its vulnerabilities though. – Joel Mellon Nov 06 '12 at 01:16
  • @ircmaxell where does that number come from? `more_entropy` is about 30 bits of entropy (nine decimal digits), the microsecond part is about 20 (six decimal digits), where does the rest come from? You would need to pick the second from a 100,000 year range to get 42 bits of entropy. – Tgr Jul 24 '16 at 00:12

7 Answers7

38

Update, March 2014:

Firstly, it is important to note that uniqid is a bit of a misnomer as it doesnt guarantee a unique ID.

Per the PHP documentation:

WARNING!

This function does not create random nor unpredictable string. This function must not be used for security purposes. Use cryptographically secure random function/generator and cryptographically secure hash functions to create unpredictable secure ID.

And

This function does not generate cryptographically secure tokens, in fact without being passed any additional parameters the return value is little different from microtime(). If you need to generate cryptographically secure tokens use openssl_random_pseudo_bytes().


Setting more-entropy to true generates a more unique value, however the execution time is longer (though to a tiny degree), according to the docs:

If set to TRUE, uniqid() will add additional entropy (using the combined linear congruential generator) at the end of the return value, which increases the likelihood that the result will be unique.

Note the line increases the likelihood that the result will be unique and not that is will guarantee uniqueness.

You can 'endlessly' strive for uniqueness, up to a point, and enhance using any number of encryption routines, adding salts and the like- it depends on the purpose.

I'd recommend looking at the comments on the main PHP topic, notably:

http://www.php.net/manual/en/function.uniqid.php#96898

http://www.php.net/manual/en/function.uniqid.php#96549

http://www.php.net/manual/en/function.uniqid.php#95001

What I'd recommend is working out why you need uniqueness, is it for security (i.e. to add to an encryption/scrambling routine)? Also, How unique does it need to be? Finally, look at the speed consideration. Suitability will change with the underlying considerations.

Lucas
  • 16,930
  • 31
  • 110
  • 182
SW4
  • 69,876
  • 20
  • 132
  • 137
  • 1
    The most important lesson with those function comments is that uuid by itself is a very dangerous identifier to pass as a cookie/client-readable ID, but as a local/protected unique ID it has some good uses, namely speed. 2.5 cents. – DrPerdix Nov 01 '10 at 15:26
  • 3
    I don't know if this was obvious yet, but don't use `uniqid` (or it's derivatives) for anything related to security. PHP offers a whole slew of crypto-safe random generators, such as: `openssl_random_pseudo_bytes`. Please use the right tool for the job. – Halcyon Dec 04 '13 at 16:01
  • 1
    Assuming no 2 files are saved at the same microsecond, a unix microsecond timestamp would be unique for every file. – CMCDragonkai Jan 12 '14 at 17:01
  • It's statistically unlikely that you'll get a collision, but not impossible. Put your uniqid generation inside a `do{} while(collision)`. I use this approach when generating paths for uploaded files, for example. – afilina Aug 12 '15 at 14:04
  • 3
    Not sure why was this answer accepted. Unique != random/unpredictable – gadelat May 19 '16 at 08:40
  • @Halcyon let's say for example AES CTR mode does seemingly not need a random or unpredictable thing. just unique and if there a would be function which guarantees uniqueness until the overflow (e.g. just a 32 bit counter, try overflowing that in a resonable amount of time) and that's actually better because the problem with random functions is the more values you use the higher the probability of a collision you get, something that cant happen with a counter – My1 Feb 24 '17 at 00:08
  • I used `uniqid()` to generate values in tests, so as to eliminate false positives. If not given a unique prefix for every value, collisions tend to happen now and then. – XedinUnknown May 16 '18 at 09:56
20

Things are only unique if you check that they don't exist already. It doesn't matter what function you use to generate a 'random' string, or ID - if you don't double check that it isn't a duplicate, then there's always that chance.. ;)

While uniqid is based on the current time, the cautionary note above still applies - it just depends on where you will be using these "unique IDs". The clue to all this is where it says "more unique". Unique is unique is unique. How you can have something which is more or less unique, is a bit confusing to me!

Checking as above, and combining all this stuff will let you end up with something approaching uniqueness, but its all relative to where the keys will be used and the context. Hope that helps!

danp
  • 14,876
  • 6
  • 42
  • 48
  • 14
    There is a *huge* difference between "chance of getting a collision is one in ten thousand" and "change of getting a collision is less than every single user of the program getting hit by lightning simultaneously". A 128 bit value generated by a good RNG with a good seed is so close to being "really" unique that it does not matter, considering the incredibly high costs of getting something *provably* (and unpredictably) unique. – Michael Borgwardt Nov 01 '10 at 15:20
  • 8
    Just to further your point @Michael: For 128 bits, you'd need everybody in the US (300 million) to generate 1 million numbers per second for just about a day to get a 50% chance of a collision... For 512 bits, you'd need every body on earth (7 billion people) to generate 1 trillion numbers per second each for the next `10^47` years just to have a 50% chance of a collision... So yes, with a high big enough upper bound on the random number AND a good enough RNG, you can simulate uniqueness with only randomness... – ircmaxell Nov 01 '10 at 15:35
  • To put that time in perspective (`10^47 years`), it's the time it would take to reach the edge of the observable universe traveling at about 1 trillionth of an inch per hour (0.00000000000001 inches per hour)... (traveling at 55 miles per hour, it would take *only* about `10^23` years to reach the edge of the universe)... – ircmaxell Nov 01 '10 at 15:56
  • 1
    I completely agree, with your ideal world examples as above. The chances are minimal. However, randomness is not perfect in the implementations referred to in the original question, and I maintain, the domain where it this unique number is being used is important. If you had 1000 servers each doing 'unique' ID's based on microtimes, and assuming they were unique "just because", then at some point, you may well get burnt. Disregarding any quirks in code.. bugs, or whatever. The difference here is between reality and theory, and that's why we check ;) – danp Nov 01 '10 at 15:58
  • 5
    "The principle of generating small amounts of finite improbability by simply hooking the logic circuits of a Bambleweeny 57 Sub- Meson Brain to an atomic vector plotter suspended in a strong Brownian Motion producer (say a nice hot cup of tea) were of course well understood." – danp Nov 01 '10 at 15:58
  • 1
    @ircmaxell: The catch is that those numbers require *real* randomness, and thus a real RNG. You couldn't even simulate it with a PRNG with >128 bits of internal state, unless you also had a way to seed it with a unique/random >128-bit value. But that's the very problem you have to solve! And anything less than that, virtually guarantees collisions. Those same 300M people, were they using their compiler's crappy stock `rand()`, would have >90% chance of collision *on the very first iteration*. Plus, if you *need* uniqueness, even .001% collision chance is too much. – cHao Aug 22 '13 at 13:13
  • 1
    "Things are only unique if you check that they don't exist already." that is not true. in multi-thread or multi-process situation, "CHECK IF EXISTS" can not stop conflict. – truease.com Sep 04 '13 at 08:34
  • haha by the way is not possible to say how something is unique while in real senses we know that unique is unique...... – Karue Benson Karue Mar 06 '16 at 19:49
12

From the discussions about the function on the PHP manual site:

As others below note, without prefix and without "added entropy", this function simply returns the UNIX timestamp with added microsecond counter as a hex number; it's more or less just microtime(), in hexit form.

[...]

Also worth to note is that since microtime() only works on systems that have gettimeofday() > present, which Windows natively DOES NOT, uniqid() might yield just the single-second-resolution UNIX timestamp in a Windows environment.

In other words without "more_entropy", the function is absolutely horrible and should never be used, period. Accoding to the documentation, the flag will use a "combined linear congruential generator" to "add entropy". Well, that's a pretty weak RNG. So I'd skip this function entirely and use something based on mt_rand with a good seed for things that are not security-relevant, and SHA-256 for things that are.

Michael Borgwardt
  • 342,105
  • 78
  • 482
  • 720
8

Without the more_unique flag, it returns the unix timestamp with a microsecond counter, therefore if two calls get made at the same microsecond then they will return the same 'unique' id.

From there it is a question of how likely that is. The answer is, not very, but not to a discountable degree. If you need a unique id and you generate them often (or work with data generated elsewhere), don't count on it to be absolutely unique.

Reese Moore
  • 11,524
  • 3
  • 24
  • 32
  • 22
    believe it or not, it actually calls usleep(1) to make sure that never happens! – Eli Jul 22 '11 at 21:57
  • 2
    @Eli not sure if trolling or not, but clearly that is not the case because I'm getting duplicates running this: for($i=0; $i<10; $i++) echo uniqid() . "\n"; – djule5 Jul 16 '14 at 15:14
  • 4
    @djule5 Nope, not trolling: https://github.com/php/php-src/blob/af6c11c5f060870d052a2b765dc634d9e47d0f18/ext/standard/uniqid.c#L67 Are you perhaps running a very old version of PHP or are on a platform for usleep doesn't exist? – Eli Jul 16 '14 at 20:05
  • @Eli interesting haha thanks for the source! I'm running PHP 5.5.11 but I'm on Windows on this dev machine... so that probably explains it! So it's definitely not as unique on Windows then... – djule5 Jul 16 '14 at 22:25
  • interesting no more usleep https://github.com/php/php-src/blob/PHP-7.2.12/ext/standard/uniqid.c – user5542121 Nov 02 '18 at 15:58
  • 1
    @user5542121 they decided not to call usleep and poll time instead since usleep "may cause the kernel to schedule another process, causing a pause of around 10ms" ~ https://github.com/php/php-src/blob/PHP-7.2.12/ext/standard/uniqid.c#L61 – x3ns Nov 20 '18 at 17:34
5

The relevant bit from the source code is

if (more_entropy) {
    uniqid = strpprintf(0, "%s%08x%05x%.8F", prefix, sec, usec, php_combined_lcg() * 10);
} else {
    uniqid = strpprintf(0, "%s%08x%05x", prefix, sec, usec);
}

So more_entropy adds nine somewhat random decimal digits (php_combined_lcg() returns a value in (0,1)) - that's 29.9 bits of entropy, tops (in reality probably less as LCG is not a cryptographically secure pseudorandom number generator).

Tgr
  • 27,442
  • 12
  • 81
  • 118
1

After reading the source code of uniqueId, it's clear the way it works is to convert the time in microseconds from 1970-01-01 00:00:00 into an ID. It also waits until a microsecond has passed.

That means in the following code:

$uniqueId = uniqid();
$uniqueId1 = uniqid();

You can be certain that $uniqueId != $uniqueId1, even without the more_entropy flag, as each ID will always be generated from a different microsecond.

If the ID's are generated on a different server or possibly even the same server but a different thread, then there is a chance the time in microseconds is the same, therefore uniqueid may not be unique. If this is the case, then you use the more_entropy flag for an extra 29.9 bits more of entropy. The chance of a collision would now be so highly improbably, that its probably not worth even checking to make sure the ID already exists.

If you are generating the ID's only on a single server without multithreading php, then there's no point using the more_entropy flag, otherwise use it. If you need a cryptographically secure ID, then you should use a decent 256 bit RNG instead.

Dan Bray
  • 7,242
  • 3
  • 52
  • 70
  • 1
    the best explanation on SO. Is it possible to get the same using javascript, i.e. - 13 characters length ID based on unix timestamp and containig numbers and letters ? – provance Dec 31 '22 at 10:01
  • @provance It is but I am not sure why you would want to. The problem is you can't trust data from the client side and without a request to the server using `Ajax` or `node.js`, there would be no way to guarantee uniqueness of the ID. What is it you're trying to achieve? – Dan Bray Dec 31 '22 at 10:47
0

If you want to generate a unique id, Try this one.

$a = time();
$b = date("Ymd");
$c = uniqid();
$d = $asec + $bsec;
$e = $sku;
$gen = $a.'_'.$b.'_'.$c.'_'.$d.'_'.$e;