Are the odds of a cryptographically secure random number generator generating the same uuid small enough that you do not need to check for uniqueness?

Question

I'm using this with a length of 20 for uuid. Is it common practice to skip checking whether a generated uuid has already been used when it serves as a persistent unique value?

Or is it best practice to verify that it's not already being used by some part of your application if it's essential to retain uniqueness?

The short answer is "yes". 20 bytes = 160 bits. How many records do you think you'll accumulate over the lifetime of your project? Punch it in here: http://davidjohnstone.net/pages/hash-collision-probability The odds are so low it's not worth the CPU cycles (never mind the developer hours) – mpen, Mar 14 '18 at 01:08

score 2 · Answer 1 · answered Mar 14 '18 at 00:52

You can calculate the probability of a collision using this formula from Wikipedia:

$$n(p; H) \approx \sqrt{2H \ln\frac{1}{1-p}}$$

where n(p; H) is the smallest number of samples you have to choose in order to find a collision with a probability of at least p, given H possible outputs with equal probability.

The same article also provides Python source code that you can use to calculate this value:

from math import log1p, sqrt

def birthday(probability_exponent, bits):
    probability = 10. ** probability_exponent
    outputs     =  2. ** bits
    return sqrt(2. * outputs * -log1p(-probability))

So if you're generating UUIDs with 20 bytes (160 bits) of random data, how sure can you be that there won't be any collisions? Let's suppose you want there to be a probability of less than one in a quintillion (10^-18) that a collision will occur:

>>> birthday(-18,160)
1709679290002018.5

This means that after generating about 1.7 quadrillion UUIDs with 20 bytes of random data each, there is only a one in a quintillion chance that two of these UUIDs will be the same.

Basically, 20 bytes is perfectly adequate.
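The birthday function above answers "how many IDs for a given probability?". A minimal sketch of the reverse direction (what is the probability for a given number of IDs?) could look like the following. The name `collision_probability` is made up for illustration, and the result relies on the usual birthday approximation, so treat it as an estimate rather than an exact value:

```python
from math import expm1

def collision_probability(count, bits):
    """Approximate chance of at least one collision among `count` random IDs.

    Hypothetical helper, not taken from the Wikipedia article.
    """
    outputs = 2.0 ** bits
    # Birthday approximation: p ~ 1 - exp(-n(n-1) / (2H)).
    # expm1 keeps precision when the probability is extremely small.
    return -expm1(-count * (count - 1) / (2.0 * outputs))

# 1.7 quadrillion IDs of 160 bits each: roughly one in a quintillion.
print(collision_probability(1.7e15, 160))

# A trillion IDs of 128 bits each: on the order of 1e-15.
print(collision_probability(1e12, 128))
```

Plugging in your own expected record count gives a quick sanity check before deciding on an ID length.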

AutoBootDisk · Answer 2 · 2018-03-14T22:13:43.650

1

crypto.RandomBytes is safe enough for most applications. If you want it to be completely secure, use a length of 16. With a length of 16, there will likely never be a collision within the next century. It is also not a good idea to check an entire database for duplicates, because the odds are so low that the performance cost outweighs the benefit.
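For illustration only, here is roughly what generating such an ID looks like in Python. This assumes Python's standard `secrets` module as a stand-in for the `crypto.RandomBytes` call mentioned above. The `new_id` helper is a hypothetical name, not something from the question:

```python
import secrets

def new_id(num_bytes=16):
    """Return a random ID as a hex string (two hex characters per byte).

    Hypothetical helper; `secrets` draws from the OS's cryptographically
    secure random source.
    """
    return secrets.token_hex(num_bytes)

# 16 bytes gives 32 hex characters; 20 bytes gives 40.
print(new_id())
print(new_id(20))
```

No lookup against existing records is performed here, which matches the advice above.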

edited Mar 14 '18 at 22:13

answered Mar 14 '18 at 00:16


I think 16 bytes (128 bits) is enough actually. Even with a trillion records, the odds are about 0.00000000000014432899%. – mpen Mar 14 '18 at 01:06
