
I'm using this with a length of 20 to generate a UUID. Is it common practice to skip checking whether a generated UUID is already in use when it serves as a persistent unique value?

Or is it best practice to verify that it isn't already used somewhere in your application when uniqueness is essential?

tom

The short answer is "yes". 20 bytes = 160 bits. How many records do you think you'll accumulate over the lifetime of your project? Punch it in here: http://davidjohnstone.net/pages/hash-collision-probability The odds are so low it's not worth the CPU cycles (never mind the developer hours) – mpen Mar 14 '18 at 01:08

2 Answers


You can calculate the probability of a collision using this formula from Wikipedia:

     n(p;H) ≈ √{2H ln[1/(1-p)]}

where n(p; H) is the smallest number of samples you have to choose in order to find a collision with a probability of at least p, given H possible outputs with equal probability.

The same article also provides Python source code that you can use to calculate this value:

from math import log1p, sqrt

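def birthday(probability_exponent, bits):
    """Return how many random values of `bits` bits you can generate before
    the collision probability reaches 10 ** probability_exponent."""
    probability = 10. ** probability_exponent
    outputs     =  2. ** bits
    # log1p(-p) computes ln(1 - p) accurately for very small p
    return sqrt(2. * outputs * -log1p(-probability))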

So if you're generating UUIDs with 20 bytes (160 bits) of random data, how sure can you be that there won't be any collisions? Let's suppose you want there to be a probability of less than one in a quintillion (10^-18) that a collision will occur:

>>> birthday(-18,160)
1709679290002018.5

This means that after generating about 1.7 quadrillion UUIDs with 20 bytes of random data each, there is only a one in a quintillion chance that two of these UUIDs will be the same.
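If you would rather start from an expected number of records and ask how likely a collision is, the formula above can be inverted to p ≈ 1 - exp(-n² / 2H). The following is a minimal sketch of that inversion; the function name `collision_probability` is my own and does not come from the Wikipedia article.

```python
from math import expm1

def collision_probability(records, bits):
    """Approximate chance that at least two of `records` random values collide."""
    outputs = 2.0 ** bits
    # Inverse of the birthday bound: p = 1 - exp(-n^2 / (2H)).
    # expm1 keeps precision when the exponent is extremely small.
    return -expm1(-(records ** 2) / (2.0 * outputs))

p128 = collision_probability(1e12, 128)  # 16-byte IDs, a trillion records
p160 = collision_probability(1e12, 160)  # 20-byte IDs, a trillion records
print(p128, p160)
```

For a trillion records this gives a probability on the order of 10^-15 with 16 bytes, and many orders of magnitude smaller again with 20 bytes.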

Basically, 20 bytes is perfectly adequate.

r3mainer

crypto.randomBytes is safe enough for most applications. If you want it to be completely secure, use a length of at least 16 bytes. At that length a collision is unlikely to occur within the next century. It is also not a good idea to scan an entire database for duplicates, because the odds of a collision are so low that the performance cost outweighs the benefit.
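If uniqueness really is essential, one option that avoids a separate lookup is to let the database enforce it and retry on the (practically never occurring) conflict. Here is a rough sketch in Python, assuming SQLite and using `secrets.token_hex` as a stand-in for `crypto.randomBytes`; the `items` table and the `insert_with_unique_id` helper are hypothetical names chosen for illustration.

```python
import secrets
import sqlite3

def insert_with_unique_id(conn, payload, id_bytes=20, max_attempts=3):
    """Insert `payload` under a fresh random ID, retrying only on a duplicate key."""
    for _ in range(max_attempts):
        new_id = secrets.token_hex(id_bytes)  # 20 random bytes -> 40 hex characters
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO items (id, payload) VALUES (?, ?)",
                    (new_id, payload),
                )
            return new_id
        except sqlite3.IntegrityError:
            continue  # primary key conflict: generate another ID
    raise RuntimeError("could not generate a unique ID")

# Hypothetical table; the PRIMARY KEY constraint is what guarantees uniqueness.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, payload TEXT)")
item_id = insert_with_unique_id(conn, "hello")
print(item_id)
```

The primary key index is consulted on every insert anyway, so this costs nothing beyond the insert itself, unlike checking for duplicates up front.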

AutoBootDisk

I think 16 bytes (128 bits) is enough actually. Even with a trillion records, the odds are about 0.00000000000014432899%. – mpen Mar 14 '18 at 01:06