4

I am using SecureRandom.urlsafe_base64(8) in order to create unique ids in my system that are URL safe.

I would like to know how to calculate the probability of collision? I am inserting about 10.000 of those ids into an array, and I want to avoid checking if one of the keys is already in the array, but I also want to make sure that are not repeated? What are the chances?

Hommer Smith
  • 26,772
  • 56
  • 167
  • 296

3 Answers3

16

There is a good approximation of this probability (which relates to the birthday problem). If there are k potential values and n are sampled, the probability of collision is:

k! / (k^n * (k - n)!)

The base64 method returns a base 64 string built from the inputted number of random bytes, not that number of random digits. Eight random bytes gives us k = 256^8, about 1.8446744e+19. You are generating 10,000 of these strings, so n = 10,000, which gives us a probability of 2.710498492319857e-12, which is very low.

Kyle
  • 893
  • 1
  • 9
  • 20
Andrew
  • 1,839
  • 14
  • 25
7

You do not make something sure by calculation of a probability, you only know how likely it might happen.

To protect yourself, just add a unique index to the database column. That ensures that you cannot store duplicate entries in your database. With such a unique index, an insertion will raise an ActiveRecord::InvalidStatement error in case this very unlikely (see @Andrew's answer) ever happens.

spickermann
  • 100,941
  • 9
  • 101
  • 131
  • 1
    Very good point! If you need to be more than "probably" sure, the probability doesn't matter and you need validation. – Andrew Jul 10 '14 at 22:11
0

Slight adjustment to Andrew's answer, I believe the equation for probability of collision is:

1 - (k! / (k^n * (k - n)!))

Given that k is potential values and n the number of samples. The equation:

k! / (k^n * (k - n)!)

gives the probability that there is NOT a collision -- according to the birthday problem wiki.

You can sanity check this by trying a few different n values. More samples should naturally give a higher probability of collision.