8

Worst case, I have 180 million values in a cache(15 minute window before they go stale) and an MD5 has 2^128 values. What is my probability of a collision? or better yet, is there a web page somewhere to answer that question or a rough estimate thereof? That would rock so I know my chances.

Dean Hiller
  • 19,235
  • 25
  • 129
  • 212
  • 1
    Possible duplicate of [How many random elements before MD5 produces collisions?](http://stackoverflow.com/questions/201705/how-many-random-elements-before-md5-produces-collisions) – mrogers Jan 20 '17 at 18:01
  • I am curious did you end up using MD5 for your use case ? – Nilan Saha Oct 27 '21 at 19:00

1 Answers1

8

The probability is 1-m!/(mⁿ(m-n)!) where m = 2¹²⁸ and n = 180000000.

Running it through on-line Wolfram exceeds available computational time!

If you have SmallTalk installed locally, you can run this:

|m n p|

m := 2 raisedTo:128.
n := 180000000.
p := (1-(m factorial/((m raisedTo:n)*(m-n)factorial)))asFloat.

Transcript show:p printString;cr.

A search for the Birthday Problem brings up a Wikipedia page where they provide a table showing for 128 bits and 2.6×10¹⁰ hashes, the probability of a collision is 1 in 10¹⁸, so that’s for 140× the number of hashes than what you’re considering. So you know your odds are “worse” than this.

A good approximation if n ≪ m is 1-e-n2/2m, where if you plug in m and n above, you get 4.76×10⁻²³ or 1 in 2.10×10²² as the probability of a collision.

Even though the probability of a collision is very low, it is prudent in the FOOBAR case, say if there is an issue and the hashes accumulate for more than 15 minutes, to at least confirm what would happen in the event of a collision. This will also help if someone somehow injects duplicate hashes in order to try to compromise it.

Yimin Rong
  • 1,890
  • 4
  • 31
  • 48