13

What is the probability of md5 collision if I pass in 2^32 sets of string?

Can I say the answer is just 2^32/2^128 = 1/1.2621774e-29 as the length of bit of md5 hash is 128?

Harold Chan
  • 799
  • 3
  • 11
  • 31

1 Answers1

26

This question is similar to the so-called "birthday paradox".

In probability theory, the birthday problem or birthday paradox concerns the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday. By the pigeonhole principle, the probability reaches 100% when the number of people reaches 367 (since there are 366 possible birthdays, including February 29). However, 99% probability is reached with just 57 people, and 50% probability with 23 people. These conclusions are based on the assumption that each day of the year (except February 29) is equally probable for a birthday.

The mathematics behind this problem led to a well-known cryptographic attack called the birthday attack, which uses this probabilistic model to reduce the complexity of cracking a hash function.

According to the Wikipedia article, the chance of a collision when choosing n = 232 random numbers from a space with d = 2128 numbers is approximately:

Generalized birthday problem math

If you work this calculation out the chance is about 2.7×10-20. This is a very small probability, but notice that it is 9 orders of magnitude higher than your proposed calculation.

Community
  • 1
  • 1
John Kugelman
  • 349,597
  • 67
  • 533
  • 578
  • 9
    And that is assuming that the md5 algrorithm has a perfectly even spread of hashes. Since it does not, your chances of collision are even greater. – neelsg Feb 20 '13 at 06:09