73

Given a set of 100 different strings of equal length, how can you quantify the probability that a SHA1 digest collision for the strings is unlikely... ?

johannes
  • 7,262
  • 5
  • 38
  • 57
eastafri
  • 2,186
  • 2
  • 23
  • 34

3 Answers3

148

alt text

Are the 160 bit hash values generated by SHA-1 large enough to ensure the fingerprint of every block is unique? Assuming random hash values with a uniform distribution, a collection of n different data blocks and a hash function that generates b bits, the probability p that there will be one or more collisions is bounded by the number of pairs of blocks multiplied by the probability that a given pair will collide.

(source : http://bitcache.org/faq/hash-collision-probabilities)

BenMorel
  • 34,448
  • 50
  • 182
  • 322
Peter
  • 47,963
  • 46
  • 132
  • 181
  • 13
    In conclusion, the likelihood of a collision is in the order of 10^-45. That is very, *very* unlikely. – Paul Lammertsma Dec 08 '09 at 14:41
  • 4
    But SHA-1 is not uniform distribution, it could be bigger than this upper bound. I doubt that this equation is not correct. AS least the EQUAL. – Kamel Apr 11 '15 at 02:12
  • 1
    This answer doesn't take into account the chinese discovery in 2005 where they are able to produce collisions in 2^69 iterations rather than the 2^80 projected by brute force https://www.schneier.com/blog/archives/2005/02/sha1_broken.html – Djarid Apr 18 '16 at 14:09
  • 5
    @Djarid It's important not to confound accidental hash collision and adversarial collision hunting. The former is the probability that the hash of two items will collide, and follows the formula above (although, as noted by Kamel, the distribution is not perfectly uniform and thus the probability is likely higher). The latter is used to intentionally try to find collisions, and relies on discovering and exploiting weaknesses in the hash. Cryptographic hashes attempt to be robust against such attacks, but often they are overkill for simpler hashing applications (not transmitting secrets). – Pierre D Feb 19 '17 at 18:01
  • that formula is accurate when 2^b >> n^2 (and when 2^b very big). I know it is the mayority (if not all) the cases for sha1... but for the record! – MrIo Dec 08 '20 at 22:21
7

Well, the probability of a collision would be:

1 - ((2^160 - 1) / 2^160) * ((2^160 - 2) / 2^160) * ... * ((2^160 - 99) / 2^160)

Think of the probability of a collision of 2 items in a space of 10. The first item is unique with probability 100%. The second is unique with probability 9/10. So the probability of both being unique is 100% * 90%, and the probability of a collision is:

1 - (100% * 90%), or 1 - ((10 - 0) / 10) * ((10 - 1) / 10), or 1 - ((10 - 1) / 10)

It's pretty unlikely. You'd have to have many more strings for it to be a remote possibility.

Take a look at the table on this page on Wikipedia; just interpolate between the rows for 128 bits and 256 bits.

ivcubr
  • 1,988
  • 9
  • 20
  • 28
Anthony Mills
  • 8,676
  • 4
  • 32
  • 51
5

That's Birthday Problem - the article provides nice approximations that make it quite easy to estimate the probability. Actual probability will be very very very low - see this question for an example.

Community
  • 1
  • 1
sharptooth
  • 167,383
  • 100
  • 513
  • 979