0

I am using numbers as input and length will be 10 characters.

EG. Input - 8429294732 Output - 96325be7

Total possibility of crc32 is 16^8 which is roughly 4.2Billion.

Does anyone know what is chance of collision here?

And can you please explain.

  • How can a 32 bits data store 16^8 distinct values ? –  May 17 '21 at 09:32
  • First 4.2 billion what? The first 4.2 billion random 10 character strings? There is almost 100% probability of at least one collision. – Stephen C May 17 '21 at 09:45

2 Answers2

1

See this answer.

Assuming you mean ten decimal digits of uniform, independent probability, then your inputs will result in on the order of 90% coverage of the possible 32-bit CRC values. We will use the formula with n = 0.9 232, and from the question title I will presume, k = 232. You will have about two billion collisions. If the inputs are chosen randomly, the chance of not having a collision is about 10-1036266998. Also known in practical terms as: zero.

(By the way, your "roughly 4.2 billion", should be roughly 4.3 billion.)

Mark Adler
  • 101,978
  • 13
  • 118
  • 158
  • can you please elaborate about your assertion “90% coverage of the possible 32-bit CRC values”? The source has 10^10 possible symbols and the CRC, 4.3x10^9. Thanks! – guga May 18 '21 at 22:29
  • 1
    If you randomly throw 10^10 darts at a dart board with 2^32 spots (and they all hit the dart board), then about 90% of the spots will have been hit by a dart. – Mark Adler May 18 '21 at 23:54
  • On average, 90.254%. – Mark Adler May 19 '21 at 01:11
  • Ah! Pigeonhole principle! Got it! Thanks! – guga May 19 '21 at 21:26
0

Assuming your input uses the full range of the 10 digits, that would give you 10^10 different numbers, and you map this to 16^8=2^32 numbers generated by the CRC32. So, on average, you have (10^10)/(2^32)=2.3 numbers mapped to every possible CRC32 value generated. So, the answer to your question is that you have 100% chance of collision.

guga
  • 714
  • 1
  • 5
  • 15