
I am generating 15-character alphanumeric codes and storing them as MD5 hashes for protection. However, I cannot have non-unique (colliding) hashes, and if a collision occurs I do not insert the code. Since I will be inserting a large number of codes into the database over the lifetime of the app, I want to keep the number of collisions low for the sake of performance.

Question: What is the probability of collisions given that the input space is 36 raised to the power 15? (36 because I am using the 26 lowercase letters and 10 digits, and 15 because each code is 15 characters long.)

You can refer here to understand how I am generating the codes in first place.

Usage: These are coupon codes, and I want to hash them to protect myself in case the database is compromised.

FBP
  • `if they occur I do not insert them` is pretty easy, make the column unique. – chris85 Apr 06 '17 at 16:13
  • Why even bother generating the hash? Just use the original 15 character string as your unique key. – Alex Howansky Apr 06 '17 at 16:13
  • If you generate a 15 char random string then hash it are you not *subtracting security* because if you use the resultant value in string form you have a much smaller alphabet space (0-9A-F)? – Alex K. Apr 06 '17 at 16:13
  • From Wikipedia: In 1996, collisions were found in the compression function of MD5, and Hans Dobbertin wrote in the RSA Laboratories technical newsletter, "The presented attack does not yet threaten practical applications of MD5, but it comes rather close ... in the future MD5 should no longer be implemented ... where a collision-resistant hash function is required." So you might consider using something else altogether. – NAMS Apr 06 '17 at 16:15
  • Seems relevant http://stackoverflow.com/a/2088983/ and probably what you're looking for. – Funk Forty Niner Apr 06 '17 at 16:21
  • 1st point, that offers less protection than what you might think. Reversing the md5 of a 15 character message is pretty quick nowadays because Google has basically indexed a lot of md5s of short strings. – apokryfos Apr 06 '17 at 16:28
  • Can you use the uniqid function? (http://php.net/manual/en/function.uniqid.php) this uses a time stamp so that ID's are unique based on when they are generated. – MEmerson Apr 06 '17 at 16:36
  • @AlexHowansky Usage: To use these in coupon codes and I want to hash them to protect myself from database being compromised. – FBP Apr 06 '17 at 16:49
  • MD5 is fine for detecting changes and fingerprinting but insufficient for security use. Personally, I think that if your database is compromised, then some leaked coupon codes are the least of your worries -- but if you really insist on having them obfuscated, use `password_hash()` and treat them like passwords. – Alex Howansky Apr 06 '17 at 17:52

1 Answer


The chance of accidentally generating a collision, any collision, of a secure hash is negligible, i.e. close to zero. That's true even for MD5, which is a broken secure hash. Even with a very large number of hashes already stored (think 2^64), the chance that a newly generated hash collides with one of them is still only about 2^64/2^128 = 1/(2^64).

The possibility of your input having a collision, i.e. of generating the same code twice, is of course much higher than that (assuming that it is randomly generated), as 36^15 is much smaller than 2^128, the size of the MD5 output space (36^15 < (2^6)^15 = 2^90 <<< 2^128). So there are far fewer input values than there are hash values.
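To put rough numbers on this comparison, here is a minimal Python sketch using the standard birthday approximation, 1 - exp(-n(n-1)/(2N)), for n uniformly random draws from N possible values. The function names and the figure of 10^9 coupon codes are assumptions chosen for illustration only; the sketch also assumes the codes are drawn uniformly at random and that MD5 output behaves like a uniform random 128-bit value for such inputs.

```python
import math

HASH_BITS = 128            # MD5 output size in bits
INPUT_SPACE = 36 ** 15     # 15 characters, each from [a-z0-9]


def birthday_collision_probability(n: int, space: int) -> float:
    """Approximate chance that n uniform random draws from `space`
    possible values contain at least one duplicate."""
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n * (n - 1) / (2 * space))


def next_draw_collision_probability(n: int, space: int) -> float:
    """Chance that one more uniform draw matches any of n distinct stored values."""
    return n / space


if __name__ == "__main__":
    n = 10 ** 9  # hypothetical number of coupon codes, not from the question
    print(f"log2(36^15) = {math.log2(INPUT_SPACE):.2f} bits")
    print(f"duplicate code among n codes:  "
          f"{birthday_collision_probability(n, INPUT_SPACE):.3e}")
    print(f"MD5 collision among n hashes:  "
          f"{birthday_collision_probability(n, 2 ** HASH_BITS):.3e}")
```

Under these assumptions, whatever value of n you plug in, a duplicate code is far more likely than an MD5 collision between two distinct codes, so a unique constraint on the column mostly guards against repeated codes rather than hash collisions.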

Maarten Bodewes
  • I had to lol a bit after reading *"**broken** secure hash"*. Perhaps you meant *"**broken** cryptographic hash"*. – Artjom B. Apr 06 '17 at 20:30
  • Yeah, well, "secure hash" is in this case just a type of algorithm and algorithms can be broken. It's a contradiction, but hey, life is abound with contradictions. Best thing is to indeed laugh at them :) – Maarten Bodewes Apr 06 '17 at 20:32
  • @MaartenBodewes did you mean "The possibility of your input having a collision is of course much lower" instead? – FBP Apr 07 '17 at 07:39