Preface:
Neither MD5 nor SHA-x is suitable for password hashing. Setting aside the fact that MD5 is cryptographically broken and should be phased out in general, both hash families are simply too fast for this problem. Humans often choose weak passwords, which severely shrinks the set of inputs (the domain) an attacker has to try, and a fast hash makes brute-forcing MD5/SHA-x password hashes practical.
For password hashing, use bcrypt instead (via crypt), which is standard in any self-respecting PHP build. Other options, not supplied standard in PHP, are scrypt and even the more vulnerable PBKDF2.
The best thing to do is not to re-invent the wheel but to use an existing, tested library that uses a valid hash algorithm (e.g. bcrypt) and mitigates issues with incorrect database access, authentication testing, salt generation, etc. As of PHP 5.5+ the password_hash and password_verify functions can be used, although hashing is still only one part of the system.
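As a minimal sketch of the password_hash / password_verify flow mentioned above (the sample password is made up for illustration, and PASSWORD_DEFAULT is assumed to be acceptable for your setup):

```php
<?php
// Hypothetical plaintext, for illustration only.
$password = 'correct horse battery staple';

// PASSWORD_DEFAULT picks the algorithm PHP currently recommends.
// A random salt is generated automatically and embedded in the result.
$hash = password_hash($password, PASSWORD_DEFAULT);

// Store $hash in the database; never store the plaintext.

// At login, check the submitted password against the stored hash.
$ok  = password_verify($password, $hash);
$bad = password_verify('wrong guess', $hash);

var_dump($ok);   // bool(true)
var_dump($bad);  // bool(false)

// Optionally upgrade stored hashes when the default algorithm or cost changes.
if ($ok && password_needs_rehash($hash, PASSWORD_DEFAULT)) {
    $hash = password_hash($password, PASSWORD_DEFAULT);
}
```

Because the salt and cost are stored inside the hash string, a single column is enough to hold everything password_verify needs.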
Response:
The output (range) of MD5 is 128 bits, written as 32 hex characters; the input (domain) is effectively unbounded. For SHA-512 the output is 512 bits, or 128 hex characters. (Note that the effective security is lower than the number of bits in the range; for collisions, the birthday bound puts it at about half.)
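The fixed output sizes are easy to check. This is just a quick sketch using PHP's hash() function; the sample inputs are arbitrary:

```php
<?php
$message = 'hello world'; // arbitrary sample input

$md5    = hash('md5', $message);
$sha512 = hash('sha512', $message);

// Each hex character encodes 4 bits.
echo strlen($md5), ' hex chars = ', strlen($md5) * 4, " bits\n";       // 32 hex chars = 128 bits
echo strlen($sha512), ' hex chars = ', strlen($sha512) * 4, " bits\n"; // 128 hex chars = 512 bits

// The output length does not depend on the input length.
$long = hash('md5', str_repeat('a', 1000000));
echo strlen($long), "\n"; // 32
```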
Since neither MD5 nor SHA-x is a perfect hash function, collisions can occur even when the set of inputs actually used is smaller than the range, but the range is so huge (and a valid crypto-hash has certain properties) that it just doesn't matter.
That is, the resulting hashes are not guaranteed to be unique; rather (and especially with more bits), it is extremely improbable that a duplicate will be found, accidentally or even deliberately. The expected collision rate can be estimated with the Birthday Problem. The Pigeonhole Principle also applies, but it only guarantees a collision once there are more inputs than possible outputs, and the birthday estimate is already infinitesimally close to 100% long before that point, so it isn't very useful here.
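To put rough numbers on the Birthday Problem estimate, here is a small sketch. It uses the standard approximation p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)) and assumes hash outputs behave like uniformly random values; collisionProbability is a helper name made up for this example.

```php
<?php
/**
 * Approximate probability of at least one collision among $numHashes
 * uniformly random outputs of a $bits-bit hash (birthday approximation).
 */
function collisionProbability(float $numHashes, int $bits): float
{
    // p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)); expm1 keeps tiny results accurate.
    $exponent = $numHashes * ($numHashes - 1.0) / (2.0 * pow(2.0, $bits));
    return -expm1(-$exponent);
}

$billion = 1e9;
printf("128-bit, 1e9 hashes:  %.2e\n", collisionProbability($billion, 128));
printf("512-bit, 1e9 hashes:  %.2e\n", collisionProbability($billion, 512));

// Only around 2^64 hashes does a 128-bit output reach a sizeable chance (about 0.39).
printf("128-bit, 2^64 hashes: %.3f\n", collisionProbability(pow(2.0, 64), 128));
```

Even a billion random inputs leave the accidental collision chance for a 128-bit hash on the order of 1e-21. This says nothing about deliberate attacks, which is where MD5 fails.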
However, this is not a problem in a proper design. This is because cryptographic hash functions are designed with certain properties (which also explains why MD5, which no longer provides all of them, is not suitable in such a role):
The ideal cryptographic hash function has four main properties:
- it is easy to compute the hash value for any given message
- it is infeasible to generate a message that has a given hash
- it is infeasible to modify a message without changing the hash
- it is infeasible to find two different messages with the same hash
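The third property is easy to observe in practice. The sketch below (SHA-256 is used as an arbitrary SHA-2 member and the two messages are made up) hashes two inputs that differ in one character and counts how many output bits differ:

```php
<?php
// Two sample messages differing in a single character.
$a = hash('sha256', 'The quick brown fox', true); // raw 32-byte digest
$b = hash('sha256', 'The quick brown fix', true);

// Bytewise XOR of two equal-length strings; set bits mark differences.
$xor = $a ^ $b;

$diffBits = 0;
foreach (str_split($xor) as $byte) {
    $diffBits += substr_count(decbin(ord($byte)), '1');
}

echo bin2hex($a), "\n";
echo bin2hex($b), "\n";
echo $diffBits, " of 256 bits differ\n";
```

For a well-behaved hash, roughly half of the bits should change, so the two digests look unrelated.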
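Do keep in mind that as of yet there have been no reported collisions for SHA-2 or SHA-3, which is why they are still considered suitable cryptographic hash functions. SHA-1 is the exception: a practical collision was published in 2017 (the SHAttered attack), so it should no longer be relied on for collision resistance. (See duskwoof's comment about SHA-1.)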
Also, consider that there are fewer atoms in the observable universe (about 10^80, or ~2^266) than values in the range of a SHA-512 hash (2^512)! This is part of the reason why cryptographic hash functions are considered secure.
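The comparison can be checked by taking base-2 logarithms. This is only back-of-the-envelope arithmetic, and the 10^80 atom count is itself a rough, commonly quoted estimate:

```php
<?php
// Commonly quoted estimate (an assumption here): about 10^80 atoms.
$atomBits = 80 * log(10, 2); // log2(10^80), about 265.75
$hashBits = 512;             // SHA-512 output size in bits

printf("atoms         ~ 2^%.2f\n", $atomBits);
printf("SHA-512 range = 2^%d\n", $hashBits);
printf("range / atoms ~ 2^%.2f\n", $hashBits - $atomBits);
```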