I created a function in php that generates a hash from a number (id), and I need to check that there will be no collision (two or more ids have the same hash). Which function I can use to verify that there will be no collision in the nexts 99999999 ids? Thanks!
-
6A loop?......... – Oliver Charlesworth Mar 18 '12 at 15:02
-
Every hash function has collisions. If you want the probability of two random items having the same hash you need math. – Jochen Ritzel Mar 18 '12 at 15:13
-
@JochenRitzel: People always seem to assume that, for some reason, but it's not necessarily true. See the question ["Perfect Hash Function for Human Readable Order Codes"](http://stackoverflow.com/q/9551091/978917). (There have to be collisions, of course, if the number of legal inputs is greater than the number of legal outputs; but if the OP only needs uniqueness between 0 and 99,999,999, then that's unlikely to be the case.) – ruakh Mar 18 '12 at 15:24
-
As they have said, that is not a php problem, but a maths problem – aurbano Mar 18 '12 at 15:38
2 Answers
If your hash function works as supposed, and always generates the same output for the same input. And your inputs are restricted to 99999999 numbers, you could simply generate the hashes for those numbers and verify that there are no duplicates.
Although the nice solution would be to demonstrate mathematically that your hash function will produce unique results for those numbers.

- 3,324
- 1
- 25
- 39
If the hash can be totally random, try using the current timestamp in it as additional randomizer. For example:
$hash = sha1(microtime() * rand(1, 9999));
The chances of a duplicate coming out there is rather slim. Additionally, try setting the database field to be a UNIQUE
field, ensuring a duplicate INSERT is impossibe. Then, to make things complete, you can create a loop that tries until it succeeds, like so:
// SHA1 values shouldn't need escaping, but it doesn't really hurt to be extra sure :)
$query = "INSERT INTO `table` (`hash`) VALUES('" . mysql_real_escape_string($hash) . "')";
// Let's try the insert with a max of 10 random hashes
$tries = 10;
while(mysql_query($query) !== true) {
if($tries <= 0) {
break; // Something is really failing, stop trying!
}
// If this point is reached, apparantly a duplicate was created. Try again.
$hash = sha1(microtime() * rand(1, 9999));
// Decrement the tries counter.
$tries--;
}

- 34,211
- 7
- 53
- 66
-
If you generate the hash that way, you will simply find the sha1 digest of a random number. He wants to develop a function that will create a message digest from a number, similarly to the way sha1 works. And he is asking how he can prove it is unique for a finite range (I think) – aurbano Mar 18 '12 at 18:50
-
A hash must produce the same result every time the hash function is used. Adding a (bad) pseudo-random seed based on time will break that! – e-sushi Aug 14 '13 at 16:35