6

I want to use a unique ID generated by PHP in a database table that will likely never have more than 10,000 records. I don't want the time of creation to be visible, or to use a purely numeric value, so I am using:

sha1(uniqid(mt_rand(), true))

Is it wrong to use a hash for a unique ID? Don't all hashes lead to collisions or are the chances so remote that they should not be considered in this case?

A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won't it always be unique?

texelate
  • No, it's not wrong. Passing the timestamp to sha1 would be good. – wayne May 03 '13 at 05:34
  • This might be a good read: http://stackoverflow.com/questions/2768191/hash-of-unique-value-unique-hash – Sudhir Bastakoti May 03 '13 at 05:37
  • Why create your own unique ID? Does your database not support auto-increment primary keys? That's generally accepted as the way to have unique IDs on database identifiers. – Patrick M May 03 '13 at 06:06
  • @Patrick M I don't want the IDs to have any meaning. – texelate May 03 '13 at 06:56
  • @texelate that is a very strange requirement :-) Is it really so bad to have the insertion order tracked? I understand not wanting the structure or code to imply unintended significance, but the Primary Key is really common; anyone using an RDBMS should know exactly what it means. The random ID has two major drawbacks: 1) It will collide at unpredictable intervals, which you have to handle (whereas the PK will overflow at an exact, known point). 2) Having a meaningless key adds nothing but complexity to the system. Using a deterministic hash key is better than a random one. – Patrick M May 03 '13 at 14:30
  • One reason for doing this is to prevent hackers from finding an HTTP GET endpoint in your application and simply using those sequential auto-incremented keys to take a stroll through your database. It rather famously happened to AT&T a number of years ago (sorry, can't find a link to it), and I remember it every time I design a system. So turning endpoint /customer/5 into /customer/dhftrt5sahjgertgjwygad is a definite advantage - it would be *really* embarrassing to be hacked so simply otherwise. – saswanb Aug 03 '18 at 15:56

5 Answers

8

If you have 2 keys, you have a theoretical best-case probability of 1 in 2^X of a collision, where X is the number of bits in your hashing algorithm (160 for SHA-1). 'Best case' because the input will usually be ASCII, which doesn't utilize the full charset, plus the hashing functions do not distribute perfectly, so in real life they will collide a little more often than the theoretical figure suggests.
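To put rough numbers on that for the question's 10,000-record table, here is a small sketch of the arithmetic (my own illustration, using the standard birthday-bound approximation p ≈ n(n-1)/2^(b+1) for n uniformly random b-bit values):

    <?php
    // Birthday-bound estimate: probability that at least two of $n
    // uniformly random $bits-bit hash values collide. An approximation,
    // assuming the hash output is evenly distributed.
    function approxCollisionProbability($n, $bits)
    {
        // p ≈ n * (n - 1) / 2^(bits + 1)
        return ($n * ($n - 1)) / pow(2, $bits + 1);
    }

    // ~10,000 records hashed with SHA-1 (160-bit digest)
    printf("%.2e\n", approxCollisionProbability(10000, 160));
    // Prints roughly 3.4e-41 -- vanishingly small for a table this size.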

To answer your final question:

A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won't it always be unique?

Yeah, that's true - sort of. But then you would have the other problem of generating unique keys of that size in the first place. The easiest way is usually a checksum, so just choose a large enough digest that the collision space is small enough for your comfort.

As @wayne suggests, a popular approach is to concatenate microtime() to your random salt (and base64_encode to raise the entropy).
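As a concrete sketch, that combination might look like the following (my own illustration; it is essentially the snippet texelate proposes in the comments below):

    <?php
    // Combine a high-resolution timestamp with uniqid()/mt_rand(),
    // base64-encode the mix, and hash it. sha1() always returns a
    // 40-character hex string, so it fits a fixed-width CHAR(40) column.
    function generateId()
    {
        return sha1(base64_encode(microtime() . uniqid(mt_rand(), true)));
    }

    echo generateId(), PHP_EOL; // 40-character hex string

Whatever generator you settle on, a UNIQUE constraint on the column is still a sensible backstop: it turns the astronomically unlikely collision into an error you can catch and retry.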

Morten Jensen
  • Thanks, would that be: sha1(base64_encode(microtime(). uniqid(mt_rand(), true))) – texelate May 03 '13 at 06:50
  • @texelate that would be a fine starting point. See this page, especially points 1 and 2, for more info on salting: http://blog.ircmaxell.com/2012/12/seven-ways-to-screw-up-bcrypt.html – Morten Jensen May 03 '13 at 07:04
  • Thanks. I need this for uniqueness rather than security but a very useful read all the same. – texelate May 03 '13 at 09:11
3

How horrible would it be if two ended up the same? Murphy's Law applies: if a million-to-one, or even a 100,000-to-one, chance is acceptable, then go right ahead! The real chance is much, much smaller - but if your system will explode when it does happen, then that design flaw must be addressed first. Then proceed with confidence.

Here is a question/answer covering what the probabilities really are: Probability of SHA1 Collisions

BrianH
3

Use sha1(time()) instead; then you remove the random possibility of a repeating hash for as long as a timestamp can be represented in fewer characters than the sha1 hash (likely longer than you will find a working PHP parser ;)).
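A quick sketch (my own, not part of the answer) shows both the idea and the caveat raised in the comment below: any two calls within the same second hash identically.

    <?php
    // time() has one-second resolution, so two IDs generated within
    // the same second are the same input and therefore the same hash.
    $a = sha1(time());
    $b = sha1(time());

    var_dump($a === $b); // almost always bool(true) when run back to back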

Sven Tore
  • `then you remove the random possibility of a repeating hash` ... unless you instantiate more than 1 object per second, which is not a lot IMHO. I'd recommend against just using the timestamp as input for a unique key. – Morten Jensen May 05 '13 at 20:17
2

Computer randomness isn't actually random, you know? The only true randomness you can obtain from a computer, supposing you are on a Unix environment, is from /dev/random, but reading it is a blocking operation that depends on user interactions like moving the mouse or typing on the keyboard. Reading from /dev/urandom is less safe, but it's probably better than using just ASCII characters, and it gives you an instantaneous response.
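A minimal sketch of that idea in PHP (my own illustration, assuming a Unix-like system that exposes /dev/urandom):

    <?php
    // Read 20 raw bytes from the kernel's CSPRNG and hex-encode them,
    // giving a 40-character ID, the same length as a sha1() digest.
    function urandomId($bytes = 20)
    {
        $fp = fopen('/dev/urandom', 'rb');
        if ($fp === false) {
            throw new RuntimeException('Cannot open /dev/urandom');
        }
        $raw = fread($fp, $bytes);
        fclose($fp);
        return bin2hex($raw);
    }

    echo urandomId(), PHP_EOL;

On PHP 7 and later, random_bytes(20) gives you the same thing without the manual file handling.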

Henrique Barcelos
0

sha1($ipAddress.time()), because it's impossible for anyone else to be using the same IP address at the same time.

Liam Ethan