
PHP has a uniqid() function which generates a UUID of sorts.

In the usage examples, it shows the following:

$token = md5(uniqid());

But in the comments, someone says this:

Generating an MD5 from a unique ID is naive and reduces much of the value of unique IDs, as well as providing significant (attackable) stricture on the MD5 domain. That's a deeply broken thing to do. The correct approach is to use the unique ID on its own; it's already geared for non-collision.

Why is this true, if so? If an MD5 hash is (almost) unique for a unique ID, then what is wrong with md5'ing a uniqid?

Ferdinand Beyer
ryeguy

6 Answers


A UUID is 128 bits wide and has uniqueness inherent to the way it is generated. An MD5 hash is also 128 bits wide and doesn't guarantee uniqueness, only a low probability of collision. The MD5 hash is no smaller than the UUID, so it doesn't help with storage either.
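
To make the size point concrete, here is a small PHP sketch using only the built-in uniqid() and md5() that prints the lengths involved. It also shows that a plain uniqid() string is only 13 hex characters, so md5() stretches it to 128 bits without adding any unpredictability.

```php
<?php
// Compare sizes: an MD5 digest is 128 bits, the same width as a UUID.
$id        = uniqid();          // 13 hex chars derived from the current time
$hexDigest = md5($id);          // 32 hex chars
$rawDigest = md5($id, true);    // 16 raw bytes

printf("uniqid():         %d chars\n", strlen($id));
printf("md5() hex digest: %d chars\n", strlen($hexDigest));
printf("md5() raw digest: %d bytes = %d bits\n", strlen($rawDigest), strlen($rawDigest) * 8);
```

The digest is longer than its input, but hashing cannot add information: there are never more distinct md5(uniqid()) tokens than distinct uniqid() strings.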

If you know the hash is from a UUID, it is much easier to attack, because the domain of valid UUIDs is actually fairly predictable if you know anything about the machine generating them.

If you needed to provide a secure token, then you would need to use a cryptographically secure random number generator.(1) UUIDs are not designed to be cryptographically secure, only guaranteed unique. A monotonically increasing sequence bounded by unique machine identifiers (typically a MAC address) and time is still a perfectly valid UUID, but it is highly predictable if you can reverse-engineer a single UUID from the sequence of tokens.

  1. The defining characteristic of a cryptographically secure PRNG is that the result of a given iteration does not contain enough information to infer the value of the next iteration - i.e. there is some hidden state in the generator that is not revealed in the number and cannot be inferred by examining a sequence of numbers from the PRNG.

    If you get into number theory you can find ways to guess the internal state of some PRNGs from a sequence of generated values. Mersenne Twister is an example of such a generator. It has hidden state that it uses to get its long period, but it is not cryptographically secure: you can take a fairly small sequence of numbers and use it to infer the internal state. Once you've done this, you can use it to attack a cryptographic mechanism that depends on keeping that sequence a secret.
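
A minimal PHP sketch of the alternative, assuming PHP 7.0 or later for random_bytes(). The helper name secureToken() is hypothetical. The second half only illustrates that a Mersenne Twister (PHP's mt_rand()) is fully determined by its internal state; it does not implement the state-recovery attack described in the footnote.

```php
<?php
// Token drawn from PHP's CSPRNG (random_bytes(), PHP 7.0+).
// secureToken() is a hypothetical helper name.
function secureToken(int $bytes = 16): string
{
    // 16 bytes = 128 bits of unpredictable data, hex-encoded to 32 chars.
    return bin2hex(random_bytes($bytes));
}

// By contrast, mt_rand() is a Mersenne Twister: the same seed (i.e. the
// same internal state) reproduces exactly the same "random" sequence.
mt_srand(12345);
$first = [mt_rand(), mt_rand(), mt_rand()];
mt_srand(12345);
$second = [mt_rand(), mt_rand(), mt_rand()];

$token = secureToken();
echo $token, PHP_EOL;
var_dump($first === $second); // the MT sequence is fully determined by its state
```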
ConcernedOfTunbridgeWells
  • Why is a UUID considered 128-bit when only 30 of its 32 hex characters are base 16? Digit 13 is base 5 (1-5) and digit 17 is base 4 (8, 9, A, B). Theoretically this should increase the collision probability compared to a true 128-bit MD5... – vanowm Mar 06 '21 at 01:14
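
For the bit-count question in the comment above: in a version 4 (random) UUID, 4 bits encode the version and 2 bits encode the variant, so 128 - 4 - 2 = 122 bits are random. A sketch that builds one from random_bytes(), with uuidV4() as a hypothetical helper name:

```php
<?php
// Build an RFC 4122 version 4 (random) UUID from 16 CSPRNG bytes.
// uuidV4() is a hypothetical helper, not a PHP built-in.
function uuidV4(): string
{
    $b = random_bytes(16);
    // Byte 6, high nibble: version 4 (fixes 4 bits).
    $b[6] = chr((ord($b[6]) & 0x0f) | 0x40);
    // Byte 8, top two bits: variant 10 (fixes 2 bits).
    $b[8] = chr((ord($b[8]) & 0x3f) | 0x80);
    $hex = bin2hex($b);
    return sprintf('%s-%s-%s-%s-%s',
        substr($hex, 0, 8), substr($hex, 8, 4), substr($hex, 12, 4),
        substr($hex, 16, 4), substr($hex, 20, 12));
}

$uuid = uuidV4();
echo $uuid, PHP_EOL;
// 128 bits total - 4 version bits - 2 variant bits = 122 random bits.
```

The fixed version digit and the restricted variant digit are exactly the two positions the comment noticed.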

Note that uniqid() does not return a UUID, but a "unique" string based on the current time:

$ php -r 'echo uniqid("prefix_", true);'
prefix_4a8aaada61b0f0.86531181

If you do that multiple times, you will get very similar output strings and everyone who is familiar with uniqid() will recognize the source algorithm. That way it is pretty easy to predict the next IDs that will be generated.
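
To see how much of uniqid() is just the clock, this sketch splits the value back into seconds and microseconds. It assumes PHP's implementation formats the plain (no more_entropy) result as sprintf("%08x%05x", seconds, microseconds); decodeUniqid() is a hypothetical helper.

```php
<?php
// Assumption: plain uniqid() is sprintf("%08x%05x", seconds, microseconds).
// decodeUniqid() is a hypothetical helper.
function decodeUniqid(string $id): array
{
    $seconds = hexdec(substr($id, 0, 8));  // Unix timestamp
    $micros  = hexdec(substr($id, 8, 5));  // microseconds within that second
    return [$seconds, $micros];
}

$ids = [uniqid(), uniqid(), uniqid()];
foreach ($ids as $id) {
    [$sec, $usec] = decodeUniqid($id);
    printf("%s -> %s.%06d UTC\n", $id, gmdate('Y-m-d H:i:s', $sec), $usec);
}
```

Consecutive IDs differ only in their last few hex digits, which is why the sequence is easy to predict.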

The advantage of md5()-ing the output, combined with an application-specific salt string or random number, is a string that is much harder to guess:

$ php -r 'echo md5(uniqid("prefix_", true));'
3dbb5221b203888fc0f41f5ef960f51b

Unlike plain uniqid(), this produces very different outputs every microsecond. Furthermore, it does not reveal your "prefix salt" string, nor that you are using uniqid() under the hood. Without knowing the salt, it is very hard (consider it impossible) to guess the next ID.

In summary, I would disagree with the commenter's opinion and would always prefer the md5()-ed output over plain uniqid().

Ferdinand Beyer
  • If you need your IDs to be unguessable, taking an easily guessable input and obfuscating it is _not_ the way to go. – Nick Johnson Aug 18 '09 at 15:53
  • How is the input guessable if you do not expose the salt string (Prefix used for uniqid())? Can you please explain your criticism? – Ferdinand Beyer Aug 19 '09 at 06:59
  • There are perfectly good ways to generate really unguessable IDs. By merely obfuscating an easily predicted sequence, you're relying on nobody figuring out the method and your salt string. If they do, they can easily predict IDs you'll generate in future. – Nick Johnson Aug 19 '09 at 08:07
  • With the same rationale you could argue that every passphrase-based encryption algorithm is weak since you rely on nobody figuring out the password. Nevertheless, the question is about using MD5 with PHP's uniqid() function, not about the best way to generate unguessable unique IDs. – Ferdinand Beyer Aug 19 '09 at 08:17
  • Salt must be random data. Using the same “prefix_” merely obfuscates, it is not secure. – Chris Page May 03 '17 at 21:20
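
Nick Johnson's point can be made concrete. This sketch assumes the attacker has learned the prefix and knows, to within a second or so, when a token was issued; it then simply tries every microsecond. It covers plain uniqid() without more_entropy only, relies on the assumed layout prefix . sprintf("%08x%05x", seconds, microseconds), and recoverUniqid() is a hypothetical helper.

```php
<?php
// Hypothetical attack: prefix ("salt") known, issue time known to ~1 second.
// Assumed uniqid() layout: prefix . sprintf("%08x%05x", sec, usec).
function recoverUniqid(string $token, string $prefix, int $fromSec, int $toSec): ?string
{
    for ($sec = $fromSec; $sec <= $toSec; $sec++) {
        $head = $prefix . sprintf('%08x', $sec);
        // At most 1,000,000 microsecond values per candidate second.
        for ($usec = 0; $usec < 1000000; $usec++) {
            $candidate = $head . sprintf('%05x', $usec);
            if (md5($candidate) === $token) {
                return $candidate;
            }
        }
    }
    return null;
}

$prefix = 'prefix_';
$before = gettimeofday()['sec'];
$id     = uniqid($prefix);   // plain uniqid(), no more_entropy
$after  = gettimeofday()['sec'];
$token  = md5($id);          // what the application hands out

$recovered = recoverUniqid($token, $prefix, $before, $after);
echo $recovered === $id ? "recovered $id\n" : "not found\n";
```

The search space is about a million MD5 calls per candidate second, which is cheap; once the prefix leaks, the md5() step offers no real protection.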

MD5ing a UUID is pointless because UUIDs are already unique and fixed length (short), properties that are some of the reasons that people often use MD5 to begin with. So I suppose it depends on what you plan on doing with the UUID, but in general a UUID has the same properties as some data that has been MD5'd, so why do both?

Adam Batkin

UUIDs are already unique, so there is no point in MD5'ing them anyway.

Regarding the security question, in general you can be attacked if the attacker can predict the next unique ID you are about to generate. If it is known that you derive your unique IDs from UUIDs, the set of potential next unique IDs is much smaller, giving a better chance for a brute-force attack.

This is especially true if the attacker can get a whole bunch of unique IDs from you and use them to work out your scheme for generating UUIDs.

Zed
  • “there is no point”: In fact, it's worse than no point, because UUIDs are unique, whereas MD5 hashes of UUIDs are not. – Chris Page May 03 '17 at 21:25

Version 3 UUIDs are already MD5 hashes, so there's no point in hashing them again. However, I'm not sure what UUID version PHP uses.
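
For reference, a sketch of how a version 3 UUID is built per RFC 4122: MD5 over the namespace UUID's 16 bytes followed by the name, after which the version and variant bits are overwritten. The function name uuidV3() is hypothetical; the DNS namespace UUID is the standard one from RFC 4122. As noted in another answer, PHP's uniqid() does not return an RFC 4122 UUID at all.

```php
<?php
// Sketch of an RFC 4122 version 3 (name-based, MD5) UUID.
// uuidV3() is a hypothetical helper, not a PHP built-in.
function uuidV3(string $namespaceUuid, string $name): string
{
    // Namespace UUID as 16 raw bytes (strip dashes, decode hex).
    $nsBytes = hex2bin(str_replace('-', '', $namespaceUuid));
    $hash = md5($nsBytes . $name, true);            // 16 raw bytes
    $hash[6] = chr((ord($hash[6]) & 0x0f) | 0x30);  // version 3
    $hash[8] = chr((ord($hash[8]) & 0x3f) | 0x80);  // RFC 4122 variant
    $hex = bin2hex($hash);
    return sprintf('%s-%s-%s-%s-%s',
        substr($hex, 0, 8), substr($hex, 8, 4), substr($hex, 12, 4),
        substr($hex, 16, 4), substr($hex, 20, 12));
}

// Standard DNS namespace UUID defined in RFC 4122.
$dns = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';
echo uuidV3($dns, 'python.org'), PHP_EOL;
```

The output can be cross-checked against another implementation, for example Python's uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org').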

cdmckay

As an aside, MD5 is actually obsolete and is not to be used for anything worth protecting (PHI, PII or PCI) from 2010 onwards. The US Feds have enforced this, and any non-compliant entity would be paying lots of $$$ in penalties.

Jarvis Bot
  • Yes. Software which uses MD5 for security will officially not be considered past 2010, but realistically, nobody wants it now because they'll be required to get rid of it later (for a very soon value of later). So if you are selling software or software as a service, you will rule out some customers merely through the use of MD5. Use SHAx, preferably with a large value of x. – quillbreaker Aug 18 '09 at 14:45
  • As far as I know, MD5 has never been included in a NIST standard. What is being phased out by NIST is SHA-1 and basically everything that has 80-bit or less security. Also, companies not implementing NIST standards don't get fined. They just can't get NIST certification, and hence lose customers that require such a certification. – Accipitridae Aug 18 '09 at 18:23
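
If a digest is still wanted, PHP's hash() function exposes the SHA-2 family alongside MD5. A short sketch comparing output sizes; note that swapping the hash does not make a predictable input such as uniqid() any harder to guess, so for tokens a CSPRNG such as random_bytes() remains the right source.

```php
<?php
// Replace md5() with a SHA-2 function through PHP's hash() API.
$input = uniqid('', true);
$digests = [
    'md5'    => md5($input),
    'sha256' => hash('sha256', $input),
    'sha512' => hash('sha512', $input),
];
foreach ($digests as $algo => $hex) {
    // Each hex character encodes 4 bits.
    printf("%-6s %3d bits  %s\n", $algo, strlen($hex) * 4, $hex);
}
```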