
In my application, I want to convert file URIs (/Users/<>/... or C://...) to unique identifiers. The file URIs are external user input, and the generated UUID would be the key under which some data is stored in the DB.

(Having a UUID was not mandatory; I just wanted to convert a variable-length string to something manageable.)

In Java, to convert an arbitrary string to a UUID, I can use UUID.nameUUIDFromBytes, which uses MD5 to generate the UUID.
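For reference, a minimal sketch of that conversion. The class name, the `toUuid` helper, and the example path are hypothetical, and encoding the path as UTF-8 is an assumption; any fixed encoding would do as long as it is used consistently.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class PathToUuid {
    // Hypothetical helper: maps a path string to a name-based (version 3) UUID.
    // The same input bytes always produce the same UUID.
    static UUID toUuid(String path) {
        return UUID.nameUUIDFromBytes(path.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        UUID first = toUuid("/Users/example/notes.txt");
        UUID second = toUuid("/Users/example/notes.txt");
        System.out.println(first.equals(second)); // deterministic: prints true
        System.out.println(first.version());      // name-based MD5 UUID: prints 3
    }
}
```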

Is the collision probability of this operation (random string -> UUID) the same as the collision probability of MD5 itself (about 2^64 inputs processed for a 50% chance of a collision)?

Or does converting the input to a UUID increase the collision probability?

Joachim Sauer
Thiyagu
  • Not all bits of the UUID will be random (that's basically true for all correctly generated UUID), so you won't have the full 128 bits of randomness, but only something around 120. Other than that, yes the collisions are about as likely as they are with MD5 (which is to say: worryingly likely and almost guaranteed if you have any malicious actors that can control the input). Type 5 would at least switch to SHA-1 which is better (and still not ideal). – Joachim Sauer Nov 22 '19 at 11:04
  • @JoachimSauer hmm. Yes, I can see that [it](http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/UUID.java#l170) twiddles with a couple of bits to set the version (type 3) and variant. In my application, I do not expect malicious actors and I expect only valid file URIs – Thiyagu Nov 22 '19 at 11:19

1 Answer


For UUIDv3, the library just creates a standard 128-bit MD5 hash and then replaces six of the bits with fixed values so the results cannot collide with those of other UUID algorithms.
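The construction described above can be sketched by hand and compared against the library. This is an illustration, not the JDK source: the class and method names are hypothetical, and the byte positions follow the standard UUID layout (version in the high nibble of byte 6, variant in the top two bits of byte 8).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

class Uuid3ByHand {
    // Hypothetical re-implementation: MD5 the input, then overwrite six bits.
    static UUID fromMd5(byte[] name) {
        byte[] hash;
        try {
            hash = MessageDigest.getInstance("MD5").digest(name); // 16 bytes
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
        hash[6] &= 0x0f; // clear the 4 version bits
        hash[6] |= 0x30; // set version 3
        hash[8] &= 0x3f; // clear the 2 variant bits
        hash[8] |= 0x80; // set the IETF variant (binary 10)
        ByteBuffer buffer = ByteBuffer.wrap(hash); // big-endian by default
        long mostSignificant = buffer.getLong();
        long leastSignificant = buffer.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    public static void main(String[] args) {
        byte[] name = "/Users/example/notes.txt".getBytes(StandardCharsets.UTF_8);
        // Compare the hand-built value with the library's result.
        System.out.println(fromMd5(name).equals(UUID.nameUUIDFromBytes(name)));
    }
}
```

The remaining 122 bits come straight from the MD5 output, which is why the collision behaviour tracks MD5 so closely.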

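Collision probability depends on how many bits of randomness you have. UUIDv3 keeps 122 of MD5's 128 bits, so in theory, UUIDv3 values will collide slightly more often than raw MD5 hashes: by the birthday bound, a 50% chance of a collision takes on the order of 2^61 random inputs instead of 2^64. In practice, it just doesn't matter; for non-malicious input, both have so many bits that the odds against an accidental collision are astronomical regardless.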
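To put rough numbers on this, here is a small sketch using the usual birthday approximation p ≈ 1 - exp(-n² / 2^(bits+1)). It assumes hash outputs behave like uniformly random values, which only holds for non-adversarial input; the class name and the workload of one billion paths are made up for illustration.

```java
class BirthdayBound {
    // Approximate probability of at least one collision among n uniformly
    // random values of the given bit length.
    static double collisionProbability(double n, int bits) {
        double x = n * n / Math.pow(2.0, bits + 1);
        return -Math.expm1(-x); // expm1 keeps precision when x is tiny
    }

    // Approximate number of values needed for a 50% collision chance.
    static double inputsForHalf(int bits) {
        return Math.sqrt(2.0 * Math.log(2.0)) * Math.pow(2.0, bits / 2.0);
    }

    public static void main(String[] args) {
        double paths = 1e9; // assumed workload: one billion distinct paths
        System.out.printf("raw MD5 (128 bits): p = %.3e, 50%% at ~%.3e inputs%n",
                collisionProbability(paths, 128), inputsForHalf(128));
        System.out.printf("UUIDv3  (122 bits): p = %.3e, 50%% at ~%.3e inputs%n",
                collisionProbability(paths, 122), inputsForHalf(122));
    }
}
```

Losing six bits makes a collision about 64 times more likely at a given input count and lowers the 50% threshold by a factor of 8, but both probabilities stay vanishingly small at realistic scales.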

StephenS