2

I know the collision rate of UUID is practically zero as a fact, but why is such a low collision rate guaranteed globally?

According to the RFC 4122,

The version 4 UUID is meant for generating UUIDs from truly-random or pseudo-random numbers.

and if you use PRNG, there is a seed. So there should be a case where two UUID generators (accidentally) share the same seed. In such a case, what happens?

ynn
  • 3,386
  • 2
  • 19
  • 42
  • Nothing special happens, you will simply have the same UUIDs generated twice. The same could happen with truly random input, you don't need pseudo-random generators or accidental seed-sharing for this to happen. It's just _very_ unlikely. Typically, systems will use PRNGs considered cryptographically safe. Those also acquire their initial seed in some way that is considered cryptographically safe. – He3lixxx Jul 05 '23 at 16:34
  • @He3lixxx Is that guaranteed by RFC? If I implement UUID generator which is seeded in a cryptographically unsafe way (e.g. just use the integer `0` instead of `/dev/random`), is it standard-compliant? If so, UUID can never be ***globally*** unique. Even we can't define collision rate in that case, I believe. (In other words, if all of the UUID implementations in the world are implemented using CSPRNG, UUID should be practically globally unique, and we can define collision rate.) – ynn Jul 06 '23 at 00:39
  • 1
    You can never be certain (as defined in probability theory) that a UUID4 you just generated is globally unique. This is true independently of the quality of your randomness. I don't think the RFC mandates secure random numbers. Under "Security Considerations", it considers bad sources of randomness: "A predictable random number source will exacerbate the situation". See also [this Software Engineering StackExchange answer](https://softwareengineering.stackexchange.com/a/130298). – He3lixxx Jul 06 '23 at 11:51
  • @He3lixxx Could you please consider posting your comment as an answer? It exactly answers my question. – ynn Jul 06 '23 at 12:41

1 Answers1

3

So there should be a case where two UUID generators (accidentally) share the same seed. In such a case, what happens?

Nothing, the same UUID4 will simply be generated multiple times. This does not even depend on "bad" randomness, as could be caused by a deterministic seed of a pseudo-random number generator. "Bad" randomness just makes this more probable, but collisions always have a non-zero probability.

If I implement UUID generator which is seeded in a cryptographically unsafe way (e.g. just use the integer 0 instead of /dev/random), is it standard-compliant?

Technically yes, the RFC does not specify requirements on the pseudo-random numbers used. Under 6. Security Considerations, it says (emphasis mine):

Do not assume that UUIDs are hard to guess; they should not be used as security capabilities (identifiers whose mere possession grants access), for example. A predictable random number source will exacerbate the situation.

which implies that a predictable random number source still fulfills the requirements of the RFC regarding pseudo-randomness.

The same sections also says:

Distributed applications generating UUIDs at a variety of hosts must be willing to rely on the random number source at all hosts. If this is not feasible, the namespace variant should be used.

which moves the responsibility of how random numbers are generated and how probable this makes collisions in the generated UUIDs away from the RFC, towards implementations. See also this Software Engineering StackExchange answer.

Personally, I'd expect implementations to use reasonably high-quality randomness. For example, Java's java.util.UUID.randomUUID and CPython's uuid.uuid4 use cryptographically secure randomness.

He3lixxx
  • 3,263
  • 1
  • 12
  • 31