2

I was under the impression that the UUID spec required a guaranteed, true, globally unique result, not unique 99.99999999999% of the time, but truly 100% of the time. From the spec:

A UUID is 128 bits long, and can guarantee uniqueness across space and time.

It looks like Java only supports V3 and V4 of the UUID spec. V4 isn't truly unique. With the V3 implementation using nameUUIDFromBytes, the following results in duplicates, because the computer is too fast (edit: looping to 10 and calling new Date().getTime() produces duplicates because the computer loops faster than new Date().getTime() can produce a different value on each iteration):

String seed;
for (int i = 0; i < 10; i++) {
    // getTime() has millisecond resolution, so several iterations can see the same value
    seed = "<hostname>" + new Date().getTime();
    System.out.println(java.util.UUID.nameUUIDFromBytes(seed.getBytes()));
}

Am I mistaken in assuming that a UUID is 100% unique, when it is only practically unique but not perfectly so? Is there any way to generate a truly unique UUID in Java?

Matthew Moisen
  • Sure, compare the generated UUID with all the other UUIDs you've generated. Anything that's random is "practically unique", no? While it's wildly unlikely, by definition, random numbers could repeat, even right after one another. Add time, a large random number, hashing, etc. and it's random for almost any reasonable purpose. A numbers guy could probably poke giant holes in that, though. – Dave Newton Mar 25 '15 at 20:58
  • @DaveNewton This is for an application that runs once every 5 minutes, and inserts the records into a DB that won't enforce unique constraints on that particular column. – Matthew Moisen Mar 25 '15 at 20:59
  • You simply can't have a truly 'universally unique ID' in a computer with limited memory. Even with the 128 bits, you only have 2^128 possible unique combinations of bits. You would need an infinite amount of memory to have infinitely many unique values, because your datatype would need potentially infinite width. Making it practically universally unique is good enough imho. – Mark W Mar 25 '15 at 21:02
  • Define "perfect"? Computers are fast but not that fast. If you generate 2^64 UUIDs then there is a reasonable chance that two will be the same, but you would need billions of computers to produce these. – Peter Lawrey Mar 25 '15 at 21:12
  • The limit of what a computer can produce isn't enforced by its memory, but by what the computer can search. The "infinite" part is obvious, but since it would also take infinite time, irrelevant. 2^128 is unique enough, I'm pretty sure (what's the trope, that's 2^50 UIDs for every star in the known universe, and 2^50 isn't that small itself :) – Dave Newton Mar 25 '15 at 21:13
  • Your example will not produce duplicates during your entire lifetime. And "the computer is too fast" doesn't make any sense. – specializt Mar 25 '15 at 21:40
  • Why doesn't your database schema enforce uniqueness on that column? – user207421 Mar 25 '15 at 21:45
  • @specializt Looping to 10 and using `UUID.nameUUIDFromBytes(("constant string" + new Date().getTime()).getBytes())` produces duplicates because the computer loops faster than `new Date().getTime()` can produce a different value on each loop. – Matthew Moisen Mar 25 '15 at 22:33
  • @EJP The app I'm working on actually dumps to disk, where it is picked up by a separate app downstream which eventually loads it into another table that enforces unique constraints; i.e., I don't know about the duplicate issue until it's too late. – Matthew Moisen Mar 25 '15 at 22:37
  • Java itself supports more versions, it just doesn't have generators built in. You can include a library with a MAC-based generator. – chrylis -cautiouslyoptimistic- Mar 25 '15 at 22:59

3 Answers

4

There are different methods of UUID generation. The kind you're using is behaving exactly as it should. You're using nameUUIDFromBytes, a "Static factory to retrieve a type 3 (name based) UUID based on the specified byte array."

This generates the same UUID if given the same name. As you've discovered, your loop is passing in the same name every time, so you get the same UUID.

Have a look at Gabe's advice here: Which UUID version to use? He recommends you use V4, which as others have pointed out is good enough for any realistic use case.
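To make the difference concrete, here is a minimal sketch contrasting the two factories. The class name, the helper `fromName`, and the sample name string are hypothetical and only there for illustration; `UUID.nameUUIDFromBytes` and `UUID.randomUUID` are the standard `java.util.UUID` methods.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class NameVersusRandomDemo {
    /** Builds a version 3 (name-based) UUID from a string name. */
    static UUID fromName(String name) {
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical name: a host label plus a fixed millisecond timestamp.
        String name = "example-host" + 1427317080000L;

        // The same name always maps to the same version 3 UUID.
        System.out.println(fromName(name).equals(fromName(name)));

        // Each randomUUID() call draws fresh random bits (version 4).
        System.out.println(UUID.randomUUID());
        System.out.println(UUID.randomUUID());
    }
}
```

If the name does not change between calls, neither does the result, which is what the loop in the question runs into.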

DavidS
2

Because your entropy is limited by your memory, you can never ensure that a UUID is a "guaranteed, true, globally unique result". However, 99.99999999999% is already pretty good.

If you want to ensure unique values in your database, you could use a simple integer that's incremented to be sure it's unique. If you want to use UUIDs and be really sure they're unique, you just have to check that upon creation. If there's a duplicate, just create another one until it's unique.
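A rough sketch of the check-and-retry idea, assuming all UUIDs are issued by a single process so that an in-memory set is enough. The class `CheckedUuidGenerator` is hypothetical; in a setup like the one in the question, the check would more likely have to run against whatever store holds the previously written IDs.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import java.util.function.Supplier;

final class CheckedUuidGenerator {
    // Every UUID handed out so far (in-memory only: an assumption of this sketch).
    private final Set<UUID> issued = new HashSet<>();
    private final Supplier<UUID> source;

    CheckedUuidGenerator(Supplier<UUID> source) {
        this.source = source;
    }

    /** Returns a UUID that this generator has not returned before. */
    synchronized UUID next() {
        UUID candidate = source.get();
        // Set.add returns false when the value is already present, so retry.
        while (!issued.add(candidate)) {
            candidate = source.get();
        }
        return candidate;
    }
}
```

Typical use would be `new CheckedUuidGenerator(UUID::randomUUID).next()`. The set grows with every call, so this only suits modest volumes.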

Duplicates can happen, but IIRC, part of a UUID is derived from the current time (this is true of time-based version 1 UUIDs, not of the random version 4), so if you're just creating one every 5 minutes, you should be safe.

kelunik
0

As others have pointed out, the type-4 UUID returned by UUID.randomUUID() is likely to be unique enough for any practical application. Cases where it's not are likely to be pathological: for example, rolling back a VM to a live snapshot, without restarting the Java process, so that the random-number generator goes back to an exact prior state.
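To put a rough number on "unique enough", the sketch below uses the standard birthday-bound approximation, p ≈ 1 - e^(-n(n-1)/(2·2^b)), with b = 122 because a version 4 UUID has 122 random bits (the other 6 encode version and variant). It assumes the random-number generator is ideal; the class and method names are hypothetical.

```java
final class CollisionEstimate {
    /**
     * Approximate probability that at least two of n values collide when each
     * is drawn uniformly at random from 2^bits possibilities.
     */
    static double collisionProbability(double n, int bits) {
        double pairs = n * (n - 1) / 2.0;      // number of distinct pairs
        double space = Math.pow(2.0, bits);    // number of possible values
        // 1 - e^(-pairs/space); expm1 keeps precision when the result is tiny.
        return -Math.expm1(-pairs / space);
    }

    public static void main(String[] args) {
        // One billion version 4 UUIDs.
        System.out.println(collisionProbability(1e9, 122));
        // 2^61 version 4 UUIDs, where a collision becomes plausible.
        System.out.println(collisionProbability(Math.pow(2.0, 61), 122));
    }
}
```

The estimate says nothing about the pathological cases above, where the generator's state itself repeats.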

By contrast, a type-3 or type-5 UUID is only as unique as what you put into it.
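One way to make the input vary on every call is to append a counter to the name, so that two calls in the same millisecond still hash different bytes. This is only a sketch: the class `SequencedNameUuid` and the name layout are hypothetical, the counter restarts at zero with each process, and distinct names can in principle still hash to the same UUID.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

final class SequencedNameUuid {
    private final String prefix;
    // Per-process counter; it resets to zero when the JVM restarts.
    private final AtomicLong sequence = new AtomicLong();

    SequencedNameUuid(String prefix) {
        this.prefix = prefix;
    }

    /** Builds a version 3 UUID from prefix, current time, and a counter. */
    UUID next() {
        String name = prefix + ":" + System.currentTimeMillis()
                + ":" + sequence.getAndIncrement();
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }
}
```

With a prefix such as a hostname, consecutive calls in a tight loop get different names even when the clock has not advanced.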

A type-1 UUID (time-based) should be very slightly "more" unique, under certain constraints. The Java platform does not include support for generating a type-1 UUID, but I've written code (possibly not published) to call a UUID generating library via JNI. It was 18 lines of C and 11 lines of Java.

david