13

Summary: I'm developing a persistent Java web application, and I need to make sure that all resources I persist have globally unique identifiers to prevent duplicates.

The Fine Print:

  1. I'm not using an RDBMS, so I don't have any fancy sequence generators (such as the one provided by Oracle)
  2. I'd like it to be fast, preferably all in memory - I'd rather not have to open up a file and increment some value
  3. It needs to be thread safe (I'm anticipating that only one JVM at a time will need to generate IDs)
  4. There needs to be consistency across instantiations of the JVM. If the server shuts down and starts up, the ID generator shouldn't re-generate the same IDs it generated in previous instantiations (or at least the chance has to be really, really slim - I anticipate many millions of presisted resources)
  5. I have seen the examples in the EJB unique ID pattern article. They won't work for me (I'd rather not rely solely on System.currentTimeMillis() because we'll be persisting multiple resources per millisecond).
  6. I have looked at the answers proposed in this question. My concern about them is, what is the chance that I will get a duplicate ID over time? I'm intrigued by the suggestion to use java.util.UUID for a UUID, but again, the chances of a duplicate need to be infinitesimally small.
  7. I'm using JDK6
Community
  • 1
  • 1
Julie
  • 6,221
  • 3
  • 31
  • 37
  • Are you running multiple instances of the application on different machines? If you are, are you likely to start machines up in batches -- so that it's likely that multiple processes will start in the same millisecond? If an attacker finds a way to cause a UUID collision, will that compromise the security of your application? – Mike Samuel Mar 17 '12 at 16:09
  • (A) What volume of IDs will be generated? How fast? (How many per second/minute) (B) Yes, UUIDs were invented exactly for your purpose. – Basil Bourque Feb 03 '14 at 23:19

6 Answers6

33

Pretty sure UUIDs are "good enough". There are 340,282,366,920,938,463,463,374,607,431,770,000,000 UUIDs available.

http://www.wilybeagle.com/guid_store/guid_explain.htm

"To put these numbers into perspective, one's annual risk of being hit by a meteorite is estimated to be one chance in 17 billion, that means the probability is about 0.00000000006 (6 × 10−11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. The probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs"

http://en.wikipedia.org/wiki/Universally_Unique_Identifier

Shawn Miller
  • 7,082
  • 6
  • 46
  • 54
  • Nice reference! So its safe to infer that if I use UUID.randomUUID() in my application, the chance of it generating the same UUID twice is infinitesimally small then...? – Julie Oct 10 '08 at 20:34
  • 1
    Well, just because there are so many possible values, doesn't necessarily mean that they wrote the algorithm well enough to get a good random distribution. Then again, the designers of the UUID class have probably put a lot more thought into it than I could in an afternoon! – Julie Oct 10 '08 at 20:38
  • yep, quite a bit of thought went into it... "Standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE)" – Shawn Miller Oct 10 '08 at 20:40
  • 21
    While you're up at night worrying about that duplicate UUID, be sure to keep an eye open for that meteorite... ;) – Michael Burr Oct 10 '08 at 21:47
  • A [Version 1](https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_1_.28MAC_address.29) UUID is preferable, using MAC address + current time + a random number. The java.util.UUID class bundled with Java does not generate Version 1 presumably because of security & privacy concerns. The Wikipedia page on UUID [Implementations](https://en.wikipedia.org/wiki/Universally_unique_identifier#Implementations) lists 2 libraries that generate Version 1. But entirely-random (v4) is generally good enough if generated with a "cryptographically strong" randomizer (as in the bundled class). – Basil Bourque Feb 03 '14 at 23:30
1

If it needs to be unique per PC: you could probably use (System.currentTimeMillis() << 4) | (staticCounter++ & 15) or something like that.

That would allow you to generate 16 per ms. If you need more, shift by 5 and and it with 31...

if it needs to be unique across multiple PCs, you should also combine in your primary network card's MAC address.

edit: to clarify

private static int staticCounter=0;
private final int nBits=4;
public long getUnique() {
    return (currentTimeMillis() << nBits) | (staticCounter++ & 2^nBits-1);
}

and change nBits to the square root of the largest number you should need to generate per ms.

It will eventually roll over. Probably 20 years or something with nBits at 4.

Mike Samuel
  • 118,113
  • 30
  • 216
  • 245
Bill K
  • 62,186
  • 18
  • 105
  • 157
  • 1
    That's a clever way to go about it. I think I'll trust the UUID class, as @smiller has given me more confidence that it's "unique enough." – Julie Oct 10 '08 at 20:42
  • My company uses a system very similar to this for generating our "UUIDs". It works ok (I've never seen a duplicate). It seems really hacky though, and also it allows you to figure out where and when something was created. – rmeador Oct 10 '08 at 20:49
  • It is hacky in that I was limiting myself to a long. If you simply used two longs and appended a count to the currentTime in a synchronized method, it won't fail unless your clock changes. If you are worried about that, it would be trivial to fix too. – Bill K Oct 13 '08 at 17:12
1
public class UniqueID {
    private static long startTime = System.currentTimeMillis();
    private static long id;

    public static synchronized String getUniqueID() {
        return "id." + startTime + "." + id++;
    }
}
Dave Griffiths
  • 1,962
  • 19
  • 22
  • 1
    Nice and simple solution, yes. The UUID bit of code will probably be the most called in my application, by many threads at once, so the synchronization overhead/bottleneck may be too much. – Julie Oct 13 '08 at 14:23
  • 3
    Don't roll your own UUID code. There are a lot of subtle ways to get it wrong. This code will do bad things if multiple processes start in the same millisecond, such as when you start a batch of machines running the same task at the same time. – Mike Samuel Mar 17 '12 at 16:10
0

From memory the RMI remote packages contain a UUID generator. I don't know whether thats worth looking into.

When I've had to generate them I typically use a MD5 hashsum of the current date time, the user name and the IP address of the computer. Basically the idea is to take everything that you can find out about the computer/person and then generate a MD5 hash of this information.

It works really well and is incredibly fast (once you've initialised the MessageDigest for the first time).

Aidos
  • 2,753
  • 3
  • 27
  • 31
0

if you want to use a shorter and faster implementation that java UUID take a look at:

https://code.google.com/p/spf4j/source/browse/trunk/spf4j-core/src/main/java/org/spf4j/concurrent/UIDGenerator.java

see the implementation choices and limitations in the javadoc.

here is a unit test on how to use:

https://code.google.com/p/spf4j/source/browse/trunk/spf4j-core/src/test/java/org/spf4j/concurrent/UIDGeneratorTest.java

user2179737
  • 493
  • 3
  • 6
0

why not do like this

String id = Long.toString(System.currentTimeMillis()) + 
    (new Random()).nextInt(1000) + 
    (new Random()).nextInt(1000);
user unknown
  • 35,537
  • 11
  • 75
  • 121
kem
  • 11