Which UUID version to use?

Question

Which version of the UUID should you use? I saw a lot of threads explaining what each version entails, but I am having trouble figuring out what's best for what applications.

Anything that works with python. So I guess this http://docs.python.org/2/library/uuid.html. 1,3,4,5. — user1802143, Dec 03 '13 at 03:06
If you are curious about Versions 3 & 5, see this Question, [Generating v5 UUID. What is name and namespace?](https://stackoverflow.com/q/10867405/642706). — Basil Bourque, Jun 08 '17 at 06:01

score 578 · Answer 1 · edited Apr 01 '22 at 15:56

There are two different ways of generating a UUID.

If you just need a unique ID, you want a version 1 or version 4.

Version 1: This generates a unique ID based on a network card MAC address and current time. If any of these things is sensitive in any way, don't use this. The advantage of this version is that, while looking at a list of UUIDs generated by machines you trust, you can easily know whether many UUIDs got generated by the same machine, or infer some time relationship between them.
Version 4: These are generated from random (or pseudo-random) numbers. If you just need to generate a UUID, this is probably what you want. The advantage of this version is that when you're debugging and looking at a long list of information matched with UUIDs, it's quicker to spot matches.
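
As a rough illustration of the two options above, here is a minimal sketch using Python's standard-library `uuid` module (the library the asker mentions). The timestamp conversion assumes the RFC 4122 convention of 100-nanosecond ticks counted from 15 October 1582; the variable names are only illustrative.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1: built from a node ID (normally the MAC address) and the current time.
u1 = uuid.uuid1()
# Version 4: built from random bits.
u4 = uuid.uuid4()

print(u1, u1.version)
print(u4, u4.version)

# A v1 UUID exposes what it was built from, which is why it can leak information.
# u1.node is the 48-bit node ID; u1.time counts 100 ns ticks since 1582-10-15 (UTC).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
created_at = GREGORIAN_EPOCH + timedelta(microseconds=u1.time // 10)

print(f"node = {u1.node:012x}")
print(f"created at approximately {created_at.isoformat()}")
```

Nothing comparable can be recovered from `u4`, which is the practical difference when the MAC address or creation time is sensitive.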

If you need to generate reproducible UUIDs from given names, you want a version 3 or version 5. If you are interacting with other systems, this choice has already been made, and you should check which version and namespaces they use.

Version 3: This generates a unique ID from an MD5 hash of a namespace and name. If you are dealing with very strict resource requirements (e.g. a very busy Arduino board), use this.
Version 5: This generates a unique ID from an SHA-1 hash of a namespace and name. This is the more secure and generally recommended version.
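
To make "reproducible" concrete, a small sketch with Python's `uuid` module follows. The name `example.com` is a hypothetical input chosen only for illustration.

```python
import uuid

name = "example.com"  # hypothetical name, chosen only for illustration

# The same namespace and name always give the same UUID.
a = uuid.uuid5(uuid.NAMESPACE_DNS, name)
b = uuid.uuid5(uuid.NAMESPACE_DNS, name)
assert a == b

# Changing the namespace, or the version (and so the hash), changes the result.
c = uuid.uuid5(uuid.NAMESPACE_URL, name)
d = uuid.uuid3(uuid.NAMESPACE_DNS, name)
assert len({a, c, d}) == 3

print(a.version, d.version)  # 5 3
```

This is why the version and namespace have to match whatever the other system uses: any difference in either one yields an unrelated UUID.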

Liz Av

answered Dec 03 '13 at 03:37

Gabe

30

I would add: If you need to generate a `reproducible` UUID from a given name, you want a version 3 or version 5. If you feed that algorithm the same input, it will generate the same output. – anregen Oct 15 '14 at 16:04
1

What if one wanted a sortable (time based) UUID? - It seems the answer is V1, but it may depend how you generate it so that it can be stored optimally. Source: https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/ – Tit Petric Jul 26 '16 at 08:27
4

In a cloud computing environment (such as AWS or GAE), it would seem the weakness of Version 1 is mitigated into oblivion. Where there are likely to be thousands of different MAC addresses applied to a given application's UUID generator over time, eliminating predictability and/or traceability. – Buffalo Rabor Nov 03 '16 at 18:48
Given that SHA1 is now broken, UUIDv5 cannot be preferred anymore – user239558 Mar 17 '17 at 09:12
3

@user239558 Given the goal for a UUID is its uniqueness, UUIDv5 can still be preferred. – Epicurist Apr 02 '17 at 09:22
8

That comment about Version 1 being "not recommended", is overly simplistic. In many situations, these are indeed fine and preferable. But if you have security concerns about leaking either of these items of information from a UUID that might be made available to untrustworthy actors: (a) the MAC address of the machine creating the UUID, or (b) the date-time when created, then avoid Version 1. If those two pieces of information are *not* sensitive, then Version 1 is an excellent way to go. – Basil Bourque Jun 03 '17 at 23:06
29

What happened to version 2? – Matthew Woo Nov 16 '17 at 08:44
5

Version 1 UUID can be perfectly suitable if one wants a 0 (not infinitesimal but actually 0) probability of collision. In any case, UUID should not be used for security purposes, as stated by RFC4122: https://tools.ietf.org/html/rfc4122#section-6 – Deimos Apr 03 '18 at 12:08
2

@MatthewWoo It's not widely used for anything outside of the Open Software Foundation's (OSF) Distributed Computing Environment (DCE). According to the RFC, version 2 is a "DCE Security version, with embedded POSIX UIDs". For all practical purposes and interoperability between most systems in use, you'll use one of the other 4 versions of UUID as specified in RFC 4122. – fourpastmidnight Jan 28 '20 at 17:17
@MatthewWoo to expand on the comment above, the clock-tick rate of Version 2 is very slow, limiting it to one unique ID every 7 minutes (approximately). The benefit is that it can encode a much larger range of values per node ID (such as the MAC address). It's reserved for things like user IDs (POSIX UID) because you aren't typically generating them at a high frequency. – McGuireV10 Jan 30 '20 at 09:01
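
For reference, the Version 2 tick interval can be worked out with a little arithmetic. This sketch assumes the usual description of the DCE layout, in which the low 32 bits of the 100-nanosecond timestamp are replaced by a local identifier; the constant names are illustrative.

```python
# Assumption: a version 2 UUID replaces the low 32 bits of the 60-bit,
# 100-nanosecond timestamp with a local identifier (such as a POSIX UID).
TICK_NS = 100        # resolution of the version 1 timestamp, in nanoseconds
REPLACED_BITS = 32   # low timestamp bits given up in version 2

tick_seconds = (2 ** REPLACED_BITS) * TICK_NS / 1e9
print(f"{tick_seconds:.2f} s = {tick_seconds / 60:.2f} min")  # 429.50 s = 7.16 min
```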
3

@Deimos There is an upper limit of 2^128 possible UUIDs considering any possible scheme you envisage. So the probability of collision with (2^128+1) distinct inputs is in fact 1 irrespective of which version we use. – Jus12 Apr 22 '20 at 11:10
2

@Jus12 Absolutely. Not sure what I meant by that. The idea was, I think, that v1 uses the MAC address so two generators running with two different MAC addresses will never collide. However, there is only a finite number of UUIDs of any type, so any generator will collide with itself eventually. – Deimos Apr 22 '20 at 15:09

score 66 · Answer 2 · edited Oct 07 '21 at 05:46

If you want a random number, use a random number library. If you want a unique identifier with effectively 0.00...many more 0s here...001% chance of collision, you should use UUIDv1. See Nik's post for UUIDv3 and v5.

UUIDv1 is NOT secure. It isn't meant to be. It is meant to be UNIQUE, not un-guessable. UUIDv1 uses the current timestamp, plus a machine identifier, plus some random-ish stuff to make a number that will never be generated by that algorithm again. This is appropriate for a transaction ID (even if everyone is doing millions of transactions/s).

To be honest, I don't understand why UUIDv4 exists... from reading RFC4122, it looks like that version does NOT eliminate the possibility of collisions. It is just a random number generator. If that is true, then you have a very GOOD chance of two machines in the world eventually creating the same "UUID"v4 (quotes because there isn't a mechanism for guaranteeing U.niversal U.niqueness). In that situation, I don't think that algorithm belongs in an RFC describing methods for generating unique values. It would belong in an RFC about generating randomness. For a set of random numbers:

chance_of_collision = 1 - (set_size! / (set_size - tries)!) / (set_size ^ tries)
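
The formula above can be evaluated numerically. The sketch below is only an illustration and assumes a uniformly random source: the exact version is practical for small sets, while the birthday approximation 1 - exp(-tries * (tries - 1) / (2 * set_size)) is what makes 122-bit sets tractable. The function names are hypothetical.

```python
import math

def collision_probability_exact(set_size: int, tries: int) -> float:
    """1 - (set_size! / (set_size - tries)!) / set_size**tries, as a running product."""
    p_unique = 1.0
    for i in range(tries):
        p_unique *= (set_size - i) / set_size
    return 1.0 - p_unique

def collision_probability_approx(set_size: int, tries: int) -> float:
    """Birthday approximation: 1 - exp(-tries * (tries - 1) / (2 * set_size))."""
    return -math.expm1(-tries * (tries - 1) / (2 * set_size))

# Sanity check on the classic birthday problem: 23 people, 365 days.
print(collision_probability_exact(365, 23))
print(collision_probability_approx(365, 23))

# Version 4 UUIDs carry 122 random bits.
V4_SET_SIZE = 2 ** 122
for tries in (10 ** 9, 10 ** 12, 10 ** 15, 10 ** 18):
    print(f"{tries:.0e} UUIDs: {collision_probability_approx(V4_SET_SIZE, tries):.3e}")
```

`math.expm1` is used so that very small probabilities are not rounded to zero.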

Community

answered Oct 15 '14 at 15:55

anregen

90

You will not see two UUID version 4 implementations collide, unless [you generate a billion UUIDs every second for a century *and* win a coin flip](https://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates). Remember, `set_size` is 2^122, which is *very big*. – Kevin Aug 18 '15 at 01:17
15

V4 algorithm isn't serial, meaning there is a chance that the first two UUIDs generated by v4 could match. Just because there are many options, does not mean you have to run out of unique options before you'll generate a repeat. That could happen at any time. – anregen Aug 19 '15 at 03:29
5

2^122 is only "big" in the context of one laptop, today. It gets unusably small when you start considering how many machines could use that algorithm to generate transactional ids for every packet in a totally connected IoT environment in 25 years. The upshot is, if you want unique, use an algorithm that guarantees it, not one that gives you a good chance in a limited context. – anregen Aug 19 '15 at 03:29
10

You are failing to actually do the math. We (as a species) are not generating 1 billion UUIDs every second. So we have *longer* than 100 years until the first collision (on average). – Kevin Aug 19 '15 at 03:42
1

@anregen I would add that there is now more complexity to the consideration as mobile devices move away from a consistent MAC address to a more random approach. Those algorithms that depend on MAC being unique may be subject to risk over time (likely big numbers). Our move to IOT and Mobile is now taking us back to "privacy" on the mobile side. While this is good for anonymity it does add a wrinkle to the "Guaranteed Unique" statement of UUID v1. The story on UUID v4 remains consistent. – Zack Jannsen Feb 04 '16 at 13:00
50

V4 "might" collide, but the probability is so low that for most use cases it's worth the risk. Re: "two machines in the world eventually creating the same 'UUID'v4", well, sure, but this isn't a problem because most machines in the world that use UUIDs use them in different contexts. I mean, if I generate the same UUID for my own internal app as you do for your internal app, then it doesn't matter. Collisions only matter if they happen in the same context. (remember, even within an app, many UUIDs don't have to be unique across the entire app, just the context they're used in) – Jun 17 '16 at 09:54
2

For example, say I have two entities: Users and Messages. Unless you want all entities of all types to be uniquely identifiable (most systems I've worked on neither need nor do this), Users.uuid and Messages.uuid can collide without problems, because they're in different contexts. – Jun 17 '16 at 09:57
10

So it sounds like, if you don't need your Guid to be secure, use version 1. If you need it secure, and feel lucky (or really, don't feel unlucky) use version 4. – Vaccano Jul 08 '16 at 19:57
If you're working within the context of your own app, use ++ operator. The reason to use a UUID implementation is when you must cooperate with other mechanisms and ensure the group of you don't collide. Preventing collisions with yourself should be trivial. – anregen Nov 29 '18 at 23:39
1

If you generate two Version 1 IDs on the same machine at the same time, the chance of them being the same is quite high (ballpark one in ten million). With Version 4 it's pretty much impossible to get the same ID twice, with no caveats at all (assuming your random number generator is good, which it should be, since so many other things rely on that in a modern operating system). – Abhi Beckert Oct 21 '20 at 06:25
3

@AbhiBeckert UUIDv1 already handles deconfliction for this proposed situation. From the wikipedia page "A 13- or 14-bit "uniquifying" clock sequence extends the timestamp in order to handle cases [...] where there are multiple processors and UUID generators per node". So even in this scenario, UUIDv1 is guaranteed to generate unique IDs (UUIDv4 is not). – anregen Dec 29 '20 at 17:52
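
The same-machine case discussed in the last two comments can be poked at empirically. This sketch uses Python's `uuid.uuid1()` in a single process, so it demonstrates nothing about other implementations or about several generators sharing one node; the explicit `node` and `clock_seq` values are made up for illustration.

```python
import uuid

# Generate many version 1 UUIDs back to back in a single process.
batch = [uuid.uuid1() for _ in range(1_000)]

# Within one process the generator is expected not to repeat itself.
print(len(set(batch)) == len(batch))

# The clock sequence is a 14-bit field stored alongside the timestamp.
sample = batch[0]
print(f"time={sample.time} clock_seq={sample.clock_seq} node={sample.node:012x}")

# Node and clock sequence can also be passed explicitly (illustrative values).
custom = uuid.uuid1(node=0x0123456789AB, clock_seq=0x1234)
print(custom.node == 0x0123456789AB, custom.clock_seq == 0x1234)
```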
1

Late here, but the comment about using auto-incrementing integers within your own context, suggesting that UUIDs are only for cooperating with other mechanisms that you can't control and still preventing collisions, is wrong. Auto-incrementing integers leak business intelligence, just as a UUID v1 leaks MAC addresses and time information. If you're worried about leaking business intelligence (such as data velocity), then you at least shouldn't *expose* your auto-incrementing IDs. You should expose UUIDs instead, even if you keep the integers around for joins / indexing (this is why v4 exists) – Alexander Guyer Jun 02 '23 at 23:23

Nik Bougalis · Answer 3 · 2013-12-12T05:15:23.943

That's a very general question. One answer is: "it depends what kind of UUID you wish to generate". But a better one is this: "Well, before I answer, can you tell us why you need to code up your own UUID generation algorithm instead of calling the UUID generation functionality that most modern operating systems provide?"

Doing that is easier and safer, and since you probably don't need to generate your own, why bother coding up an implementation? In that case, the answer becomes use whatever your O/S, programming language or framework provides. For example, in Windows, there is CoCreateGuid or UuidCreate or one of the various wrappers available from the numerous frameworks in use. In Linux there is uuid_generate.

If you, for some reason, absolutely need to generate your own, then at least have the good sense to stay away from generating v1 and v2 UUIDs. It's tricky to get those right. Stick, instead, to v3, v4 or v5 UUIDs.

Update: In a comment, you mention that you are using Python and link to the `uuid` module documentation. Looking through the interface provided, the easiest option for you would be to generate a v4 UUID (that is, one created from random data) by calling uuid.uuid4().

If you have some data that you need to (or can) hash to generate a UUID from, then you can use either v3 (which relies on MD5) or v5 (which relies on SHA1). Generating a v3 or v5 UUID is simple: first pick the UUID type you want to generate (you should probably choose v5) and then pick the appropriate namespace and call the function with the data you want to use to generate the UUID from. For example, if you are hashing a URL you would use NAMESPACE_URL:

uuid.uuid3(uuid.NAMESPACE_URL, 'https://ripple.com')

Please note that this UUID will be different than the v5 UUID for the same URL, which is generated like this:

uuid.uuid5(uuid.NAMESPACE_URL, 'https://ripple.com')

A nice property of v3 and v5 UUIDs is that they should be interoperable between implementations. In other words, if two different systems are using an implementation that complies with RFC4122, they will (or at least should) both generate the same UUID if all other things are equal (i.e. generating the same version UUID, with the same namespace and the same data). This property can be very helpful in some situations (especially in content-addressable storage scenarios), but perhaps not in your particular case.
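
To see why independent implementations should agree, here is a sketch that rebuilds a v5 UUID by hand and compares it with the standard library's result. The helper `uuid5_by_hand` is a hypothetical name; it follows the RFC 4122 name-based construction as I understand it (SHA-1 over the namespace bytes plus the name, truncated to 128 bits, with the version and variant bits overwritten).

```python
import hashlib
import uuid

def uuid5_by_hand(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Sketch of the RFC 4122 name-based (SHA-1) construction."""
    # Hash the 16 namespace bytes followed by the UTF-8 encoded name.
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    raw = bytearray(digest[:16])       # keep the first 128 bits
    raw[6] = (raw[6] & 0x0F) | 0x50    # set the version field to 5
    raw[8] = (raw[8] & 0x3F) | 0x80    # set the RFC 4122 variant bits
    return uuid.UUID(bytes=bytes(raw))

url = "https://ripple.com"
print(uuid5_by_hand(uuid.NAMESPACE_URL, url) == uuid.uuid5(uuid.NAMESPACE_URL, url))  # True
print(uuid.NAMESPACE_URL)  # 6ba7b811-9dad-11d1-80b4-00c04fd430c8
```

Any system that performs the same steps on the same namespace and name ends up with the same 128 bits, which is the interoperability property described above.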

I would guess it is because OP did not ask: how do I "code up [my] own UUID generation algorithm instead of calling the UUID generation functionality that most modern operating systems provide?" — anregen, Oct 15 '14 at 15:57
Aside from that, I think it is a good explanation of UUIDv3 and v5. See my answer below about why I think v1 can be a good choice. — anregen, Oct 15 '14 at 15:58
what is NAMESPACE_URL ? it's a variable i can get ? from where? — stackdave, Oct 21 '17 at 17:35
@stackdave `NAMESPACE_URL` is a UUID usually equal to `6ba7b811-9dad-11d1-80b4-00c04fd430c8`, following the recommendation made on page 30 of [RFC-4122](https://www.ietf.org/rfc/rfc4122.txt). — Jamie Ridding, Aug 31 '19 at 15:23
`sha256.update(something.getBytes(charset)); sha256.update(somethingElse.getBytes(charset)); byte[] hash = sha256.digest(salt); return UUID.nameUUIDFromBytes(hash).toString();` Is this v3? Do they generate the same UUID ? RFC4122 ? — Mohan Radhakrishnan, Jan 05 '22 at 06:11

score 4 · Answer 4 · answered Dec 02 '22 at 07:06

Version 1: UUIDs using a timestamp and monotonic counter.
Version 3: UUIDs based on the MD5 hash of some data.
Version 4: UUIDs with random data.
Version 5: UUIDs based on the SHA1 hash of some data.
Version 6: UUIDs using a timestamp and monotonic counter.
Version 7: UUIDs using a Unix timestamp.
Version 8: UUIDs using user-defined data.
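
For the time-ordered case, a hand-rolled sketch of the Version 7 layout is shown below. It assumes the RFC 9562 field layout (a 48-bit Unix millisecond timestamp in the top bits, followed by version, variant, and random bits) and builds the value manually, since the `uuid` functions mentioned in the question cover only versions 1, 3, 4, and 5. The name `uuid7_sketch` is hypothetical, and this is not a vetted implementation.

```python
import os
import time
import uuid

def uuid7_sketch() -> uuid.UUID:
    """Hand-rolled version 7 layout: 48-bit Unix ms timestamp, then random bits."""
    unix_ms = time.time_ns() // 1_000_000
    random_bits = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    value = ((unix_ms & 0xFFFFFFFFFFFF) << 80) | random_bits
    value = (value & ~(0xF << 76)) | (0x7 << 76)           # version field = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)           # RFC variant bits
    return uuid.UUID(int=value)

first = uuid7_sketch()
time.sleep(0.002)  # let the millisecond timestamp advance
second = uuid7_sketch()

print(first, first.version)
print(first < second)  # the later timestamp is expected to sort later
```

Because the timestamp occupies the most significant bits, values created in different milliseconds sort by creation time, which is the property people usually want from a "sortable" UUID.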
