
Why are UUIDs presented in the format "8-4-4-4-12" (digits)? I've had a look around for the reason but can't find the decision that calls for it.

Example of a UUID formatted as a hex string: 58D5E212-165B-4CA0-909B-C86B9CEE0111

Basil Bourque
Fidel
  • Actually, that hex string example is incorrect. The [UUID spec requires](https://tools.ietf.org/html/rfc4122#page-3) that the **hex string representing a UUID value *must* be in *lowercase***. The spec also requires an implementation to be able to parse an uppercase or mixed-case string, but only lowercase may be generated. Unfortunately, common implementations violate this rule, including those by Apple, Microsoft, and others. – Basil Bourque Dec 03 '18 at 21:48
  • Interesting Basil, thanks – Fidel Dec 04 '18 at 01:05
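To see the casing rule from the comment above in practice, here is a small sketch using Python's standard `uuid` module (just one implementation, used as an example, not the spec itself). It parses the question's uppercase string, since input is case-insensitive, and `str()` prints the value back in lowercase.

```python
import uuid

# The uppercase example string from the question.
example = "58D5E212-165B-4CA0-909B-C86B9CEE0111"

# Parsing is case-insensitive, so the uppercase form is accepted...
u = uuid.UUID(example)

# ...and str() emits the lowercase form.
canonical = str(u)
print(canonical)  # 58d5e212-165b-4ca0-909b-c86b9cee0111

# A mixed-case spelling parses to the same 128-bit value.
assert uuid.UUID("58d5E212-165b-4CA0-909b-c86B9cee0111") == u
```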

3 Answers


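The groups follow the fields of the UUID record: time_low (8 hex digits), time_mid (4), time_hi_and_version (4), clock_seq_hi_and_reserved plus clock_seq_low (2 + 2 = 4), and node (12), as laid out in the following RFC.

From IETF RFC 4122: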

4.1.2.  Layout and Byte Order

   To minimize confusion about bit assignments within octets, the UUID
   record definition is defined only in terms of fields that are
   integral numbers of octets.  The fields are presented with the most
   significant one first.

   Field                  Data Type     Octet  Note
                                        #

   time_low               unsigned 32   0-3    The low field of the
                          bit integer          timestamp

   time_mid               unsigned 16   4-5    The middle field of the
                          bit integer          timestamp

   time_hi_and_version    unsigned 16   6-7    The high field of the
                          bit integer          timestamp multiplexed
                                               with the version number  

   clock_seq_hi_and_rese  unsigned 8    8      The high field of the
   rved                   bit integer          clock sequence
                                               multiplexed with the
                                               variant

   clock_seq_low          unsigned 8    9      The low field of the
                          bit integer          clock sequence

   node                   unsigned 48   10-15  The spatially unique
                          bit integer          node identifier

   In the absence of explicit application or presentation protocol
   specification to the contrary, a UUID is encoded as a 128-bit object,
   as follows:

   The fields are encoded as 16 octets, with the sizes and order of the
   fields defined above, and with each field encoded with the Most
   Significant Byte first (known as network byte order).  Note that the
   field names, particularly for multiplexed fields, follow historical
   practice.

   0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          time_low                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       time_mid                |         time_hi_and_version   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         node (2-5)                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
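To connect the table to the 8-4-4-4-12 string, here is a short Python sketch (standard `struct` and `uuid` modules, using the question's example value) that unpacks the 16 octets in network byte order with the field sizes above and prints each field as zero-filled hex. The two single-octet clock-sequence fields end up sharing one 4-digit group.

```python
import struct
import uuid

u = uuid.UUID("58d5e212-165b-4ca0-909b-c86b9cee0111")

# Big-endian (">") unpack using the RFC 4122 field sizes:
# time_low (4 bytes), time_mid (2), time_hi_and_version (2),
# clock_seq_hi_and_reserved (1), clock_seq_low (1), node (6).
time_low, time_mid, time_hi_ver, clk_hi, clk_low, node = struct.unpack(">IHHBB6s", u.bytes)

# Each field as zero-filled hex; both clock-sequence octets go in one group.
groups = [
    f"{time_low:08x}",
    f"{time_mid:04x}",
    f"{time_hi_ver:04x}",
    f"{clk_hi:02x}{clk_low:02x}",
    node.hex(),
]
print("-".join(groups))          # 58d5e212-165b-4ca0-909b-c86b9cee0111
print([len(g) for g in groups])  # [8, 4, 4, 4, 12]

# Python's UUID.fields exposes the same split (node is an int there).
assert u.fields[:5] == (time_low, time_mid, time_hi_ver, clk_hi, clk_low)
```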
BenMorel
Matten
  • Why was the timestamp split into three parts? – user253751 Jun 30 '14 at 07:49
  • How the fields are generated depends on the UUID version. The preferred method doesn't use time since that reveals the time the ID was generated (a potential security concern). http://en.wikipedia.org/wiki/Universally_unique_identifier#Variants_and_versions – pmont May 01 '15 at 21:01
  • @pmont “Preferred”? – Basil Bourque Aug 18 '17 at 18:55
  • @BasilBourque [version 4](https://stackoverflow.com/q/47230521/1739000) is now the [preferred version](https://stackoverflow.com/a/20342413/1739000) for simple identifier creation. – NH. Oct 15 '18 at 21:38
  • @NH. I cannot understand what you & pmont mean by Version 4 being “preferred”, and your link does not help. V4 being almost entirely random makes this the UUID version of last resort, the least desirable. While fine for casual use or for small numbers of instances, it does carry some risk of collisions over large numbers of instances. Not enough that I would worry about it for most uses. But Version 1 or similar is certainly preferable over V4 by eliminating all practical risk of collision by way of picking a certain non-recurring point in space and time. V4 is *convenient*, not *preferred*. – Basil Bourque Oct 15 '18 at 21:46
  • @BasilBourque both convenient _and_ preferred (if the random number generation is good enough)! You sound like you still don't understand the infinitesimal (nearly 0) probability of collision, so here is some further reading: https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions – NH. Oct 15 '18 at 22:10
  • @NH. I do understand, as I commented. But saying V4 is *preferred* over V1 is silly and nonsensical, like saying seat belts are preferred for vehicle safety over seat-belt-with-airbags. Yes, seat belts *vastly* reduce your chances of injury, but belts-with-airbags reduce the chances by *another* enormous amount. Belts are better than nothing, but airbags are even better. V4 is good enough for many purposes, but V1 is even better. If V1 is available, there would be *no* reason to choose V4. If security or privacy concerns forbid V1 usage, use one of the variations similar to V1, not V4. – Basil Bourque Oct 15 '18 at 22:24
  • @BasilBourque V4 is preferred. With probabilities of this scale, it's easier to get collisions with V1 than with V4. The reason for it is that systems sometimes fail, and when comparing in these probability scales, it becomes the dominant source of collisions. V1 is more complicated to build and therefore more prone to partial failure. – brocoli Dec 03 '18 at 20:46
  • @brocoli I have to disagree. V4 depends on a cryptographically-strong random number generator, which is *much* tougher to build well than simply grabbing the [MAC address](https://en.wikipedia.org/wiki/MAC_address), the current moment, and an incrementing arbitrary number, as seen in V1 UUID. Moreover, the implementations of V1 are generally open-source and built many years ago with much use throughout the industry, now well-worn. Claiming V1 is “prone to partial failure” is just silly. A V1 UUID is the *last* piece of your system where you need to worry about failure. – Basil Bourque Dec 03 '18 at 21:39
  • @BasilBourque One of the issues that you can see now with the proliferation of containers and container networking are colliding MAC addresses. Typically containers and VMs pull from a limited range of possible MAC addresses. IIRC Hyper-V only pulls from a pool of 256 possible MAC addresses by default. – Nathan Clayton Apr 14 '20 at 23:11
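The RFC text above only says the field names "follow historical practice", so the sketch below does not explain *why* the timestamp was split. It just shows, with Python's standard `uuid` module, how a version 1 UUID spreads its 60-bit timestamp over time_low, time_mid and time_hi_and_version (sharing that last field with the 4-bit version), and how a version 4 UUID carries the version in the same place. The timestamp counts 100-nanosecond intervals since 1582-10-15 00:00 UTC.

```python
import uuid
from datetime import datetime, timedelta, timezone

u1 = uuid.uuid1()  # time-based; may embed this machine's MAC address in `node`
u4 = uuid.uuid4()  # random apart from the version and variant bits

# The version lives in the top 4 bits of time_hi_and_version.
print(u1.version, u4.version)  # 1 4
assert (u1.time_hi_version >> 12) == 1
assert (u4.time_hi_version >> 12) == 4

# For v1, the low 12 bits of that field are the top of the 60-bit timestamp;
# the lower 48 bits are split across time_mid (16) and time_low (32).
timestamp = ((u1.time_hi_version & 0x0FFF) << 48) | (u1.time_mid << 32) | u1.time_low
assert timestamp == u1.time  # Python's UUID.time performs the same reassembly

# Convert the 100-ns count to a datetime (microseconds = count // 10).
gregorian_epoch = datetime(1582, 10, 15, tzinfo=timezone.utc)
created = gregorian_epoch + timedelta(microseconds=timestamp // 10)
print(created)  # roughly the current UTC time
```

This also illustrates pmont's point: anyone holding a v1 UUID can recover roughly when it was generated.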

The format is defined in section 3 of IETF RFC 4122. The output format is given where it says "UUID = ..."

3.  Namespace Registration Template

Namespace ID:
  UUID
Registration Information:
  Registration date: 2003-10-01

Declared registrant of the namespace:
  JTC 1/SC6 (ASN.1 Rapporteur Group)

Declaration of syntactic structure:
  A UUID is an identifier that is unique across both space and time, with respect to the space of all UUIDs. Since a UUID is a fixed size and contains a time field, it is possible for values to rollover (around A.D. 3400, depending on the specific algorithm used). A UUID can be used for multiple purposes, from tagging objects with an extremely short lifetime, to reliably identifying very persistent objects across a network.

  The internal representation of a UUID is a specific sequence of
  bits in memory, as described in Section 4.  To accurately
  represent a UUID as a URN, it is necessary to convert the bit
  sequence to a string representation.

  Each field is treated as an integer and has its value printed as a
  zero-filled hexadecimal digit string with the most significant
  digit first.  The hexadecimal values "a" through "f" are output as
  lower case characters and are case insensitive on input.

  The formal definition of the UUID string representation is
  provided by the following ABNF [7]:

  UUID                   = time-low "-" time-mid "-"
                           time-high-and-version "-"
                           clock-seq-and-reserved
                           clock-seq-low "-" node
  time-low               = 4hexOctet
  time-mid               = 2hexOctet
  time-high-and-version  = 2hexOctet
  clock-seq-and-reserved = hexOctet
  clock-seq-low          = hexOctet
  node                   = 6hexOctet
  hexOctet               = hexDigit hexDigit
  hexDigit =
        "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" /
        "a" / "b" / "c" / "d" / "e" / "f" /
        "A" / "B" / "C" / "D" / "E" / "F"
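The ABNF translates almost mechanically into a regular expression. Below is a sketch of a hypothetical helper, `parse_uuid_string` (not a library function), that validates a string strictly against the grammar and returns the named fields. It makes visible that the fourth group is clock-seq-and-reserved immediately followed by clock-seq-low with no hyphen between them, which is why that group has 4 digits. Python's own `uuid.UUID()` constructor is more lenient (it also accepts forms without hyphens, for example), so it is not a strict ABNF check.

```python
import re
from typing import Dict, Optional

HEX = "[0-9a-fA-F]"  # hexDigit: either case is accepted on input

# One named group per ABNF rule; note there is no "-" between
# clock-seq-and-reserved and clock-seq-low.
UUID_RE = re.compile(
    rf"(?P<time_low>{HEX}{{8}})-"
    rf"(?P<time_mid>{HEX}{{4}})-"
    rf"(?P<time_high_and_version>{HEX}{{4}})-"
    rf"(?P<clock_seq_and_reserved>{HEX}{{2}})"
    rf"(?P<clock_seq_low>{HEX}{{2}})-"
    rf"(?P<node>{HEX}{{12}})"
)


def parse_uuid_string(s: str) -> Optional[Dict[str, str]]:
    """Hypothetical helper: return the ABNF fields of `s`, or None if it doesn't match."""
    m = UUID_RE.fullmatch(s)
    return m.groupdict() if m else None


fields = parse_uuid_string("58D5E212-165B-4CA0-909B-C86B9CEE0111")
print(fields["clock_seq_and_reserved"], fields["clock_seq_low"])  # 90 9B

# Without the hyphens the string does not match the grammar.
print(parse_uuid_string("58D5E212165B4CA0909BC86B9CEE0111"))  # None
```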

128 bits

The "8-4-4-4-12" format is just for reading by humans. The UUID is really a 128-bit number.

Keep in mind that the string format takes more than twice as much space as the 128-bit number when stored or held in memory: 36 characters (32 hex digits plus 4 hyphens) versus 16 bytes. I would suggest using the number internally and switching to the string format only when the UUID needs to be shown in a UI or exported to a file.
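A quick way to check the size difference, using Python's standard `uuid` module as an example: `UUID.bytes` is the 16-byte value, `UUID.int` is the same value as an integer, `UUID.hex` is 32 hex digits, and `str()` gives the 36-character display form. The compact bytes round-trip back to the same UUID, so nothing is lost by storing them and formatting only for display.

```python
import uuid

u = uuid.UUID("58d5e212-165b-4ca0-909b-c86b9cee0111")

raw = u.bytes  # 16 bytes: the 128-bit value itself
text = str(u)  # 36 characters: 32 hex digits + 4 hyphens
print(len(raw), len(text), len(u.hex))  # 16 36 32

# The integer form is the same 128-bit number.
assert u.int.bit_length() <= 128

# Store the compact bytes; format as a string only for display or export.
assert uuid.UUID(bytes=raw) == u
assert str(uuid.UUID(bytes=raw)) == text
```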

Basil Bourque
Pablo Pazos