The regex for validating a UUID v4 is the following:

/^[0-9A-F]{8}-[0-9A-F]{4}-[4][0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$/i

As you can see, the 4th group contains [89AB][0-9A-F]{3} instead of [0-9A-F]{4}.

Do you know why that is? Is there a historical reason for this choice?

Yesterday I found out that the library I use doesn't generate correct UUIDs, because it chooses the first character of the 4th group at random. I wonder why this restriction exists, since it makes implementations a bit more complex.

edi9999
  • Read about the UUID spec. Those bytes contain the algorithm and version used to create the UUID. This ensures UUIDs are unique even between different implementations, as each implementation gets its own values assigned. Otherwise algorithm A could produce a value that algorithm B already created by a different method. – Psi Aug 24 '22 at 10:12

1 Answer

I found my answer in this Wikipedia article:

123e4567-e89b-12d3-a456-426614174000
xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx

The four-bit M and the 1 to 3 bit N fields code the format of the UUID itself.

The four bits of digit M are the UUID version, and the 1 to 3 most significant bits of digit N code the UUID variant. (See below.) In the example, M is 1, and N is a (10xx₂), meaning that this is a version-1, variant-1 UUID; that is, a time-based DCE/RFC 4122 UUID.

So in the fourth block, N is the variant.
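To make the M and N positions concrete, here is a minimal sketch that reads both digits from a UUID string and decodes the variant from the high bits of N. The function name `uuidFields` is hypothetical, not part of any library.

```javascript
// Hypothetical helper: extracts the version (M) and variant (from N)
// of a UUID written in the usual 8-4-4-4-12 hex form.
function uuidFields(uuid) {
  const hex = uuid.replace(/-/g, "");
  if (!/^[0-9a-f]{32}$/i.test(hex)) {
    throw new Error("expected 32 hex digits");
  }
  const version = parseInt(hex[12], 16); // M: first digit of the 3rd group
  const n = parseInt(hex[16], 16);       // N: first digit of the 4th group

  let variant;
  if ((n & 0b1000) === 0b0000) variant = 0;      // 0xxx
  else if ((n & 0b1100) === 0b1000) variant = 1; // 10xx
  else if ((n & 0b1110) === 0b1100) variant = 2; // 110x
  else variant = "reserved";                     // 111x
  return { version, variant };
}

// The Wikipedia example: M is 1 and N is a
console.log(uuidFields("123e4567-e89b-12d3-a456-426614174000"));
// logs { version: 1, variant: 1 }
```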

  • Variant 0 (indicated by the one-bit pattern 0xxx₂, N = 0..7) is for backwards compatibility with the now-obsolete Apollo Network Computing System 1.5 UUID format developed around 1988. The first 6 octets of the UUID are a 48-bit timestamp (the number of 4-microsecond units of time since 1 January 1980 UTC); the next 2 octets are reserved; the next octet is the "address family"; and the final 7 octets are a 56-bit host ID in the form specified by the address family. Though different in detail, the similarity with modern version-1 UUIDs is evident. The variant bits in the current UUID specification coincide with the high bits of the address family octet in NCS UUIDs. Though the address family could hold values in the range 0..255, only the values 0..13 were ever defined. Accordingly, the variant-0 bit pattern 0xxx avoids conflicts with historical NCS UUIDs, should any still exist in databases.[13]
  • Variant 1 (10xx₂, N = 8..b, 2 bits) are referred to as RFC 4122/DCE 1.1 UUIDs, or "Leach–Salz" UUIDs, after the authors of the original Internet Draft.
  • Variant 2 (110x₂, N = c..d, 3 bits) is characterized in the RFC as "reserved, Microsoft Corporation backward compatibility" and was used for early GUIDs on the Microsoft Windows platform. It differs from variant 1 only by the endianness in binary storage or transmission: variant-1 UUIDs use "network" (big-endian) byte order, while variant-2 GUIDs use "native" (little-endian) byte order for some subfields of the UUID.
  • Reserved is defined as the 3-bit variant bit pattern 111x₂ (N = e..f).

So for each variant, the first character of the fourth group must be one of the following:

  • For variant 0, character [0-7]
  • For variant 1, character [89ab]
  • For variant 2, character [cd]
  • For Reserved, character [ef]
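As an illustration of the list above, the sketch below builds one validation regex per variant by substituting the character class for N. The names `VARIANT_CLASS` and `uuidRegexForVariant` are made up for this example, and the optional `version` argument is an assumption that lets the same builder reproduce the v4 regex from the question.

```javascript
// Character class allowed for N (first digit of the 4th group), per variant.
const VARIANT_CLASS = new Map([
  [0, "[0-7]"],
  [1, "[89ab]"],
  [2, "[cd]"],
  ["reserved", "[ef]"],
]);

// Hypothetical builder: returns a case-insensitive regex that accepts only
// UUIDs of the given variant. `version` is a regex fragment for the M digit;
// by default any hex digit is accepted.
function uuidRegexForVariant(variant, version = "[0-9a-f]") {
  const n = VARIANT_CLASS.get(variant);
  if (n === undefined) {
    throw new Error("unknown variant: " + variant);
  }
  return new RegExp(
    `^[0-9a-f]{8}-[0-9a-f]{4}-${version}[0-9a-f]{3}-${n}[0-9a-f]{3}-[0-9a-f]{12}$`,
    "i"
  );
}

// Variant 1 with M fixed to 4 is equivalent to the v4 regex in the question.
const v4 = uuidRegexForVariant(1, "4");
console.log(v4.test("f47ac10b-58cc-4372-a567-0e02b2c3d479")); // true
console.log(v4.test("f47ac10b-58cc-4372-c567-0e02b2c3d479")); // false (N = c is variant 2)
```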

Variant 0 was used in the past and is now obsolete.

Variant 1 is the most commonly used variant, which is why the regex uses [89ab].

Variant 2 was used by Microsoft Corporation and is sometimes still used.
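This also shows what a generator has to do to produce a valid v4 UUID: after filling 16 bytes with random data, it overwrites the four version bits with 0100 and the top two variant bits with 10. Below is a minimal sketch of that step, not the code of any particular library. The function name `toUuidV4` is hypothetical, and the caller is assumed to supply the 16 random bytes.

```javascript
// Hypothetical helper: turns 16 random bytes into a version-4, variant-1
// UUID string by forcing the version and variant bits.
function toUuidV4(randomBytes) {
  if (randomBytes.length !== 16) {
    throw new Error("expected exactly 16 bytes");
  }
  const b = Uint8Array.from(randomBytes); // copy, so the input is not mutated
  b[6] = (b[6] & 0x0f) | 0x40; // M: high nibble of byte 6 becomes 0100 (version 4)
  b[8] = (b[8] & 0x3f) | 0x80; // N: top two bits of byte 8 become 10 (variant 1)

  const hex = Array.from(b, (x) => x.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Deterministic inputs make the forced bits visible:
console.log(toUuidV4(new Uint8Array(16)));            // 00000000-0000-4000-8000-000000000000
console.log(toUuidV4(new Uint8Array(16).fill(0xff))); // ffffffff-ffff-4fff-bfff-ffffffffffff
```

Because only the top two bits of byte 8 are fixed, N ends up as 8, 9, a, or b, which matches the [89AB] class in the regex. In real use the input would come from a cryptographic source such as `crypto.getRandomValues(new Uint8Array(16))`, and where `crypto.randomUUID()` is available it already returns a v4 UUID.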

edi9999