80

I'm generating UUIDs using PHP, per the function found here

Now I want to store that in a MySQL database. What is the best/most efficient MySQL field format for storing UUID v4?

I currently have varchar(256), but I'm pretty sure that's much larger than necessary. I've found lots of almost-answers, but they're generally ambiguous about what form of UUID they're referring to, so I'm asking for the specific format.

ata
  • 1,254
  • 1
  • 9
  • 30
Stephen R
  • 3,512
  • 1
  • 28
  • 45
  • UUID v4 seems to be 36 characters long. – Ibu Mar 27 '17 at 20:52
  • 1
    if you just need random tokens of some sort, you don't need uuid at all. a varchar256 doesn't take 256 bytes so 'too large' is probably not that big of a deal. – pvg Mar 27 '17 at 20:57
  • I'm doing this: http://stackoverflow.com/a/42520240/339440 – Stephen R Mar 27 '17 at 20:58
  • Save it as your VARCHAR(36) then – Ibu Mar 27 '17 at 20:58
  • 1
    You don't need uuid-anything for that. Just random bytes. You can store them as hex or whatever. http://php.net/manual/en/function.random-bytes.php – pvg Mar 27 '17 at 21:01
  • Ibu -- Probably the straightest answer I'll get. :-) Seeing a lot of stuff about how you should convert it binary and somesuch, because it's more better that way – Stephen R Mar 27 '17 at 21:01
  • 1
    Get a 16 random bytes. Store as hex. That's it. Don't forget to toss from the db once the login is done. Oh and do not pass as a get request as that answer suggests because that's crazy. – pvg Mar 27 '17 at 21:02
  • 4
    MySQL has function `UUID()` so you don't need PHP to generate it. You can remove dashes and save the hex number as `binary(16)`. If you do it via trigger, it's `SELECT UNHEX(REPLACE(UUID(), '-', ''));`, make it `unique` if you need the index, profit. – N.B. Mar 27 '17 at 21:09
  • @pvg -- could you please respond on that post regarding its method? Any input there is appreciated. I know Get isn't secure, but the whole point of that exercise is trying to take an inherently insecure thing and do it in a somewhat secure way -- thus (for example) also checking IP address of requestor – Stephen R Mar 27 '17 at 21:09
  • 1
    @N.B. That's a UUID1. `random_bytes` is from the OS CSPRNG which is what you want in this case. – pvg Mar 27 '17 at 21:15
  • 1
    @StephenR I don't think checking the IP is all that useful nor do I think the whole thing is inherently insecure. You just want to the token to be of good quality and to be one-time use. And just in case, you probably don't want it getting logged which is what sticking it in a get param will do. – pvg Mar 27 '17 at 21:16
  • @pvg So what you're suggesting for whole method: 1) create random_bytes, 2) store to DB as (???) format, 3) pass to ASP side as POST, 4) have ASP side find that bytestring and do its own login, 5) unset bytestring in DB ? If random_bytes is sufficient, I guess I'm confused as to what the point of a UUID is in the first place – Stephen R Mar 27 '17 at 21:24
  • 1
    There was no point to the UUID whatsoever. That's what I've been trying to tell you :) All you care about is that the token is sufficiently secure (16 `random_bytes` is) and that the token can't be grabbed/replayed (i.e. use SSL, invalidate the token). The other bits are less important (even the get param). – pvg Mar 27 '17 at 21:27
  • Is there *ever* a point to UUID, (obviously many people think so), or is this just not a good use case? Also, back to the OP, what exact MySQL format is best to store the token? varchar(16)? – Stephen R Mar 27 '17 at 21:29
  • 1
    UUID is if you care a lot about the format. But you don't. If you store this as a hex string, it'd be varchar(32). – pvg Mar 27 '17 at 21:30
  • 2
    @StephenR UUID is a trusted, well defined standard that should produce collision-free identifiers even when used in large-scale environments. They're a good alternative to sequential ID values for identifiers necessary to link disparate sources of information together. They're not good for generating secure tokens or keys as their search space is significantly smaller than that of a pure random string. That is they're collision resistant but have a comparatively narrow search space. – tadman Mar 28 '17 at 18:36

9 Answers9

92

Store it as VARCHAR(36) if you're looking to have an exact fit, or VARCHAR(255) which is going to work out with the same storage cost anyway. There's no reason to fuss over bytes here.

Remember VARCHAR fields are variable length, so the storage cost is proportional to how much data is actually in them, not how much data could be in them.

Storing it as BINARY is extremely annoying, the values are unprintable and can show up as garbage when running queries. There's rarely a reason to use the literal binary representation. Human-readable values can be copy-pasted, and worked with easily.

Some other platforms, like Postgres, have a proper UUID column which stores it internally in a more compact format, but displays it as human-readable, so you get the best of both approaches.

tadman
  • 208,517
  • 23
  • 234
  • 262
  • 19
    Consider using binary(16) to conserve storage... MySQL provides functions to make storing the UUID (as binary) pretty straightforward so there's no excuse if all you need to do is update your queries... UUID_TO_BIN BIN_TO_UUID IS_UUID e.g. http://www.mysqltutorial.org/mysql-uuid/ – StratusBase LLC Nov 12 '19 at 12:40
  • @TheVirtualMachinist Shaving a few bytes isn't really going to help much in an age of terabyte-sized drives. If/when MySQL adopts a proper UUID column type, like Postgres, this will become a non-issue. – tadman Nov 12 '19 at 18:33
  • 38
    It baffles me how this could be the accepted answer. MySQL creates an index only for the first 4 bytes of a VARCHAR column and the ID in UUID suggests that it is going to be used for identification (thus also search). This answer is a recipe for a massive performance disaster. The correct way to store it is as a BINARY(16) or even better - use an adequate modern database with UUID support. – vstoyanov Jan 31 '20 at 10:59
  • @vstoyanov While MySQL doesn't necessarily index the whole string, which can be a problem for UNIQUE indexes on long VARCHAR columns, it hasn't been my experience it indexes only the first 4 bytes. Is there any documentation that explains that? That could be as little as one character being indexed, which makes no sense to me. Older versions of MySQL did have more limited index coverage, but as of 5.7 and later those limits have been dramatically increased. – tadman Jan 31 '20 at 18:25
  • 2
    Would searching be much faster with BINARY(16) vs. Varchar? – huggie Mar 25 '20 at 14:07
  • @huggie You'll need to benchmark to see how it performs on modern MySQL. What MySQL urgently needs is a native UUID type like Postgres has had for years. I'm surprised that didn't ship with 8.0. – tadman Mar 25 '20 at 16:07
  • 5
    @vstoyanov the performance/storage cost is in almost all cases ignorable IF it is just an unindexed data field. If it is an indexed field that becomes more of an issue and if it is used as primary key, oh boy, that's gonna be a significant difference especially for writes, see https://www.percona.com/blog/2019/11/22/uuids-are-popular-but-bad-for-performance-lets-discuss/ So, it depends and the OP doesn't specify which case it is as far as I can see – Frank Hopkins Aug 07 '20 at 20:29
  • 1
    @FrankHopkins MySQL needs to make UUID a first-class data type. Postgres has this and it's amazing. Then again, Postgres also has an internal row-identifier that makes indexes a lot smaller and more efficient. – tadman Aug 07 '20 at 20:31
  • 1
    @tadman love the answer from here http://mysqlserverteam.com/mysql-8-0-uuid-support/ that shows how much they underrate the need for that... "Creating a new datatype requires significant work on our side, we would like to see the feedback we get on the features we have just introduced and if there is still a strong need for UUID datatype, we will consider it for future versions.".. but perhaps one day they will consider it^^ – Frank Hopkins Aug 07 '20 at 22:01
  • Yes, I'm using UUID as primary key – Stephen R Feb 24 '21 at 23:51
  • @vstoyanov - `INDEX(uuid(4))` would index only the first 4 _characters_; `INDEX(uuid)` indexes the entire string (unless it exceeds some max). – Rick James Aug 05 '22 at 02:57
31

Question is about storing an UUID in MySQL.

Since version 8.0 of mySQL you can use binary(16) with automatic conversion via UUID_TO_BIN/BIN_TO_UUID functions: https://mysqlserverteam.com/mysql-8-0-uuid-support/

Be aware that mySQL has also a fast way to generate UUIDs as primary key:

INSERT INTO t VALUES(UUID_TO_BIN(UUID(), true))

Karsten R.
  • 1,628
  • 12
  • 14
  • 2
    Don't the built-in MySQL UUID functions create UUID v1, not v4? – Stephen R Feb 21 '18 at 18:36
  • 4
    `UUID_TO_BIN/BIN_TO_UUID` works for v4 UUID (not depending on version at all). `UUID()` generates "less than" a v1 UUID: https://dev.mysql.com/doc/refman/8.0/en/miscellaneous-functions.html#function_uuid – Karsten R. Feb 25 '18 at 11:18
  • 2
    I would note its a VERY bad idea to use uuid v4 as a primary key! The conversion that this function does actually becomes pointless. UUID v4's randomness will kill performance on your database. – Braden Rockwell Napier Dec 11 '18 at 01:29
  • 2
    @BradenRockwellNapier this article claims it can be faster than an integer key - https://www.qcode.in/ready-to-use-uuid-in-your-next-laravel-app/ – digout Apr 04 '19 at 15:22
  • @digout - i only looked for a few seconds there, but it looks like its uuidv1 hes using there which would be great. Could be wrong though. For Sequelize users I also provide this which does the proper formatting and allows using Sequelize https://github.com/odo-network/sequelize-binary-uuid – Braden Rockwell Napier Apr 23 '19 at 20:05
  • @digout ‘s link eventually leads here -- https://mysqlserverteam.com/storing-uuid-values-in-mysql-tables/ -- which is great info for people stuck on MySQL 5.7 for whatever reason. as it manually replicates the UUID_TO_BIN and BIN_TO_UUID functions – Stephen R Jul 06 '20 at 16:07
30

If you always have a UUID for each row, you could store it as CHAR(36) and save 1 byte per row over VARCHAR(36).

uuid CHAR(36) CHARACTER SET ascii

In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes. https://dev.mysql.com/doc/refman/5.7/en/char.html

Though be careful with CHAR, it will always consume the full length defined even if the field is left empty. Also, make sure to use ASCII for character set, as CHAR would otherwise plan for worst case scenario (i.e. 3 bytes per character in utf8, 4 in utf8mb4)

[...] MySQL must reserve four bytes for each character in a CHAR CHARACTER SET utf8mb4 column because that is the maximum possible length. For example, MySQL must reserve 40 bytes for a CHAR(10) CHARACTER SET utf8mb4 column. https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html

Mathieu Rey
  • 603
  • 1
  • 9
  • 13
13

Most efficient is definitely BINARY(16), storing the human-readable characters uses over double the storage space, and means bigger indices and slower lookup. If your data is small enough that storing as them as text doesn't hurt performance, you probably don't need UUIDs over boring integer keys. Storing raw is really not as painful as others suggest because any decent db admin tool will display/dump the octets as hexadecimal, rather than literal bytes of "text". You shouldn't need to be looking up UUIDs manually in the db; if you have to, HEX() and x'deadbeef01' literals are your friends. It is trivial to write a function in your app – like the one you referenced – to deal with this for you. You could probably even do it in the database as virtual columns and stored procedures so the app never bothers with the raw data.

I would separate the UUID generation logic from the display logic to ensure that existing data are never changed and errors are detectable:

function guidv4($prettify = false)
{
    static $native = function_exists('random_bytes');

    $data = $native ? random_bytes(16) : openssl_random_pseudo_bytes(16);
    $data[6] = chr(ord($data[6]) & 0x0f | 0x40); // set version to 0100
    $data[8] = chr(ord($data[8]) & 0x3f | 0x80); // set bits 6-7 to 10
    if ($prettify) {
        return guid_pretty($data);
    }
    return $data;
}

function guid_pretty($data)
{
    return strlen($data) == 16 ?
        vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($data), 4)) :
        false;
}

function guid_ugly($data)
{
    $data = preg_replace('/[^[:xdigit:]]+/', '', $data);
    return strlen($data) == 32 ? hex2bin($data) : false;
}

Edit: If you only need the column pretty when reading the database, a statement like the following is sufficient:

ALTER TABLE test ADD uuid_pretty CHAR(36) GENERATED ALWAYS AS (CONCAT_WS('-', LEFT(HEX(uuid_ugly), 8), SUBSTR(HEX(uuid_ugly), 9, 4), SUBSTR(HEX(uuid_ugly), 13, 4), SUBSTR(HEX(uuid_ugly), 17, 4), RIGHT(HEX(uuid_ugly), 12))) VIRTUAL;
Walf
  • 8,535
  • 2
  • 44
  • 59
7

This works like a charm for me in MySQL 8.0.26

create table t (
    uuid BINARY(16) default (UUID_TO_BIN(UUID())),
)

When querying you may use

select BIN_TO_UUID(uuid) uuid from t;

The result is:

# uuid
'8c45583a-0e1f-11ec-804d-005056219395'
Bobin
  • 278
  • 5
  • 15
  • The MySQL `UUID()` function doesn’t generate UUID v4, so that as default wouldn’t work. The storage looks good though – Stephen R Sep 22 '21 at 00:47
3

The most space-efficient would be BINARY(16) or two BIGINT UNSIGNED.

The former might give you headaches because manual queries do not (in a straightforward way) give you readable/copyable values. The latter might give you headaches because of having to map between one value and two columns.

If this is a primary key, I would definitely not waste any space on it, as it becomes part of every secondary index as well. In other words, I would choose one of these types.

For performance, the randomness of random UUIDs (i.e. UUID v4, which is randomized) will hurt severely. This applies when the UUID is your primary key or if you do a lot of range queries on it. Your insertions into the primary index will be all over the place rather than all at (or near) the end. Your data loses temporal locality, which was a helpful property in various cases.

My main improvement would be to use something similar to a UUID v1, which uses a timestamp as part of its data, and ensure that the timestamp is in the highest bits. For example, the UUID might be composed something like this:

Timestamp | Machine Identifier | Counter

This way, we get a locality similar to auto-increment values.

Timo
  • 7,992
  • 4
  • 49
  • 67
  • In C#, for a UUID alternative with incremental properties (yet with most of a UUID's properties still intact), you can use a [DistributedId](https://github.com/TheArchitectDev/Architect.Identities#distributed-ids). These are highly efficient as database keys. In fact, they can even be stored as `DECIMAL(28, 0)`, or as `CHAR(16)` (making sure to use the ASCII charset with a binary collation). – Timo Jul 15 '21 at 07:41
2

This could be useful if you use binary(16) data type:

INSERT INTO table (UUID) VALUES
   (UNHEX(REPLACE(UUID(), "-","")))
B.Habibzadeh
  • 490
  • 6
  • 10
1

I just found a nice article going in more depth on these topics: https://www.xaprb.com/blog/2009/02/12/5-ways-to-make-hexadecimal-identifiers-perform-better-on-mysql/

It covers the storage of values, with the same options already expressed in the different answers on this page:

  • One: watch out for character set
  • Two: use fixed-length, non-nullable values
  • Three: Make it BINARY

But also adds some interesting insight about indexes:

  • Four: use prefix indexes

In many but not all cases, you don’t need to index the full length of the value. I usually find that the first 8 to 10 characters are unique. If it’s a secondary index, this is generally good enough. The beauty of this approach is that you can apply it to existing applications without any need to modify the column to BINARY or anything else—it’s an indexing-only change and doesn’t require the application or the queries to change.

Note that the article doesn't tell you how to create such a "prefix" index. Looking at MySQL documentation for Column Indexes we find:

[...] you can create an index that uses only the first N characters of the column. Indexing only a prefix of column values in this way can make the index file much smaller. When you index a BLOB or TEXT column, you must specify a prefix length for the index. For example:

CREATE TABLE test (blob_col BLOB, INDEX(blob_col(10)));

[...] the prefix length in CREATE TABLE, ALTER TABLE, and CREATE INDEX statements is interpreted as number of characters for nonbinary string types (CHAR, VARCHAR, TEXT) and number of bytes for binary string types (BINARY, VARBINARY, BLOB).

  • Five: build hash indexes

What you can do is generate a checksum of the values and index that. That’s right, a hash-of-a-hash. For most cases, CRC32() works pretty well (if not, you can use a 64-bit hash function). Create another column. [...] The CRC column isn’t guaranteed to be unique, so you need both criteria in the WHERE clause or this technique won’t work. Hash collisions happen quickly; you will probably get a collision with about 100k values, which is much sooner than you might think—don’t assume that a 32-bit hash means you can put 4 billion rows in your table before you get a collision.

Mathieu Rey
  • 603
  • 1
  • 9
  • 13
0

This is a fairly old post but still relevant and comes up in search results often, so I will add my answer to the mix. Since you already have to use a trigger or your own call to UUID() in your query, here are a pair of functions that I use to keep the UUID as text in for easy viewing in the database, but reducing the footprint from 36 down to 24 characters. (A 33% savings)

delimiter //

DROP FUNCTION IF EXISTS `base64_uuid`//
DROP FUNCTION IF EXISTS `uuid_from_base64`//


CREATE definer='root'@'localhost' FUNCTION base64_uuid() RETURNS varchar(24)
DETERMINISTIC
BEGIN
    /* converting INTO base 64 is easy, just turn the uuid into binary and base64 encode */
    return to_base64(unhex(replace(uuid(),'-','')));
END//

CREATE definer='root'@'localhost' FUNCTION uuid_from_base64(base64_uuid varchar(24)) RETURNS varchar(36)
DETERMINISTIC
BEGIN
    /* Getting the uuid back from the base 64 version requires a little more work as we need to put the dashes back */
    set @hex = hex(from_base64(base64_uuid));
    return lower(concat(substring(@hex,1,8),'-',substring(@hex,9,4),'-',substring(@hex,13,4),'-',substring(@hex,17,4),'-',substring(@hex,-12)));
END//