81

Based on the answer of question, UUID performance in MySQL, the person who answers suggest to store UUID as a number and not as a string. I'm not so sure how it can be done. Anyone could suggest me something? How my ruby code deal with that?

Community
  • 1
  • 1
Chamnap
  • 4,666
  • 2
  • 34
  • 46
  • 5
    The performance issues only arise when you're using the UUID a primary key, because UUIDs are not very efficient primary keys. Why do you need UUIDs? Could you keep the UUIDs and just use an autoincrement as the primary key? – Thom Smith Jun 08 '12 at 14:00
  • 4
    @ThomSmith Re "UUIDs are not very efficient primary keys".. care to cite a source that explains why? – Pacerier Jul 05 '12 at 22:13
  • 2
    It's a larger piece of data, and it will generally take more instructions to compare. It's not sequential, so the overhead of indexing is just a bit higher. And, of course, if you're storing it as a string instead of a 128-bit number, as the OP seems to be doing, the situation worsens. It's not a terrible key, but I wouldn't use it unless there was some external reason to do so. – Thom Smith Jul 06 '12 at 12:20
  • Autoincrement can cause problems with multiple shared database servers - often causing key collisions. UUIDs are intended to solve things like that. If you store your UUID not as text but as bin(16) then you of course have a numeric UUID. It is faster to compare binary than text. Here is a site discussing this - http://mysql.rjweb.org/doc.php/uuid – Jeff Clayton Oct 12 '19 at 16:12
  • The Percona blog has an article (that includes benchmarks) that replies to your question: [Store UUID in an optimized way](https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/). – dolmen Nov 02 '15 at 20:45

2 Answers2

116

If I understand correctly, you're using UUIDs in your primary column? People will say that a regular (integer) primary key will be faster , but there's another way using MySQL's dark side. In fact, MySQL is faster using binary than anything else when indexes are required.

Since UUID is 128 bits and is written as hexadecimal, it's very easy to speed up and store the UUID.

First, in your programming language remove the dashes

From 110E8400-E29B-11D4-A716-446655440000 to 110E8400E29B11D4A716446655440000.

Now it's 32 chars (like an MD5 hash, which this also works with).

Since a single BINARY in MySQL is 8 bits in size, BINARY(16) is the size of a UUID (8*16 = 128).

You can insert using:

INSERT INTO Table (FieldBin) VALUES (UNHEX("110E8400E29B11D4A716446655440000"))

and query using:

SELECT HEX(FieldBin) AS FieldBin FROM Table

Now in your programming language, re-insert the dashes at the positions 9, 14, 19 and 24 to match your original UUID. If the positions are always different you could store that info in a second field.

Full example :

CREATE TABLE  `test_table` (
    `field_binary` BINARY( 16 ) NULL ,
    PRIMARY KEY (  `field_binary` )
) ENGINE = INNODB ;

INSERT INTO  `test_table` (
    `field_binary`
)
VALUES (
    UNHEX(  '110E8400E29B11D4A716446655440000' )
);

SELECT HEX(field_binary) AS field_binary FROM `test_table`

If you want to use this technique with any hex string, always do length / 2 for the field length. So for a sha512, the field would be BINARY (64) since a sha512 encoding is 128 characters long.

NateS
  • 5,751
  • 4
  • 49
  • 59
David Bélanger
  • 7,400
  • 4
  • 37
  • 55
  • @Chamnap UNHEX function will convert HEX to BINARY in your database. You can then use indexes on it without a problem and with a gain of performance (yes yes !). You then read the data with the function `HEX` like in my example. So no, you can't read the result from `UNHEX`, but you can if use `HEX`. Remember the computer is made of binary, always faster. – David Bélanger Jun 08 '12 at 15:09
  • 3
    @Chamnap Let's say you have 10 000 rows in your database and they have been added using UNHEX function and you want to search for the UUID `110E8400-E29B-11D4-A716-446655440000`. Just do something like : `SELECT * FROM test_table WHERE field_binary LIKE CONCAT("%", UNHEX('110E8400E29B11D4A716446655440000'), "%")` – David Bélanger Jun 08 '12 at 15:13
  • Thanks, @David Belanger. Cannot do an equal condition? – Chamnap Jun 08 '12 at 15:14
  • 5
    You can read this if you have time. Focus on point 3 : http://www.xaprb.com/blog/2009/02/12/5-ways-to-make-hexadecimal-identifiers-perform-better-on-mysql/ – David Bélanger Jun 08 '12 at 15:16
  • 4
    @Chamnap Yes you can do, you should. I just wanted to demonstrate if you want to use the caracter % with the UNHEX function inside a LIKE. You could do `WHERE Field = UNHEX('110E8400E29B11D4A716446655440000')`. Instead of doing `WHERE Field = 3` or whatever, you wrap up the field with UNHEX when you are using a hex string (to search, to insert, where, update, delete, etc.) and you wrap the field with HEX when you want to read from MySQL (select). – David Bélanger Jun 08 '12 at 15:19
  • @Chamnap No prob. I tryed to explain the best I could, I am not good at teach ! Good luck. – David Bélanger Jun 08 '12 at 16:16
  • 2
    @DavidBélanger You said MySQL is faster indexing binary compared to ints. Any sources? – Pacerier Jul 05 '12 at 22:41
  • @Pacerier MySQL will always be faster in binary for the unique reason that it take less space so he can go over indexes faster. I invite you to read this article http://www.xaprb.com/blog/2009/02/12/5-ways-to-make-hexadecimal-identifiers-perform-better-on-mysql/ – David Bélanger Jul 06 '12 at 13:22
  • @DavidBélanger Unrelated to this question, but is there anyway to set auto_increment for binary columns? – Pacerier Jul 06 '12 at 13:36
  • @Pacerier Not that I know of. It's not made for that. The concept here apply on big indexes such as MD5, everything in HEX : SHA1, SHA2, SHA-256, etc. – David Bélanger Jul 06 '12 at 13:58
  • 4
    The wording is confusing on the BINARY type. A single "BINARY" in mysql is _8 bits_ in size, which is why BINARY(16) works (8*16 = 128, the size of a UUID). It does _NOT_ "store in 1 bit what hexadecimal does in 4 bits". That's impossible. "Two hexadecimal values can be stored in each unit-size of type BINARY, which is itself 8-bits in size, so we need 16 unit sizes of BINARY, thus we'l use BINARY(16)." – lilbyrdie Oct 21 '14 at 19:15
  • @lilbyrdie I agree with you. I did try to make it simple. – David Bélanger Apr 28 '15 at 13:08
  • 1
    @DavidBélanger - Great answer, except for first paragraph. `binary` is **not** faster than `int`. (Each size of) `int` is a short, fixed length, field; performance is similar to a `binary` with same number of bytes. Your linked article mentions `binary` vs `hexadecimal`, which in that article means *hexadecimal characters in a string* - which of course is slower and takes more space. (And IMHO there is nothing "dark side" about `binary` - its simply a specified number of bytes - your first paragraph would benefit from a complete rewrite, with most of it omitted.) – ToolmakerSteve Apr 07 '19 at 12:28
1

I don't think that its a good idea to use a binary.

Let's say that you want to query some value:

SELECT HEX(field_binary) AS field_binary FROM `test_table`

If we are returning several values then we are calling the HEX function several times.

However, the main problem is the next one:

SELECT * FROM `test_table`
    where field_binary=UNHEX('110E8400E29B11D4A716446655440000')

And using a function inside the where, simply ignores the index.

Also

SELECT * FROM `test_table`
    where field_binary=x'skdsdfk5rtirfdcv@#*#(&#@$9' 

Could leads to many problems.

magallanes
  • 6,583
  • 4
  • 54
  • 55
  • 4
    Have you tested performance of your concerns? You are suggesting that performance of HEX and UNHEX is worse than the performance problems of using a 36-character field as an index. I don't even have to test, to know that is false. (But since you believe otherwise, test) Second, the code you show is not how this is best handled. All your DB code should simply involve the 16-byte field. Don't Hex and Unhex. Just pass it to and from your DB as those 16 bytes. Do all queries directly with those 16-byte values. **Only when displaying to user**, do you need to convert it to a user-friendly version. – ToolmakerSteve Apr 07 '19 at 12:42