2

If I make an index on a string column how does MySQL store it in Btree? Does it store the raw strings or some hashed integer value?

Actually, I was thinking of storing UUID as PRIMARY KEY so just wanted to know how my index size get affected by the size of the string?

bazinga012
  • 713
  • 1
  • 5
  • 10
  • 4
    _Don't_ store your UUIDs as strings, store them as binary, q.v. here: https://mysqlserverteam.com/storing-uuid-values-in-mysql-tables/ – Tim Biegeleisen Apr 27 '18 at 07:49
  • 1
    And see here for the index part of your question: https://stackoverflow.com/questions/16806986/which-index-should-i-use-on-binary-datatype-column-mysql – Tim Biegeleisen Apr 27 '18 at 07:50
  • The real question here is why are you after primary UUID key? Why not keep auto_increment PK and unique UUID? Store UUID like Tim suggested, in a binary column. `INSERT INTO TABLE SET uuid_col = UNHEX(REPLACE(UUID(), '-', ''));` (something along those lines to store it in binary form). – N.B. Apr 27 '18 at 08:02
  • Please answer the first part of the question: "If I make an index on a string column how does MySQL store it in Btree? Does it store the raw strings or some hashed integer value?" – bazinga012 Apr 27 '18 at 08:05
  • [Here's the source](https://github.com/mysql/mysql-server) and here's the docs on [InnoDB's index storage](https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html). Good luck. Have fun digging, it's silly to listen to people who are trying to help you not to waste time. – N.B. Apr 27 '18 at 08:12
  • @N.B. Sorry if I offended you in any way. But I just wanted to know how are strings actually stored in the B tree index? – bazinga012 Apr 27 '18 at 08:16
  • 1
    I'm not offended, you want an answer and I gave you the tools :) get what you're after, I'm just laughing to myself about your approach to this, Tim is trying to tell you something (like I am) because we've both been in this situation. InnoDB won't store a hash of a number, it wants an integer - if there's no integer primary key, it attempts to compute one if possible. If not possible, it creates a hidden 6-byte integer that it uses for the primary index, and your uuid will be secondary index. – N.B. Apr 27 '18 at 08:20
  • 1
    It stores the string in the btree, and the index size depends on the max length of the field (you can also make an index on the 1-st xx chars). Storing the UUID's as binary or using ascii instead of unicode will reduce the index size by a factor of 3-6. – Vatev Apr 27 '18 at 08:57
  • @N.B.,Tim Thanks a lot, understood how to store UUIDs. N.B. One more question though, can you please tell me how are strings stored in a secondary index? I mean does any transformation gets applied or raw strings are stored in the B tree? – bazinga012 Apr 27 '18 at 08:59
  • 1
    I don't know really, and I've no time to go through the code right now. Seeing that MySQL applies certain constraints to indexes (767 byte length), I'd make a guess that it stores them raw, no hashing. – N.B. Apr 27 '18 at 09:06
  • 1
    @bazinga012 secondary indexes are the same, no hashing. You can make your own hashing with a trigger and an indexed INT column though. – Vatev Apr 27 '18 at 09:17
  • Agree with @Vatev. Also, UUIDs are not usually optimal, as noted, but if you do, use an appropriate character set and collation, e.g. `CHAR(36) NOT NULL COLLATE ascii_general_ci` so that that the server will not allocate unnecessary space to accommodate multibyte characters. – Michael - sqlbot Apr 28 '18 at 12:01

0 Answers0