4

I was wondering if there are any obvious pros and cons one should be aware about, while choosing to use a Binary type as a primary key, while being RANDOMIZED.

Which means, new inserts will have random 4 bytes as their PK - versus having a sequence of type Integer?

I'm using MySQL/MyISAM.

Thanks, Doori Bar

Doori Bar
  • 873
  • 2
  • 13
  • 20
  • 1
    I have heard of using GUIDs but not random numbers. This post may be relevant: http://stackoverflow.com/questions/170346/what-are-the-performance-improvement-of-sequential-guid-over-standard-guid – YWE Oct 27 '10 at 21:22

1 Answers1

8

If the primary key isn't sequential then insert performance for a clustered index will not be very good. It has to rearrange the index for almost every insert.

Jay
  • 13,803
  • 4
  • 42
  • 69
  • Thanks for the fast response, I was wondering if this performance cost - is the same cost as one would pay, for using a sequential integer primary key, while having a randomized 4 bytes as an index? – Doori Bar Oct 27 '10 at 21:31
  • 1
    @Doori: depends, if InnoDB and more fields, definitely less (rows are stored based on primary key / has a clustered index), for MyISAM, you'll have to test, and it depends on your actual table, but I think it's safe to say 2 indexes instead of 1 will have an overhead of some sorts. – Wrikken Oct 27 '10 at 21:42
  • @Wrikken: So as I understand, it's a mandatory performance cost which I have to accept while having the requirement of a randomized index. Thanks for your clarification. – Doori Bar Oct 27 '10 at 21:56
  • Indeed, randomized will mean somewhere in the system records or keys are stored in an order that is not the order of creation, resulting in having to 'shift' data after creation of a new random key, which always be more costly then appending the certain sequential one. – Wrikken Oct 27 '10 at 22:31
  • Using Guids for primary keys also suffers from this as well. They're basically just random numbers as well. I avoid random numbers as a primary key because they're not guaranteed to be unique. You end up jumping through a lot of hoops to get what's already provided by using sequential integers (GUIDs). – Jay Oct 28 '10 at 15:27
  • I've stumbled: "USING HASH" under MySQL... It's actually what I need to make things more efficient and suitable to the case? – Doori Bar Oct 29 '10 at 00:15
  • I did find an instance where guid keys may be superior. In distributed storage over many servers they can help evenly distribute inserts over many servers for better parallelism. It's very much a special case though – Jay Sep 25 '17 at 12:23