0

On choosing GUID vs identity INT for primary key…

Is it true that an insert of a GUID is going to take longer because it has to search for the sequential place in the index to put that GUID? And that it can cause paging?

If this is true, maybe we should have used identity INT all along?

Thanks.

2 Answers2

1

It is true that a newer GUID value can be lesser than a GUID previously inserted. Yes, enough of those can force the index to expand pages. An INT or BIGINT auto increment/identity column wouldn't run into that unless you inserted them lower identity manually with identity insert ON.

if you can't change from a GUID for some reason check out NEWSEQUENTIALID() http://msdn.microsoft.com/en-us/library/ms189786.aspx It will find a greater than previously used GUID. The caveat is that the "great than" portion only holds true until machine restarts.

Brad

Brad D
  • 752
  • 4
  • 8
  • I moved from "random" guids to sequential guids..with big performance impact. I also used this for client side sequential guid generation. http://stackoverflow.com/questions/211498/is-there-a-net-equalent-to-sql-servers-newsequentialid – granadaCoder Nov 06 '14 at 18:22
0

I think you mean CLUSTERED INDEX more so than PRIMARY KEY. Of course, ANY index will get fragmented rather quickly when using NEWID() as the values are not ever-increasing and hence cause page-splits.

I provided a rather detailed explanation of the effects of non-sequential INSERT operations here:
What is fragmenting my index on a table with stable traffic?

Yes, NEWSEQUENTIALID() is far better, at least in terms of GUIDs, in that it is sequential, BUT, it is not a guarantee against page splits. The issue is that the starting value is determined based on a value that gets reset when the server is restarted. Hence, after a restart, the new sequential range can start lower than the previous first record. The first INSERT at that point will cause a page split but from there should be fine.

Also, a UNIQUEIDENTIFIER is 16 bytes as compared to INT being 4 and even BIGINT being 8. That increased size makes each row take up more space and hence less rows fit on each 8k datapage. Allocating new pages takes time, especially if you need to grow the datafile to fit the new pages (and the larger the rows the faster the datafile fills). Hence, yes, you most likely should have gone with INT IDENTITY from the start.

In such cases that an external and/or portable identifier is needed (this is where GUIDs are very handy), then you should still start with an INT (or even BIGINT) IDENTITY field as the clustered PK and have another column be the GUID with a UNIQUE INDEX placed on it. This is known as an alternate key and works well in these situations.

Community
  • 1
  • 1
Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171