When using NEWSEQUENTIALID() as primary key, what should be my clustered index?

Question

I'm using newsequentialid to generate GUIDs for my primary key in a table.

According to the documentation (https://learn.microsoft.com/en-us/sql/t-sql/functions/newsequentialid-transact-sql?view=sql-server-ver15), Sequential GUIDs aren't guarantined to be generated in order.

After restarting Windows, the GUID can start again from a lower range, but is still globally unique

Basically, they're in order until you reboot the machine.

For autoincrement primary key, it makes sense for it to be the clustered index cause it's guaranteed an inserted row will be at the end.

For a GUID primary key, it doesn't make sense for it to be the clustered index cause it's random it's unlikely an inserted row will be at the end.

What about for a sequential GUID primary key? Should the primary key be the clustered index or should I try to find another column like a DateCreated field? The problem is fields like DateCreated isn't going to be a unique field. If I don't have any fields that are unique fields, what should I make as the clustered index?

Does this answer your question? [How to choose the clustered index in SQL Server?](https://stackoverflow.com/questions/2267326/how-to-choose-the-clustered-index-in-sql-server) — SMor, Jun 18 '20 at 13:29
`For a GUID primary key, it doesn't make sense for it to be the clustered index` - that all depends on what problem you're solving and the nature of the data being inserted. Some would argue that an increasing column is a bad idea for a clustered index because it means all of your inserts go to the same page, creating contention or a "hot spot." Thomas Kejser, for example, [demonstrates how `NEWID()` can outperform sequential inserts](https://kejserbi.wordpress.com/2011/10/05/boosting-insert-speed-by-generating-scalable-keys/), again if your goal is to reduce insert bottlenecks. — Aaron Bertrand, Jun 18 '20 at 13:33
You [might get better performance](https://milossql.wordpress.com/2019/11/13/optimize_for_sequential_keys-miracle/) with sequential inserts using the new [OPTIMIZE_FOR_SEQUENTIAL_KEY table option in SQL Server 2019](https://techcommunity.microsoft.com/t5/sql-server/behind-the-scenes-on-optimize-for-sequential-key/ba-p/806888), but it does not resolve all contention/latency issues and [may still reveal a different bottleneck](https://www.youtube.com/watch?v=EaAZBr-OtQU). — Aaron Bertrand, Jun 18 '20 at 13:37
@AaronBertrand - thank you, I'll have to give those resources a read — Diskdrive, Jun 18 '20 at 13:41

score 3 · Accepted Answer · answered Jun 18 '20 at 13:23

3

Sequential GUIDs are much safer for clustered indexes than non-sequential GUIDs. In general, databases are not restarted particularly often. It is true that restarting can result in page splits and fragmentation, but that is usually not too big a consideration because restarting is rare.

That said, the primary key does not need to be the clustered index key. You can have an identity column or creation date/time as the clustered index, pretty much eliminating this issue.

answered Jun 18 '20 at 13:23

Gordon Linoff

1,242,037
58
646
786

Having another identity column would basically defeat one of the main benefits of sequential GUID of being able to merge data from different databases easily. In that case, you might as just use that as the primary key. Creation date isn't unique thus sounds like bad practice to make it the clustered index. However, I agree with you - reboots aren't often so the only effect is that first insert after a reboot should be slow. If it was important I grabbed data in chronological order, I could put a non-clustered index on the createddate. – Diskdrive Jun 18 '20 at 13:31
*"Having another identity column would basically defeat one of the main benefits of sequential GUID"* yes, but Gordon was saying about using `IDENTITY` with a non-sequential one, not a sequential one, @Diskdrive . – Thom A Jun 18 '20 at 13:32
1

@Diskdrive `first insert after a reboot should be slow` - why would it be any slower than any other insert that requires a new page or might get added to an existing page? It doesn't take SQL Server any more time to find a page that will accommodate a key value that is not consistent with the last key value that was inserted before the reboot. In any case, optimizing for the edge case of "first insert after reboot" at the potential cost of other downstream effects _all the time_ does not seem like the right path. – Aaron Bertrand Jun 18 '20 at 13:41
@AaronBertrand - no you're right - I think I'm fundamentally not quite understanding how clustered indexes work in the background so I need to read up on how pages work etc. – Diskdrive Jun 18 '20 at 13:50

score 2 · Answer 2 · edited Jun 20 '20 at 09:12

I wrote a long post about this a while ago. The TL/DR is that using a sequential GUID as a clustered index key is fine. The GUIDs are actually inserted in the middle of the index, but having a small number (here one) mid-index insertion point does not cause expensive page splits or lead to harmful fragmentation.

Good Page Splits and Sequential GUID Key Generation

This same behavior applies to using a compound key as clustered index, where the leading key column has lower cardinality. Eg (CustomerId,TransactionId). Each CustomerId will have a half-full page with space for the next TransactionId, and when that page fills a new one is allocated.

When using NEWSEQUENTIALID() as primary key, what should be my clustered index?

2 Answers2