1

I have a large domain set of tables in a database - over 100 tables. Every single one uses a uniqueidentifier as a PK.

I'm realizing now that my mistake is that these are also by default, the clustered index.

Consider a table with this type of structure:

Orders
      Id (uniqueidentifier) Primary Key
      UserId (uniqueidentifier)
      .
      .
      .
      .

      Other columns

Most queries are going to be something like "Get top 10 orders for user X sorted by OrderDate".

In this case, would it make sense to create a clustered index on UserId,Id...that way the data is physically stored sorted by UserId?

I'm not too concerned about Inserts and Updates - those will be few enough that performance loss there isn't a big deal. I'm mostly concerned with READs.

Ilyes
  • 14,640
  • 4
  • 29
  • 55
richard
  • 12,263
  • 23
  • 95
  • 151
  • If you care about performance do not cluster an index on GUID. – Ilyes Aug 17 '19 at 10:15
  • I'm pretty confident that has has been disproved, @Sami. I'll try and find the article(s) on it; as i may well be wrong/misremembering – Thom A Aug 17 '19 at 10:29
  • @Larnu From what I know, it depends on how the GUID is generated. With NEWSEQUENTIALID, you'll get ever increasing values which prevents index fragmentation. But those (sequential GUIDs) may not be an option. In general, I don't see the need for clustering on a GUID. – TT. Aug 17 '19 at 14:02
  • Relevant: [Should I get rid of clustered indexes on Guid columns](https://stackoverflow.com/q/277625/243373) – TT. Aug 17 '19 at 14:04

2 Answers2

2

A clustered index means that data is physically stored in the order of the values. By default, the primary key is used for the clustered index.

The problem with GUIDs is that they are generated is (essentially) random order. That means that inserts are happening "in the middle" of the table. And, such inserts result in fragmentation.

Without getting into database internals, this is a little hard to explain. But what it means is that inserts require much more work than just inserting the values "at the end" of the table, because new rows go in the middle of a data page so the other rows have to be moved around.

SQL Server offers a solution for this, newsequentialid(). On a given server, this returns a sequential value which is inserted at the end. Often, this is an excellent compromise if you have to use GUIDs.

That said, I have a preference for just plain old ints as ids -- identity columns. These are smaller, so they take up less space. This is particularly true for indexes. Inserts work well because new values go at the "end" of the table. I also find integers easier to work with visually.

Using identity columns for primary keys and foreign key references still allows you to have unique GUID columns for each identity, if that is a requirement for the database (say for interfacing to other applications).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks Gordon. As I said, I'm not too concerned with inserts at this time. It's not a transactional table - we aren't getting lots of inserts. But the queries do matter. All queries will pull from only one user at a time, so it seems a clustered index that groups these `Orders` together by user would be a good way to make those queries faster. – richard Aug 17 '19 at 20:07
  • What would the purpose of the int identity column be? I won't ever query on it... – richard Aug 17 '19 at 20:08
  • @richard . . . They are more efficient for foreign key references and are easier for debugging/investigating. – Gordon Linoff Aug 17 '19 at 21:06
  • Understood. But my whole model is built on GUIDs as Ids etc.. I can't change it now. So I need to figure out how to make this work. – richard Aug 17 '19 at 21:22
  • @richard . . . The answer suggests `newsequentialid()`. – Gordon Linoff Aug 17 '19 at 21:23
  • Thanks Gordon. But our db doesnt create the Ids. The domain/business layer of our application does. – richard Aug 17 '19 at 21:29
  • If I do clustered index on the Parent Id and the Id...won't that mitigate the problem? The rows will be physically grouped by the parent Id and then the db just has to scan the children for that parent, not the whole table. – richard Aug 17 '19 at 21:31
  • Also forgot to mention I'm on Azure sql so newsequentialid isnt an option. – richard Aug 17 '19 at 21:38
0

Clustered index is when you want to retrieve rows for a range of values for a given column. As data is physically arranged in that order, the rows can be extracted very efficiently.

a GUID, while excellent for a primary key, could be positively detrimental to performance, as there will be additional cost for inserts and no perceptible benefit on selects.

So yes, don't cluster an index on GUID.

Mahesh Waghmare
  • 726
  • 9
  • 27
  • As I said - I'm not concerned about inserts. This isn't high-transactional data. The queries are my biggest concern. I need to be able to select based on a given user Id, so it seems the best way to do that is group them physically on disk by User Id (i.e.. user Id clustered index) – richard Aug 17 '19 at 20:09