1

I have the following table - it is very terse, each column is a foreign key to an entity described elsewhere.

enter image description here

The three columns are a unique index - not a primary key - since the LocationId column needs to accommodate NULL. I was going to put an identity column into the table to serve as the primary key but then I realized that the index structure itself holds all the data and, aside from the insert of the table row which will populate the index, there will be no reason to ever refer back to the table for anything.

This being the case, I am assuming that the most efficient way to create this index is Non-Clustered. If I created this as Clustered, then effort would go toward ordering the data within the table to the same order as the index, but this would be wasted effort as there is nothing within the table which is not within the index. (Will SQL Server even build a table for this? Seems kind of a waste.)

The most frequent read access to this table will be to read all rows for a particular TenantId, ordered by UnitId, LocationId.

Please advise - should this index be Clustered or Non-Clustered?

Yossi G.
  • 939
  • 1
  • 8
  • 22
  • 2
    Why don't you make the unique index a clustered index? It being primary key or not is irrelevant, if it is the most common access method. [Remember that the clustered index **is the actual table**](https://stackoverflow.com/a/24470091/14868997) Although I don't get the table design: a unique index without a filter means you can only have one `NULL` per `TenantId, UnitID` – Charlieface Mar 01 '21 at 04:32
  • @Charlieface Exactly correct! The combination of TenantId, UnitId and LocationId must be unique. I am using the value NULL as a special representation - a shortcut to logically indicate "All Locations" - thus saving the insertion of dozens of rows plus accommodation for locations which are added in the future. The NULL seamlessly bypasses the foreign key constraint on that column, but unfortunately PKs don't allow for NULL in any of their columns. So it must be a unique index rather than PK. – Yossi G. Mar 01 '21 at 15:40
  • 1
    Ah that explains your use case. To be honest I would consider making a separate table with just `Tenant, Unit` which covers those cases. – Charlieface Mar 01 '21 at 15:54
  • @Charlieface Yes, I definitely considered that. A TenantLocationLinksALL table. However, there are actually four of these tables - locations, departments, employees and job groups so the duplication would increase 4 tables to 8. It's a security subsystem so every application data query would then have to left join to 8 security tables to see whether the interactive user has privilege to a given row of data. 4 joins is already cumbersome and I didn't want to propagate to 8. Instead of two joins, it's one join with two possible (OR) matching conditionals. Your point is valid though. – Yossi G. Mar 01 '21 at 23:56
  • I thought that you would need an `OR`, it can be quite inefficient, as it may be a scan. Given your use case, I suggest either an indexed view of all user permissions. Or better: use Windows Authentication, AD groups and Row-Level security to enforce the permissions you want – Charlieface Mar 02 '21 at 00:06
  • @Charlieface I will keep an eye on the OR. So far, performance benchmarking is good, but always looking for improvements. We're a web application with external customers all across the country. So Windows and AD don't work for us, unfortunately. – Yossi G. Mar 02 '21 at 01:24

2 Answers2

2

I am a fan of using a synthetic key, even in this case: I think it makes it simpler to manage the table (for instance, a numeric key shows the order of insertion). But I respect those that disagree.

You are not taking that approach. What you probably do want are three separate indexes on each of the columns:

  • TenantId
  • UnitId
  • LocationId

Actually, you don't explain what the database is, but I would be surprised if "unit" and "location" were really two different things. Seems to me that "units" have a "location" and you should look up the location through the unit. But that is only a guess based on the naming.

The real question on a clustered index is:

  • Do you want the columns to be the primary key of the table?
  • Do you want a separate unique index?

There is little difference between the two, except that a unique index will duplicate portions of data for the header. I'm not sure if this is additional overhead when there is no clustered index on the table. That said, either method can replace one of the above indexes -- whichever is the first key in the index.

The major concern with clustering is fragmentation as you insert and delete records. However, you have a very similar concern with index fragmentation on a unique index. I don't actually know if clustered index fragmentation is "worse" in some sense than unique index fragmentation. However, with no additional columns I suspect they are similar.

So, I suspect that with your data model, you will want to define the columns as the primary key, which SQL Server will by default use for clustering.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

You want a clustered index if you have decided not to use a primary key.

You are correct that it would be a waste for SQL to actually store a separate copy of the data to support your clustered index. It does not do so.

Your data will be stored in sorted order, so you will be able to find it efficiently based on your one use case.

"There can be only one clustered index per table, because the data rows themselves can be stored in only one order. The only time the data rows in a table are stored in sorted order is when the table contains a clustered index." - Source https://learn.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-ver15#:~:text=There%20can%20be%20only%20one,table%20contains%20a%20clustered%20index.

However...You state that "The most frequent read access to this table will be to read all rows for a particular TenantId, ordered by UnitId, LocationId." If you expect this table to become large, and you read from this table (even infrequently) based upon other lookups than this, then you may wish to consider a primary key and one or more non-clustered indexes. Even infrequent reads may be very painful if this table becomes very large.

rpestl
  • 41
  • 2
  • I think this is actually the answer, but I'd like to clarify. My first concern was that the DBMS manager would need to rearrange the rows in the table to accommodate clustering. I think that point is irrelevant as the DBMS needs to do this for the index anyway. My second concern is the duplication of the table and index structures since these are exactly the same. I think your point is that with a clustered index there will be no separate index structure since, with clustering, the table itself functions as the index. Do I have these correct? – Yossi G. Mar 01 '21 at 15:43
  • 1
    @YossiGeretz Correct on both. – Charlieface Mar 01 '21 at 15:52
  • @Charlieface "Even infrequent reads may be very painful if this table becomes very large." Why is that? All items for a particular Tenant will be clustered on disk. A given select will always be within the scope of a single tenant. What could be more efficient than that? – Yossi G. Mar 02 '21 at 00:00
  • 2
    @YossiGeretz Quite right if that is the case. But OP wasn't saying that, what OP said was *"you read from this table (even infrequently) based upon **other lookups** than this"* – Charlieface Mar 02 '21 at 00:09
  • @Charlieface Of course. That clarifies it. Right now our only use case is to read in the order described. I never say never though. If we needed different access paths in the future, yes, additional indexes would be defined to speed up those access paths. The point here is, that without a primary key, any index would need to have ALL of the necessary columns either defined as part of the index or included with the index, because without a primary key there's no way to get back from the index to the corresponding row(s) in the table. Yes, that makes sense! – Yossi G. Mar 02 '21 at 01:19
  • 1
    @YossiGeretz Not quite right. A table without a clustered index is called a heap, its NCIs use a marker called RID (rowid) to get the location of the row. It has bad performance implications if there are frequent changes, due to ghosts and forwarded records. A clustered table has its NCIs use the clustering key to reference the row, so all NCIs effectively have those columns as `include` columns. But it does not need to be the PK, which can be enforced with an NCI – Charlieface Mar 02 '21 at 01:26
  • @Charlieface I was basing my assumption on this article here: https://www.sqlmatters.com/Articles/Creating%20an%20index%20on%20a%20table%20with%20no%20primary%20key.aspx The author says that without a primary key his indexed access was generating a table scan. – Yossi G. Mar 02 '21 at 01:30
  • 1
    @YossiGeretz Completely wrong assumption by that author: the scan would have happened on a clustered table also. The issue was because of no `include`s, the compiler (quite rightly) decided the cost of extra bookmark (RID) lookups wasn't worth it. An RID lookup is of course still possible on a heap table. A clustered table has the same effect, just on clustered key lookups instead – Charlieface Mar 02 '21 at 01:40