SQL primary key on lookup table or unique constraint?

Question

I want to create a lookup table 'orderstatus'. i.e. below, just to clarify this is to be used in a Data Warehouse. I will need to join through OrderStatus to retrieve the INT (if i create one) to be used elsewhere if need be. Like in a fact table for example, I would store the int in the fact table to link to the lookup table.

+------------------------+------------------+
| OrderStatus            | ConnectionStatus |
+------------------------+------------------+
| CLOSED                 | APPROVE          |
+------------------------+------------------+
| COMPLETED              | APPROVE          |
+------------------------+------------------+
| FULFILLED              | APPROVE          |
+------------------------+------------------+
| CANCELLED              | CLOSED           |
+------------------------+------------------+
| DECLINED               | CLOSED           |
+------------------------+------------------+
| AVS_CHECK_SYSTEM_ERROR | CLOSED           |
+------------------------+------------------+

What is best practise in terms of primary key/unique key? Should i just create an OrderStatusKey INT as PrimaryKey with identity? Or create a unique constraint on order status (unique)? Thanks.

If `OrderStatus` column is going to be referred in many tables then I will go with `Identity` column with a primary key on it and refer it in other tables. Check this question as well http://stackoverflow.com/questions/3162202/sql-primary-key-integer-vs-varchar — Pரதீப், Nov 30 '16 at 16:25
You should give this a read. http://www.agiledata.org/essays/keys.html. — JamieSee, Dec 01 '16 at 15:43

Neo · Accepted Answer · 2016-11-30T16:35:56.590

2

For this, I would suggest you create an Identity column, and make that the clustered primary key.

It is considered best practice for tables to have a primary key of some kind, but having a clustered index for a table like this is the fastest way to allow for the use of this table in multi table queries ( with joins ).

Here is a sample as to how to add it:

ALTER TABLE dbo.orderstatus 
ADD CONSTRAINT PK_orderstatus_OrderStatusID PRIMARY KEY CLUSTERED (OrderStatusID);
GO

Article with more details MSDN

And here is another resource for explaining a primary key Primary Key Primer

edited Nov 30 '16 at 16:35

answered Nov 30 '16 at 16:22

Neo

3,309
7
35
44

I don't disagree, but could you elaborate as to why? – S3S Nov 30 '16 at 16:23
3

Don't mix the concepts of PRIMARY KEY and CLUSTERED INDEX. – Pred Nov 30 '16 at 16:28
Why not create an OrderStatusKey INT as Primary Key Identity with either a unique constraint or non clustered index on OrderStatus or is this overkill? What is more performant non clustered index or unique constraint? – jhowe Nov 30 '16 at 16:44
You do not need a unique contraint for this type of table, its overkill. A primary key clustered is all you need here. If you had a more complicated type of table, you may need to look at a unique constraint. – Neo Nov 30 '16 at 16:46
Thanks that clarifies things. – jhowe Nov 30 '16 at 16:49

score 1 · Answer 2 · answered Nov 30 '16 at 16:25

If OrderStatus is unique and the primary identifier AND you will be reusing this status code directly in related tables (and not a numeric pointer to this status code) then keep the columns as is and make OrderStatus the primary clustered index.

A little explanation:

A primary key is unique across the table; a clustered index ties all record data back to that index. It is not always necessary to have the primary key also be the clustered index on the table but usually this is the case.

If you are going to be linking to the order status using something other than the status code then create another column of type int as an IDENTITY and make that the primary clustered key. Also add a unique non-clustered index to OrderStatus to ensure that no duplicates could ever be added.

Either way you go every table should have a primary key as well as a clustered index (again, usually they are the same index).

Pred · Answer 3 · 2016-11-30T17:34:36.267

Here are some things to consider:

PRIMARY KEY ensures that there is no NULL values or duplicates in the table
UNIQUE KEY can contain NULL and (by the ANSI standard) any number of NULLs. (This behavior depends on SQL Server settings and possible index filters r not null constraints)
The CLUSTERED INDEX contains all the data related to a row on the leaves.
- When the CLUSTERED INDEX is not unique (and not null), the SQL Server will add a hidden GUID to each row.
- SQL Server add a hidden GUID column to the key column list when the key columns are not unique to distinguish the individual records)
All indexes are using either values of the key columns of the clustered index or the rowid of a heap table.
The query optimizer uses the index stats to find out the best way to execute a query
For small tables, the indexes are ignore usually, since doing an index scan, then a lookup for each values is more expensive than doing a full table scan (which will read one or two pages when you have really small tables)
Status lookup tables are usually very small and can be stored on one page.

The referencing tables will store the PK value (or unique) in their structure (this is what you'll use to do a join too). You can have a slight performance benefit if you have an integer key to use as reference (aka IDENTITY in SQL Server).

If you usually don't want to list the ConnectionStatus, then using the actual display value (OrderStatus) can be beneficial, since you don't have to join the lookup table.

You can store both values in the referencing tables, but the maintaining both columns have some overhead and more space for errors.

The clustered/non-clustered question depends on the use cases of this table. If you usually use the OrderStatus for filtering (using the textual form), a NON CLUSTERED IDENTITY PK and a CLUESTERED UNIQUE on the OrderStatus can be beneficial. However (as you can read it above), in small tables the effect/performance gain is usually negligible.

If you are not familiar with the above things and you feel it safer, then create an identity clustered PK (OrderKey or OrderID) and a unique non clustered key on the OrderStatus.

Use the PK as referencing/referenced column in foreign keys.

One more thing: if this column will be referenced by only one table, you may want to consider to create an indexed view which contains both table's data.

Also, I would suggest to add a dummy value what you can use if there is no status set (and use it as default for all referencing columns). Because not set is still a status, isn't it?

SQL primary key on lookup table or unique constraint?

3 Answers3