Identity field and primary key in SQL Server when values are unique

Question

When a set of values stored in a table has a name or a code that should be unique across the system, should the table still be created with an auto-increment ID (int) as its primary key?

Take the situation of State Abbreviations. Other than consistency, what would be the purpose of making a separate ID the primary key instead of the state name or abbreviation?

If, for example, the foreign key from a shipping address referenced the state abbreviation, which is not mutable, then ... is there a purpose for having an auto increment int ID?

You should not be looking for a reason to have an immutable key, you should be looking (and really hard) why not to. I've yet to see a reason for not, that doesn't make the assumption that the natural key is immutable. I'm not a big fan of assumptions like that, on account of never change and our game don't often go together. — Tony Hopkinson, Sep 26 '13 at 21:16
@Tony to be fair, the last time a state abbreviation was changed (or a new one was introduced) entirely pre-dates the concept of a relational database. The worst that is going to happen in our lifetime is that Canada will be added as a state - and that won't break this design whether they use an ID or not... — Aaron Bertrand, Sep 26 '13 at 21:33

score 3 · Answer 1 · answered Sep 26 '13 at 21:10

You highlighted one positive aspect of a separate table: consistency. It is much easier to have this:

CREATE TABLE dbo.States
(
  StateID TINYINT PRIMARY KEY,
  Name VARCHAR(32),
  Abbreviation CHAR(2)
);

CREATE TABLE dbo.CustomerAddresses
(
  AddressID INT PRIMARY KEY,
  ...,
  StateID TINYINT NOT NULL FOREIGN KEY REFERENCES dbo.States(StateID)
);

Than to have a trigger or check constraint like:

CHECK (StateAbbreviation IN ('AL', 'AK' /* , 50+ more states/territories... */))

Now, with something static and small like a 2-character state abbreviation, this design might make more sense, eliminating some unnecessary mapping between the abbreviations and some surrogate ID:

CREATE TABLE dbo.States
(
  Abbreviation CHAR(2) PRIMARY KEY,
  Name VARCHAR(32)
);

CREATE TABLE dbo.CustomerAddresses
(
  AddressID INT PRIMARY KEY,
  ...,
  StateAbbreviation CHAR(2) FOREIGN KEY REFERENCES dbo.States(Abbreviation)
);

This constrains the data to the known set of states, allows you to store the actual data in the table (which can eliminate a lot of joins in queries), actually saves you some space, and avoids having any messy hard-coded check constraints (or constraints using UDFs, or triggers validating the data).

That all said, there is no magic blanket answer that satisfies all designs. As your string gets larger, it can make more sense to use an integer instead of just storing the string. A counter-example would be storing all of the User Agent strings from your web logs: it makes a lot more sense to store each distinct string once and assign an integer to it than to store the same 255-character string over and over again.
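A minimal sketch of that counter-example, in the same style as the tables above. All table and column names here are hypothetical, and the column list for the log table is an assumption made only for illustration:

-- Each distinct user agent string is stored once (hypothetical table).
CREATE TABLE dbo.UserAgents
(
  UserAgentID INT IDENTITY(1,1) PRIMARY KEY,
  UserAgent VARCHAR(255) NOT NULL UNIQUE
);

-- Log rows carry the 4-byte surrogate instead of the full string.
CREATE TABLE dbo.WebLogEntries
(
  LogEntryID BIGINT IDENTITY(1,1) PRIMARY KEY,
  RequestedAt DATETIME2 NOT NULL,
  UserAgentID INT NOT NULL FOREIGN KEY REFERENCES dbo.UserAgents(UserAgentID)
);

Here the surrogate earns its keep because the natural value is long and repeated many times, which is the opposite of the 2-character state case.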

Other things that can make this design troublesome:

What if you expand beyond the US later?
Forget about state abbreviations for a moment (which are pretty static); what if your lookups are things that do change frequently?
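For lookups that do grow over time, the difference between the approaches shows up in maintenance. A rough sketch, assuming the natural-key dbo.States table above, a value that has not been loaded yet, and a hypothetical constraint name (CK_CustomerAddresses_State) for the hard-coded alternative:

-- Lookup-table design: a new value is a single row insert.
INSERT INTO dbo.States (Abbreviation, Name)
VALUES ('PR', 'Puerto Rico');

-- Hard-coded design: the constraint has to be dropped and re-created,
-- because a CHECK constraint's definition cannot be modified in place.
-- CK_CustomerAddresses_State is a hypothetical name.
ALTER TABLE dbo.CustomerAddresses
  DROP CONSTRAINT CK_CustomerAddresses_State;

ALTER TABLE dbo.CustomerAddresses
  ADD CONSTRAINT CK_CustomerAddresses_State
  CHECK (StateAbbreviation IN ('AL', 'AK', 'PR' /* , remaining values... */));

The insert is plain data maintenance, while the second route is a schema change every time the list grows.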

Thanks, while I thought the States example would help to confine the problem I really am thinking a bit wider with instances that are like enums ... They should not change, but could be added to. I know I didn't mention that, but was still hoping for some direction. Maintaining the trigger would be an issue in this case I think. — user2821163, Oct 08 '13 at 22:53

score 2 · Answer 2 · edited May 23 '17 at 12:28

As a general rule (which may not apply in every single case), it's better to use integers as primary keys for performance reasons. So if your unique key is a string, create an autoincrement primary key.

Also, state abbreviations aren't necessarily unique. They are within one country, but once you look at all the countries in the world, the same abbreviation can occur more than once.
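A surrogate ID is one way around that collision. Another, sketched here purely as an illustration, is to widen the natural key to include the country. The table layout, column sizes, and the use of two-letter country codes are all assumptions, not something from the question:

-- Hypothetical variant: states keyed by country plus abbreviation.
CREATE TABLE dbo.Countries
(
  CountryCode CHAR(2) PRIMARY KEY,   -- e.g. 'US', 'AU'
  Name VARCHAR(64) NOT NULL
);

CREATE TABLE dbo.States
(
  CountryCode CHAR(2) NOT NULL
    FOREIGN KEY REFERENCES dbo.Countries(CountryCode),
  Abbreviation VARCHAR(3) NOT NULL,
  Name VARCHAR(64) NOT NULL,
  PRIMARY KEY (CountryCode, Abbreviation)
);

INSERT INTO dbo.Countries (CountryCode, Name)
VALUES ('US', 'United States'), ('AU', 'Australia');

-- 'WA' appears twice without violating the key.
INSERT INTO dbo.States (CountryCode, Abbreviation, Name)
VALUES ('US', 'WA', 'Washington'),
       ('AU', 'WA', 'Western Australia');

The trade-off is that every referencing table then needs both columns in its foreign key, which is part of why a single integer ID becomes more attractive as the key grows.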

EDIT

I can't find very good evidence on string vs. integer performance, but take a look at, for example: Strings as Primary Keys in SQL Database

Having said that, there are never many states, so the performance gain will be small in this case.

answered Sep 26 '13 at 20:54 · Szymon

So using a 4-byte integer (stored *twice*) and an additional table is better than storing two characters? I don't think the blanket statement you're making applies to all cases. – Aaron Bertrand Sep 26 '13 at 20:56
Why an additional table? And sure, not all cases are the same, I'll add the words "in general". – Szymon Sep 26 '13 at 20:59
Maybe read the example: he's storing states in a table. Obviously those are going to be used in other tables, like an Orders table. So you would have Orders.StateID and then States.StateID and State.Abbreviation... – Aaron Bertrand Sep 26 '13 at 20:59
What evidence or documentation do you have that an integer primary key will perform better than a two-character string? – D Stanley Sep 26 '13 at 21:00

D Stanley · Answer 3 · 2013-09-26T21:02:17.290

State Abbreviation is a rare example of a good non-increment primary key for the following reasons:

They are small (2-character)
They don't change
The set of values is relatively static - new records are unlikely

Just because the natural key is unique doesn't make it a good candidate for the primary key.

Even real-world values that are unique (like SSN) may not be good candidates if they are entered by humans. For example, suppose someone enters a bunch of related data for a person, then gets a letter saying that the SSN is wrong. Now you can't just update the primary key; you need to update all of the foreign keys as well!
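To make the SSN case concrete, here is a sketch with a surrogate key, where the SSN is just an attribute. The table names, columns, and SSN values are hypothetical placeholders:

-- Hypothetical tables; SSN is an ordinary column, not the key.
CREATE TABLE dbo.People
(
  PersonID INT IDENTITY(1,1) PRIMARY KEY,
  SSN CHAR(9) NOT NULL,
  FullName VARCHAR(100) NOT NULL
);

CREATE TABLE dbo.Claims
(
  ClaimID INT IDENTITY(1,1) PRIMARY KEY,
  PersonID INT NOT NULL FOREIGN KEY REFERENCES dbo.People(PersonID)
);

-- Correcting a mistyped SSN touches one row; dbo.Claims is unaffected.
-- Both SSN values are placeholders.
UPDATE dbo.People
SET SSN = '000000002'
WHERE SSN = '000000001';

Had SSN been the primary key, the same correction would need ON UPDATE CASCADE on every referencing foreign key, or a manual update of each child table.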

edited Sep 26 '13 at 21:02

answered Sep 26 '13 at 20:59

I've personally been in this mess of using a state abbreviation before. It's nice when you have one country only. Then you go to another country and you end up with a duplicate. You never know how your system is going to evolve (only in some cases can you be sure that it's never going to evolve). – Szymon Sep 26 '13 at 21:04
SSN is not actually unique (there are real-world duplicates) and it does change, if rarely, so SSN is not really a good candidate even if you did not have issues resulting from humans entering the data incorrectly. Real-world codes are quite rarely suitable; state codes, as you correctly point out, are an exception. If the US government tried to change state codes, chaos would reign. – Gary Walker Sep 26 '13 at 21:04
They don't change? They shouldn't change, they probably won't change. The risk of them changing is low. – Tony Hopkinson Sep 26 '13 at 21:22
