5

Although I'm guilty of this crime, it seems to me there can't be any good reason for a table to not have an identity field primary key.

Pros: - whether you want to or not, you can now uniquely identify every row in your table which previously you could not do - you can't do sql replication without a primary key on your table

Cons: - an extra 32 bits for each row of your table

Consider for example the case where you need to store user settings in a table in your database. You have a column for the setting name and a column for the setting value. No primary key is necessary, but having an integer identity column and using it as your primary key seems like a best practice for any table you ever create.

Are there other reasons besides size that every table shouldn't just have an integer identity field?

Michael Pryor
  • 25,046
  • 18
  • 72
  • 90

11 Answers11

14

Sure, an example in a single-database solution is if you have a table of countries, it probably makes more sense to use the ISO 3166-1-alpha-2 country code as the primary key as this is an international standard, and makes queries much more readable (e.g. CountryCode = 'GB' as opposed to CountryCode = 28). A similar argument could be applied to ISO 4217 currency codes.

In a SQL Server database solution using replication, a UNIQUEIDENTIFIER key would make more sense as GUIDs are required for some types of replication (and also make it much easier to avoid key conflicts if there are multiple source databases!).

Greg Beech
  • 133,383
  • 43
  • 204
  • 250
  • You're right, this is a good reason. For me, in this case, I will still use an autoincrementing PK and make a unique constraint on the country code. Also, you can have a non-PK GUID for replication. Just adding some thoughts. :) – BobbyShaftoe Mar 22 '09 at 23:20
  • @Bobby - We originally started with an INT PK on our Country table, but it got to be more pain than using the country code so got refactored out. And yep, you're right, the ROWGUIDCOL doesn't have to be the PK but IMHO it normally makes sense to make it so. – Greg Beech Mar 22 '09 at 23:34
  • I awe about about to declare that you should always use PKs, but I like your counter examples. These are basically external identifiers that will never be modified, so are valid as their own PKs. – Jason Peacock Mar 23 '09 at 00:01
  • I don't think readability of queries is a particularly good reason personally. While the ISO standard codes are probably as good an example as you will get, it's still more likely that the ISO code for a country will change than a surrogate key will change. – Draemon Dec 11 '09 at 14:00
10

The most clear example of a table that doesn't need a surrogate key is a many-to-many relation:

CREATE TABLE Authorship (
  author_id INT NOT NULL,
  book_id   INT NOT NULL,
  PRIMARY KEY (author_id, book_id),
  FOREIGN KEY (author_id) REFERENCES Authors (author_id),
  FOREIGN KEY (book_id) REFERENCES Books (book_id)
);

I also prefer a natural key when I design a tagging system:

CREATE TABLE Tags (
  tag VARCHAR(20) PRIMARY KEY
);

CREATE TABLE ArticlesTagged (
  article_id INT NOT NULL,
  tag        VARCHAR(20) NOT NULL,
  PRIMARY KEY (article_id, tag),
  FOREIGN KEY (article_id) REFERENCES Articles (article_id),
  FOREIGN KEY (tag) REFERENCES Tags (tag)
);

This has some advantages over using a surrogate "tag_id" key:

  • You can ensure tags are unique, without adding a superfluous UNIQUE constraint.
  • You prevent two distinct tags from having the exact same spelling.
  • Dependent tables that reference the tag already have the tag text; they don't need to join to Tags to get the text.
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • given this layout, are db's smart enough to only store the text data once? – Michael Pryor Mar 24 '09 at 01:49
  • No, typically the string would be stored once in the Tags table and once in the ArticlesTagged table. You wouldn't choose this design if storage was a premium, you'd choose this design for the other advantages I mentioned. – Bill Karwin Mar 24 '09 at 01:51
  • Why have the Tags table at all? – Draemon Dec 11 '09 at 14:02
  • @Draemon: As a lookup table for permitted tags. You might not necessarily have articles that use every tag, but you still want to list the tags allowed, for instance to populate a menu in your user interface. Plus it's faster to `SELECT Tag FROM Tags` than `SELECT DISTINCT Tag FROM ArticlesTagged`. – Bill Karwin Dec 11 '09 at 17:33
7

Every table should have a primary key. It doesn't matter if it's an integer, GUID, or the "setting name" column. The type depends on the requirements of the application. Ideally, if you are going to join the table to another, it would be best to use a GUID or integer as your primary key.

Jesse Weigert
  • 4,714
  • 5
  • 28
  • 37
4

Yes, there are good reasons. You can have semantically meaningful true keys, rather than articificial identity keys. Also, it is not a good idea to have a seperate autoincrementing primary key for a Many-Many table. There are some reasons you might want to choose a GUID.

That being said, I typically use autoincrementing 64bit integers for primary keys.

BobbyShaftoe
  • 28,337
  • 7
  • 52
  • 74
4

Every table should have a primary key. But it doesn't need to be a single field identifier. Take for example in a finance system, you may have the primary key on a journal table being the Journal ID and Line No. This will produce a unique combination for each row (and the Journal ID will be a primary key in its own table)

Your primary key needs to be defined on how you are going to link the table to other tables.

Craig T
  • 2,761
  • 5
  • 25
  • 33
2

I don't think every table needs a primary key. Sometimes you only want to "connect" the contents of two tables - via their primary key.

So you have a table like users and one table like groups (each with primary keys) and you have a third table called users_groups with only two colums (user and group) where users and groups are connected with each other.

For example a row with user = 3 and group = 6 would link the user with primary key 3 to the group with primary key 6.

Koraktor
  • 41,357
  • 10
  • 69
  • 99
2

One reason not to have primary key defined as identity is having primary key defined as GUIDs or populated with externally generated values.

In general, every table that is semantically meaningful by itself should have primary key and such key should have no semantic meaning. A join table that realizes many-to-many relationship is not meaningful by itself and so it doesn't need such primary key (it already has one via its values).

topchef
  • 19,091
  • 9
  • 63
  • 102
  • You mean, doesn't need an additional surrogate PK column? It should have a primary key constraint, comprising the keys of the other 2 tables. – Tony Andrews Mar 23 '09 at 11:39
1

The key difference (sorry) between a natural primary key and a surrogate primary key is that the value of the natural key contains information whereas the value of a surrogate key doesn't.

Why is this important? Well a natural primary key is by definition guaranteed to be unique, but its value is not usually guaranteed to stay the same. When it changes, you have to update it in multiple places.

A surrogate key's value has no real meaning and simply serves to identify that row, so it never needs to be changed. It is a feature of the model rather than the domain itself.

So the only place I would say a surrogate key isn't appropriate is in an association table which only contains columns referring to rows in other tables (most many-to-many relations). The only information this table carries is the association between two (or more) rows, and it already consists solely of surrogate key values. In this case I would choose a composite primary key.

If such a table had bag semantics, or carried additional information about the association, I would add a surrogate key.

Draemon
  • 33,955
  • 16
  • 77
  • 104
  • You don't have to update in multiple places when the primary key changes. That's what foreign keys and relations are for. There's no requirement that a primary key should never change and if changing a primary key is causing problems you should revisit your database design and fix it. – Lars Hansson Jan 12 '14 at 10:18
  • Yes you do need to update the foreign keys as well as the primary key. – Draemon Jan 14 '14 at 00:27
  • No, you really don't. When the primary key changes all foreign keys using this primary key also changes (if you have cascade on change, which you almost always should). – Lars Hansson Jan 18 '14 at 08:56
1

To be a properly normalised table, each row should only have a single identifiable key. Many tables will already have natural keys, such a unique invoice number. I agree, especially with storage being so cheap, there is little overhead in having an autonumber/identity key on all tables, but in this instance which is the real key.

Another area where I personally don't use this approach if for reference data, where typically we have a Description and a Value

Code, Description
'L', 'Live'
'O', 'Old'
'P', 'Pending'

In this situation making code a primary key ensures no duplicates, and is more human readable.

MrTelly
  • 14,657
  • 1
  • 48
  • 81
0

A primary key is ALWAYS a good idea. It allows for very fast and easy joining of tables. It aides external tools that can read system tables to make join allowing less skilled people to create their own queries by drag-and-drop. It also makes the implementation of referential integrity a breeze and that is a good idea from the get go.

0

I know for sure that some very smart people working for web giants do this. While I don't know why their own reasons, I know 2 cases where PK-less tables make sense:

  • Importing data. The table is temporary. Insertions and whole table scans need to be as fast as possible. Also, we need to accept duplicate records. Later we will clean the data, but the import process needs to work.
  • Analytics in a DBMS. Identifying a row is not useful - if we need to do it, it is not analytics. We just need a non-relational, redundant, horrible blob that looks like a table. We will build summary tables or materialized views by writing proper SQL queries.

Note that these cases have good reasons to be non-relational. But normally your tables should be relational, so... yes, they need a primary key.

Federico Razzoli
  • 4,901
  • 1
  • 19
  • 21