11

We are having a rather long discussion in our company about whether or not to put an autoincrement key on EVERY table in our database.

I can understand putting one on tables that would have a FK reference to, but I kind-of dislike putting such keys on each and every one of our tables, even though the keys would never be used.

Please help with pros and cons for putting autoincrement keys on every table apart from taking extra space and slowing everything a little bit (we have some tables with hundreds of millions of records).

Thanks

Brumbar
  • 113
  • 1
  • 1
  • 5
  • mentioning which RDBMS you're using would be useful.... – Mitch Wheat Jan 04 '10 at 03:21
  • 2
    Duplicate question: http://stackoverflow.com/questions/840162/should-each-and-every-table-have-a-primary-key – leepowers Jan 04 '10 at 03:22
  • 1
    @pygorex1: It's not a duplicate of that question. That question is about whether each table should have a PK. This question is about whether there should be an identity/autoincrement/autonumber. – Rob Farley Jan 04 '10 at 03:25
  • @Rob: Artificial (AKA surrogate) keys *are* the most common primary keys. – OMG Ponies Jan 04 '10 at 03:29
  • @OMG I'm not denying that at all. But Brumbar's asking whether he needs a surrogate key, not whether he needs a primary key. – Rob Farley Jan 04 '10 at 03:45
  • Duplicate of "in general, should every table in a database have an identity field to use as a PK?" http://stackoverflow.com/questions/1207983 – gbn Jan 04 '10 at 11:03
  • But this question also involved the issue of "should I put one on existing tables" – Rob Farley Jan 04 '10 at 11:47

9 Answers9

13

I'm assuming that almost all tables will have a primary key - and it's just a question of whether that key consists of one or more natural keys or a single auto-incrementing surrogate key. If you aren't using primary keys then you will generally get a lot of advantages of using them on almost all tables.

So, here are some pros & cons of surrogate keys. First off, the pros:

  • Most importantly: they allow the natural keys to change. Trivial example, a table of persons should have a primary key of person_id rather than last_name, first_name.
  • Read performance - very small indexes are faster to scan. However, this is only helpful if you're actually constraining your query by the surrogate key. So, good for lookup tables, not so good for primary tables.
  • Simplicity - if named appropriately, it makes the database easy to learn & use.
  • Capacity - if you're designing something like a data warehouse fact table - surrogate keys on your dimensions allow you to keep a very narrow fact table - which results in huge capacity improvements.

And cons:

  • They don't prevent duplicates of the natural values. So, you'll still usually want a unique constraint (index) on the logical key.
  • Write performance. With an extra index you're going to slow down inserts, updates and deletes that much more.
  • Simplicity - for small tables of data that almost never changes they are unnecessary. For example, if you need a list of countries you can use the ISO list of countries. It includes meaningful abbreviations. This is better than a surrogate key because it's both small and useful.

In general, surrogate keys are useful, just keep in mind the cons and don't hesitate to use natural keys when appropriate.

KenFar
  • 1,531
  • 1
  • 14
  • 25
6

You need primary keys on these tables. You just don't know it yet.

leepowers
  • 37,828
  • 23
  • 98
  • 129
  • This is possibly true -- you might not need to refer to these tables now, but in a few years, that might change. – Kaleb Brasee Jan 04 '10 at 03:36
  • 4
    The question is about autoincrement keys, not primary keys. There's a distinction. – Bill the Lizard Jan 04 '10 at 03:41
  • @Bill the Lizard: The question implies that these tables don't have primary keys ("I can understand putting one on tables that would have a FK reference to"). – Powerlord Jan 04 '10 at 15:04
  • @R. Bemrose: If you read the question like that, then it's an exact duplicate. For that matter, so is this answer. http://stackoverflow.com/questions/1771237/is-it-ok-not-to-use-a-primary-key-when-i-dont-need-one/1771299#1771299 – Bill the Lizard Jan 04 '10 at 22:30
5

If you use small keys like this for Clustered Indexes, then there's quite significant advantages.

Like:

Inserts will always go at the end of pages.

Non-Clustered Indexes (which need a reference to the CIX key(s)) won't have long row addresses to consider.

And more... Kimberly Tripp's stuff is the best resource for this. Google her...

Also - if you have nothing else ensuring uniqueness, you have a hook into each row that you wouldn't otherwise have. You should still put unique indexes on fields that should be unique, and use FKs onto appropriate fields.

But... please consider the overhead of creating such things on existing tables. It could be quite scary. You can put unique indexes on tables without needing to create extra fields. Those unique indexes can then be used for FKs.

Rob Farley
  • 15,625
  • 5
  • 44
  • 58
  • I'm assuming SQL Server here, but on the whole, the principles will be similar whichever system you're talking about. – Rob Farley Jan 04 '10 at 03:22
3

I'm not a fan of auto-increment primary keys on every table. The ideas that these give you fast joins and fast row inserts are really not true. My company calls this meatloaf thinking after the story about the woman who always cut the ends off her meatloaf just because her mother always did it. Her mother only did it because the pan was too short--the tradition keeps going even though the reason no longer exists.

  • When the driving table in a join has an auto-increment key, the joined table frequently shouldn't because it must have the FK to the driving table. It's the same column type, but not auto-increment. You can use the FK as the PK or part of a composite PK.

  • Adding an auto-increment key to a table with a naturally unique key will not always speed things up--how can it? You are adding more work by maintaining an extra index. If you never use the auto-increment key, this is completely wasted effort.

  • It's very difficult to predict optimizer performance--and impossible to predict future performance. On some databases, compressed or clustered indexes will decrease the costs of naturally unique PKs. On some parallel databases, auto-increment keys are negotiated between nodes and that increases the cost of auto-increment. You can only find out by profiling, and it really sucks to have to change Company Policy just to change how you create a table.

Ken Fox
  • 1,649
  • 8
  • 12
  • Well. I wouldn't use the FK as a PK. Then, it would no longer be an FK. I would use an auto increment PK on the driving table & the secondary table. Numeric PKs index faster even if there is a candidate for a natural varchar key, already present. Auto increments maintain better data integrity. Auto increments also decouple the key from any business logic, which makes the schema more adaptable. – Charles Robertson Apr 18 '15 at 21:23
  • FKs can certainly be PKs too. See http://stackoverflow.com/questions/17636106/can-a-foreign-key-act-as-a-primary-key. Numeric PKs may not index faster--that is a DB implementation detail. Auto increments do not maintain data integrity better than other kinds of keys; when you have auto increment PKs you'll almost always need additional constraints. I agree they can decouple the key from business rules so it can give you a way out when the business suddenly decides a primary key needs to be editable. It's just not worth adopting "always use auto increment" for that bonus. – Ken Fox May 01 '15 at 01:45
  • What I mean is that a PK needs to be unique. A PK & an FK can be used to join a table, so you could have: Correct: tblUser: PK User_ID auto increment tblTeacher PK Teacher_ID auto increment FK User_ID Incorrect: tblUser: PK User_ID auto increment tblTeacher PK Teacher_ID auto increment FK User_ID auto increment – Charles Robertson May 02 '15 at 07:08
  • An auto increment ensures that a PK is not only unique in the table's current state but throughout the history of that table. This prevents key recycling. – Charles Robertson May 02 '15 at 07:35
2

Having autoincrementing primary keys may make it easier for you to switch ORM layers in the future, and doesn't cost much (assuming you retain your logical unique keys).

GWLlosa
  • 23,995
  • 17
  • 79
  • 116
  • Why would you switch ORMs if you decided on one to abstract your database away? Surely you use ORMs to hide the database, so why not use stored procs if you're going to chabge ORMs... – gbn Jan 04 '10 at 20:43
  • 1
    If you were to write an additional front end on a different platform (say, on a smartphone) that did not have your previous ORM available as a choice. – GWLlosa Jan 06 '10 at 16:28
1

You add surrogate auto increment primary keys as part of the implementation after logical design to respect the physical, on-disk architecture of the db engine.

That is, they have physcial properties (narrow, numeric, strictly monotonically increasing) that suit use as clustered keys, in joins etc.

Example: If you're modelling your data, then "product SKU" is your key. "product ID" is added afterwards, (with a unique constraint on "product SKU") when writing your "CREATE TABLE" statements because you know SQL Server.

This is the main reason.

The other reason a brain dead ORM that can't work without one...

gbn
  • 422,506
  • 82
  • 585
  • 676
1

Many tables are better off with a compound PK, composed of two or more FKs. These tables correspond to relationships in the Entity-Relationship (ER) model. The ER model is useful for conceptualizing a schema and understanding the requirements, but it should not be confused with a database design.

The tables that represent entities from an ER model should have a smiple PK. You use a surrogate PK when none of the natural keys can be trusted. The decision about whether a key can be trusted or not is not a technical decision. It depends on the data you are going to be given, and what you are expected to do with it.

If you use a surrogate key that's autoincremented, you now have to make sure that duplicate references to the same entity don't creep into your databases. These duplicates would show up as two or more rows with a distinct PK (because it's been autoincremented), but otherwise duplicates of each other.

If you let duplicates into your database, eventually your use of the data is going to be a mess.

Walter Mitty
  • 18,205
  • 2
  • 28
  • 58
0

The simplest approach is to always use surrogate keys that are either auto-incremented by the db or via an orm. And on every table. This is because they are the generally fasted method for joins and also they make learning the database extremely simple, i.e. none of this whats my key for a table nonsense as they all use the same kind of key. Yes they can be slower but in truth the most important part of design is something that wont break over time. This is proven for surrogate keys. Remember, maintenance of the system happens a lot longer than development. Plan for a system that can be maintained. Also, with current hardware the potential performance loss is really negligable.

Bruce
  • 1
0

Consider this:

A record is deleted in one table that has a relationship with another table. The corresponding record in the second table cannot be deleted for auditing reasons. This record becomes orphaned from the first table. If a new record is inserted into the first table, and a sequential primary key is used, this record is now linked to the orphan. Obviously, this is bad. By using an auto incremented PK, an id that has never been used before is always guaranteed. This means that orphans remain orphans, which is correct.

I would never use natural keys as a PK. A numeric PK, like an auto increment is the ideal choice the majority of the time, because it can be indexed efficiently. Auto increments are guaranteed to be unique, even when records are deleted, creating trusted data relationships.

Charles Robertson
  • 1,760
  • 16
  • 21