4

If a database has attributes A1, A2, A3...An and A1, A2 & A3 can form composite key together, is it better to use a surrogate key instead of a composite key?

Using a surrogate key will improve the Insertion execution speed of records (this supports surrogate over composite key) But the SELECT, UPDATE and DELETE queries based on the attributes A1, A2 & A3 will be tremendously slowed down if we use surrogate key(this supports composite key over surrogate key).

Which is better in terms of performance given such conditions? Surrogate key or composite key?

Adithya Upadhya
  • 2,239
  • 20
  • 28
  • 3
    This is an ongoing religious war. You will never find a winner. I leave you with this: It will not improve the speed of insertion, it will slow it down. Just because you have a surrogate key, your A1, A2 & A3 are still your actual primary key. Therefore you need a unique constraint on those also meaning there's likely to be an index in there somewhere. You now have two indexes where you should have one. What it does do is mean you propagate less data (i.e. A1, A2, A3) over your related tables. That would make them bigger but it also can lead to fewer joins. So as usual, "it depends". – LoztInSpace May 24 '15 at 04:47
  • @ LoztInSpace, the insertion speed will be greatly improved if we use a surrogate key such as **`ID PRIMARY KEY AUTO_INCREMENT`** instead of **PRIMARY KEY(A1, A2, A3)** since the B-Tree indexing will be much faster if we use a light weight primary key with auto increment. Hence, Surrogate keys usually increase insertion execution speed. – Adithya Upadhya May 24 '15 at 05:21
  • 2
    Even if the insert itself is faster for a surrogate than the 3 column insert (and I am not necessarily agreeing with that claim) you have overlooked a key point: A surrogate is exactly that. It's a placeholder for the real thing. You still need to enforce uniqueness in your actual, true primary key (A1, A2, A3). So you are in fact required to do *more* work by introducing a surrogate key. If you choose to not enforce your actual key then you've really just thrown your data model away and you no longer have two things that you can meaningfully compare. Make sense? – LoztInSpace May 24 '15 at 07:38

2 Answers2

3

In nearly all tests, there was little to no performance advantage of surrogate keys over natural keys. Natural keys also have the advantage of being much easier to work with. A better write-up is available here.

StuartLC
  • 104,537
  • 17
  • 209
  • 285
  • Interesting reference. – Jonathan Leffler May 24 '15 at 07:54
  • The link in this answer sends me to a page "How Using the Windows 2000 Encrypted File System to Secure SQL Server Databases and Backups Affects SQL Server's Performance" ... doesn't seem to have anything to do with a discussion of surrogate vs natural keys in MySQL. – spencer7593 Oct 09 '20 at 14:04
3

Performance is not the primary concern of choosing whether to implement a surrogate primary key.

We find that the ideal primary key has several desirable attributes

  • simple (single column, native datatype)
  • unique (positively NO duplicate values)
  • non null (every row will have a value)
  • immutable (once assigned it is never changed)
  • anonymous (carries no "information")

There's no "rule" that says that the candidate key selected as the primary key must have all of these properties, but these are properties that are desirable, for various reasons.

There's not even a "rule" that says all tables need to have a primary key. But we find it desirable that they do.

Successful software systems have been built using surrogate keys as well as natural keys.


In terms of performance, there's not really that much of a difference that can be demonstrated. But do consider this: if an entity table has a primary key that is a composite key that consists of several "large" columns, those same large columns have to be repeated in any table that has a foreign key reference to that entity table, and in some storage engines (InnoDB), those get repeated in every index.

But performance is not really the deciding factor. (Anyone that suggests that performance should be the deciding factor in selecting a candidate key as the primary key hasn't really thought about it enough.)


As far as being "easier to work with", many developers find it easier to use a single column as the primary key vs. a composite key consisting of two, three, or more columns.

Some developers that have opted for natural keys as primary key have later been burned by their selection of a candidate key. Not because it was a natural key, but because further along in development, "new" requirements were "discovered", and it turned out that the candidate key they had selected as the primary key wasn't really always unique, or that it wasn't exempt from being changed, or that it wasn't really anonymous.

There are lots of software projects that have been successful using natural keys, and composite keys as PRIMARY KEY. Just as there's been success using surrogate key as PRIMARY KEY.

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • I think the 'R' in RDBMS implies the PK requirement – Strawberry May 24 '15 at 08:26
  • @Strawberry: It's entirely possible to issue a `CREATE TABLE` statement that does not include a PRIMARY KEY constraint. The table doesn't require a UNIQUE constraint, or even a NOT NULL constraint, Yes, we find it *desirable* to add a primary key. But strictly speaking, it's not a requirement. – spencer7593 May 24 '15 at 20:18
  • Keys (at least one per table) are strictly required by definition of a relation and must therefore always be present in a relational database. Keys are not required in SQL according to the SQL standard. If a SQL table lacks a non-nullable and and irreducibly unique set of columns then it isn't a relational table. – nvogel May 26 '15 at 19:56
  • 1
    (a) There are many, specific, rules (2) The **R** in RDBMS demands a Logical Relational PK, one that is "made up from the data". Surrogates such as GUIDS, ID, etc are **not** made up from the data. They are Physical pointers (3) Surrogates break the **Access Path Independence** Rule in the *Relational Model*. (4) No such thing as a "surrogate key", it does not have any of the qualities of a Key (5) The Key cannot be abandoned, because a surrogate does not provide the row uniqueness. (6) It is not an either/or decision, the surrogate is always an additional field. – PerformanceDBA Jun 06 '15 at 02:50