A key in a database table, that has no intrinsic logical meaning, and was introduced for better physical organization of the database or other technical reasons.
Terminology
- A "superkey" is any set of attributes that, when taken together, uniquely identify rows in the table.
- A minimal1 superkey is called "candidate key", or just "key".
- A "natural key" is a key that is comprised from attributes that have intrinsic meaning and would be present in the data model even if there was no natural key on top of them.
- A "surrogate key", on the other hand, has no intrinsic meaning and is introduced purely for technical reasons, as discussed below.
Properties of Surrogate Keys
If there is a natural key in the model, it cannot be replaced with a surrogate key. A surrogate key can only be created in addition to any existing natural keys.
So the engineering decision is not on surrogate versus natural key, but on surrogate + natural key versus the natural key alone.
Having a surrogate key + natural key:
- May make the FOREIGN KEYs in child tables slimmer.2
- The surrogate never needs to change, and therefore never incurs ON UPDATE CASCADE referential action.
- May be more friendly to object-relational mapping (ORM) tools.
Having only a natural key:
- Makes the parent table slimmer.3
- May play better with clustering.4
- May make JOINs unnecessary in some situations.5
May be needed for correctly modeling certain kinds of diamond-shaped dependencies. For example, the following model guarantees that if B and C are connected to the same D, they must also be connected to the same A:
Note how A_ID gets propagated from the "diamond" top, down both sides and then is merged at the bottom.
Typical Implementations of Surrogate Keys
Most commonly, a surrogate key is implemented as an auto-incremented integer. Examples:
- Oracle supports the SEQUENCE object, that can be used either directly in INSERT statement, or through an ON INSERT trigger.
- MS SQL Server has the IDENTITY data type, and from the 2012 version, the explicit SEQUENCE object as well.
- PostgreSQL supports the explicit SEQUENCE object, as well as the serial types that use sequences implicitly.
- MySQL has the AUTO_INCREMENT attribute.
GUIDs or UUIDs are sometimes used when uniqueness must be guaranteed without a central "generator" for surrogate key values, such as in certain "disconnected" or replication scenarios.
1 That is, a superkey that would stop being unique (and therefore, being a superkey) if any of the attributes were removed from it.
2 Surrogates tend to use "slimmer" data types such as integers, versus "fatter" types such as strings that are often used in natural keys. Also, while it is not unusual for a natural key to be composite, there is almost never a reason to make a composite surrogate key. As a consequence, a FOREIGN KEY referencing surrogate key tends to be slimmer than a FK referencing natural key.
3 There is no need for the additional index "underneath" the surrogate key. Each new index incurs a maintenance cost for INSERT/UPDATE/DELETE operations and may be especially costly in clustered tables, where secondary indexes must typically contain the copy of the clustering key (which is often same as primary key) and may incur a double-lookup during querying.
4 It is often necessary to query the data on a range that is "aligned" with the natural key. Using the natural key as a clustering key physically orders the data in the table, which can dramatically diminish the I/O under some circumstances. On the other hand, surrogate keys typically do not poses an ordering that would be useful for clustering.
5 We can fetch the migrated natural key directly from the child table, without JOINing with the parent.