58

I'm assigned to migrate a database to a mid-class ERP. The new system uses composite primary keys here and there, and from a pragmatic point of view, why?

Compared to autogenerated IDs, I can only see negative aspects;

  • Foreign keys becomes blurry
  • Harder migration or db-redesigns
  • Inflexible as business change. (My car has no reg.plate..)
  • Same integrity better achieved with constraints.

It's falling back to the design concept of candiate keys, which I neither see the point of.

Is it a habit/artifact from the floppy-days (minimizing space/indexes), or am I missing something?

//edit// Just found good SO-post: Composite primary keys versus unique object ID field //

Community
  • 1
  • 1
Teson
  • 6,644
  • 8
  • 46
  • 69
  • 2
    Because when you migrate a database that use composite keys to a mid-class ERP, you still need a database that support composite keys? /sarcasm – Pierre-Alain Vigeant Mar 23 '11 at 13:43
  • 13
    @dportas: what is the point of candidate keys? – Quassnoi Mar 23 '11 at 14:11
  • 3
    The point of *enforcing* a candidate key is to ensure the uniqueness of the attributes that are required to be unique by preventing duplicate data from entering into the system. The point of designating and using candidate keys at all is: to provide identifiers by which users can correctly identify the data of interest to them; to help ensure that the data will accurately correspond to the domain of discourse; to implement business rules. These are obviously related and complementary reasons that together mean basically the same thing: Data Integrity. – nvogel Mar 23 '11 at 14:45
  • 6
    You can enforce a column to be unique without having to tie it to the primary key. Keep the primary key an autogenerated ID as suggested in the question and just make sure columns that need to have unique values enforce that rule. – Justin C Mar 23 '11 at 14:57
  • 4
    @dportas: are you sure you understand correctly what a candidate key is? – Quassnoi Mar 23 '11 at 14:57
  • @Quassnoi, I'm sure that a candidate key is a minimal superkey, which is a set of attributes that are irreducibly unique within a table. If you disagree with something I wrote then maybe you could explain what. – nvogel Mar 23 '11 at 15:03
  • 1
    @dportas: from what you explained I just can't see what the "point of candidate key" is. – Quassnoi Mar 23 '11 at 15:06
  • Sorry @Quassnoi, in the space avauilable here I don't know how to explain keys and data integrity any better than I did. Maybe you should open a new question. – nvogel Mar 23 '11 at 15:26
  • 1
    @dportas: I know what a candidate key is, and know it has no "point" and cannot be "designated". I just asked because your original statement was, in my opinion, a little bit too patronizing. – Quassnoi Mar 23 '11 at 15:31
  • 3
    @dportas - condescending to Quassnoi about SQL probably won't get you very far... – JNK Mar 23 '11 at 15:43
  • 7
    @JNK: condescending to `$person` about `$subject` probably won't get `$condescendent` very far (regardless of relative knowledge level) – Quassnoi Mar 23 '11 at 15:51
  • @Quassnoi - good point as well :) – JNK Mar 23 '11 at 15:52
  • @Quassnoi, I understood "point" in this case to be a synonym for "purpose". The practical purpose of keys is to act as identifiers and be used in the ways I described. Designating the keys is part of the process of database design and especially normalization. "Designate" is the word used by Codd, David Maier and others and I can't think of a better one. I'm not really sure what you are disagreeing with, if anything. – nvogel Mar 23 '11 at 15:54
  • 2
    @dportas: we have an empty table with the following fields: `(a, b, c, d)`. Please list its candidate keys and their purposes. – Quassnoi Mar 23 '11 at 15:57
  • 3
    @Quassnoi, why? In a relation the candidate keys are the minimal superkeys. The only possible candidate key of an empty relation is {} - the empty set. In a relation variable however the candidate keys are those which must hold for *every* *possible* relation value, not just the current state. Therefore the candidate keys of a relvar can only be identified with reference to the domain constraints, dependencies and other rules that the relvar is supposed to satisfy. Since you haven't specified anything about the requirements of your example there is no general answer that I can give. – nvogel Mar 23 '11 at 16:05
  • 2
    If you think you don't need composite keys, you have a badly designed database. – gbn Mar 23 '11 at 20:19
  • @gbn, I presume you mean "need the capability for composite keys" as opposed to simply "needing" them as a threshold requirement of a non-trivial database. If the latter, there are way too many non-trivial databases running around. – jro Mar 23 '11 at 22:26
  • @dportas: of course you can't, that's what I wanted to demonstrate. It's the model (or data) which defines the candidate key, not vise versa. A candidate key is just a set of columns defined by constraints (in case of a relvar) or by data (in case of a relation). If I decided to run `5` km each morning, there is no purpose in `5`, the purpose is in my decision. – Quassnoi Mar 23 '11 at 23:27
  • @Quassnoi, therefore by implementing those constraints you will implement candidate keys and so before doing that you should decide which candidate keys you want to implement. The purpose of the candidate keys you choose to implement is data integrity. That was and is my answer to the OP's question. The OP appears to think candidate keys aren't needed at all and wants to know the purpose of having them. I think we have both answered that question adequately. – nvogel Mar 23 '11 at 23:36
  • @sqlvogel, it's amusing to read your assumptions about me and my knowledges. – Teson Mar 03 '12 at 23:56
  • @user247245, what assumptions? You stated that you don't see the point of composite keys; I explained that the point of them is data integrity. – nvogel May 25 '12 at 12:08

9 Answers9

57

Composite keys are required when your primary keys are non-surrogate and inherently, um, composite, that is, breakable into several non-related parts.

Some real-world examples:

  • Many-to-many link tables, in which the primary keys are composed of the keys of the entities related.

  • Multi-tenant applications when tenant_id is a part of primary key of each entity and the entities are only linkable within the same tenant (constrained by a foreign key).

  • Applications processing third-party data (with already provided primary keys)

Note that logically, all this can be achieved using a UNIQUE constraint (additional to a surrogate PRIMARY KEY).

However, there are some implementation specific things:

  • Some systems won't let a FOREIGN KEY refer to anything that is not a PRIMARY KEY.

  • Some systems would only cluster a table on a PRIMARY KEY, hence making the composite the PRIMARY KEY would improve performance of the queries joining on the composite.

Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • @quassnoi, which systems won't let FOREIGN KEY constraints refer to anything but a PRIMARY KEY constraint? Which systems will only cluster on a PRIMARY KEY constraint? I don't know of any DBMS which has such limitations. – nvogel Mar 23 '11 at 19:51
  • @Quassnoi - Great answer. could you explain a bit more about how the multi-tenancy situation works? – orokusaki Mar 23 '11 at 20:47
  • @dportas: `Paradox` did not allow referencing anything other than a `PRIMARY KEY`; `MySQL` with `InnoDB` and `Oracle` only cluster tables on a `PRIMARY KEY` (note that `Oracle` calls it an *index-organized table* and means completely other thing by what it calls `CLUSTER`) – Quassnoi Mar 23 '11 at 23:17
  • @orokusaki: if you have multi-tenant `posts` and `tags`, without `tenant_id` being a part of a referenceable key (`PRIMARY` or `UNIQUE`), then an entry in `posts_tags` can reference `post` and `tag` from different tenants. To handle it correctly, you should make `tenant_id` a part of a key in both tables, add it to `posts_tags` and make it a part of the two `FOREIGN KEY` constraints to appropriate tables. – Quassnoi Mar 23 '11 at 23:34
  • @Quassnoi - Ah, that's excellent, thanks. Django (which I use) doesn't provide composite primary key support. Do you have any experience with Python regarding this? I would love to use that instead of all the boiler plate code I have to use to guarantee that tenant A can't ever access tenant B's rows (which feels wrong, and certainly adds overhead). – orokusaki Mar 24 '11 at 16:09
  • @orokusaki: `Python` is not an `RDBMS` so I don't quite get which kind of experience you are asking about. As for `Django`, all modern systems support references to any `UNIQUE` sets, not only primary keys, so you can use surrogate primary keys for `Django` and composite unique keys (including `tenant_id`) for constraints. – Quassnoi Mar 24 '11 at 16:18
  • @Quassnoi - I meant like, SQLAlchemy or ORMs in Python, sorry. – orokusaki Mar 24 '11 at 16:20
  • @Quassnoi, regarding comment #4 (@orokusaki), as system grows you pay penalty for implementing tennant_id as part of key. Every related entity has to be tennant-aware, instead of just postID-aware. Faster - I agree - but ram i cheap. – Teson Mar 24 '11 at 19:23
  • @Quassnoi, thank you for a well laid out answer. (First point is obvious and should have been included my OP :-|) Bullet 2 & 3 are excellent examples for good usage from my POV as well. – Teson Mar 24 '11 at 19:29
41

Personally I prefer the use of surrogate keys. However, in joining tables that consist only of the ids from two other tables (to create a many-to-many relationships) composite keys are the way to go and thus taking them out would make things more difficult.

There is a school of thought that surrogate keys are always bad and that if you don't have uniqueness to record through the use of natural keys you have a bad design. I strongly disagree with this (if you aren't storing SSN or some other unique value I defy you to come up with a natural key for a person table for instance.) But many people feel that it is necessary for proper normalization.

Sometimes having a composite key reduces the need to join to another table. Sometimes it doesn't. So there are times when a composite key can boost performance as well as times when it can harm performance. If the key is relatively stable, you may be fine with faster performance on select queries. However, if it is something that is subject to change like a company name, you could be in a world of hurt when company A changes it's name and you have to update a million associated records.

There is no one size fits all in database design. There are time when composite keys are helpful and times when they are horrible. There are times when surrogate keys are helpful and times when they are not.

HLGEM
  • 94,695
  • 15
  • 113
  • 186
  • 17
    **Very important note:** Social security numbers are not unique and are therefore not a good candidate. http://blogs.computerworld.com/node/5969 – BoffinBrain Mar 23 '11 at 14:44
  • 3
    @BoffinbraiN: a candidate key cannot be good or bad, it can either exist or not. If there are two or more same `SSN` in a table, then `SSN` is not a candidate key, if not, then it is. – Quassnoi Mar 23 '11 at 15:08
  • 7
    Understood. I'll be a bit clearer: even if your table contains no duplicate SSNs, **it must still never be considered a candidate**, otherwise you might get punished in the future (as explained in the article). – BoffinBrain Mar 23 '11 at 15:41
  • 2
    @BoffinbraiN: Whether or not something is a candidate key is not the presence of two identical values (or tuples) in the same table (that implies that you're doing your database design after the data's already there!), but rather whether or not the possible set of data (for the sake of discussion, we'll say it's all people) could produce duplicate values. In this regard, because SSN's are not guaranteed to be unique across all possible people, SSN is not a candidate key. – Adam Robinson Mar 23 '11 at 17:38
  • I completely agree with you once again. ;) If someone's going to make the mistake of turning SSN into a primary key etc., then they will do it at design-time. – BoffinBrain Mar 23 '11 at 17:47
  • @BoffinbraiN: I think your comment was probably directed at @Quassnoi, but I'll take the sentiment nonetheless ;) – Adam Robinson Mar 23 '11 at 17:51
  • 1
    Not only are SSNs not guaranteed to be unique across all possible people, there are application domains (e.g. criminal justice) where it's not at all uncommon for a person to have more than one. – Robert Rossney Mar 23 '11 at 18:35
  • 2
    Agreed, +1 for the balanced answer. The antipattern is not using surrogate keys, nor is it using composite keys. The antipattern is blindly using one solution or the other for every situation. – Bill Karwin Dec 14 '12 at 21:20
40

Composite primary key provides better performance when it comes to them being used as Foreign keys in other tables and reduces table reads - sometimes they can be life savers. If you use surrogate keys, you have to go to that table to get natural key information.

For example (pure example - so we are not talking DB design here), lets say you have an ORDER table and ORDER_ITEM. If you use ProductId and LineNumber (UPDATE: and as Pedro mentioned OrderId or even better OrderNumber) as composite primary key in ORDER_ITEM, then in your cross table for SHIPPING, you would be able to have ProductId in the SHIPPING_ORDERITEM. This can massively boost your performance if for example you have run out of that product and need to find out all products of that ProductId that need to be shipped without a need to join.

On the other hand, if you use a surrogate key, you have to join and you end up with a very inefficient SQL execution plan where it has to do bookmark lookup on several indexes.

See more on bookmark lookup which using surrogate keys becomes a major issue.

gdoron
  • 147,333
  • 58
  • 291
  • 367
Aliostad
  • 80,612
  • 21
  • 160
  • 208
  • In what table would ProductId and LineNumber be the composite key? – Pedro Mar 23 '11 at 13:55
  • What then, do you do for another order of the same product? – Pedro Mar 23 '11 at 13:58
  • 1
    Of course you would have OrderId in there as well, as I said I am not talking database design here. But thanks I would add the OrderId as well. – Aliostad Mar 23 '11 at 14:01
  • I denormalize and put productid in shipping_orderitem. Wouldn't this even be faster, and also history surviving changes in product-table? – Teson Mar 24 '11 at 19:39
9

Natural primary keys are brittle.

Suppose we have built a system around a natural PK on (CountryCode, PhoneNumber), and several years down the road we need to add Extension, or change the PK to one column: Email. If these PK columns are propagated to all child tables, this becomes very expensive.

A few years ago there were some systems that were built assuming that Social Security Number is a natural PK, and had to be redesigned to use identities, when the SSN became non-unique and nullable.

Because we cannot predict the future, we don't know if later on some change will render obsolete what used to be a perfectly correct and complete model.

A-K
  • 16,804
  • 8
  • 54
  • 74
  • 4
    Natural keys based upon incorrect or incomplete modeling are brittle. FTFY. – Adam Robinson Mar 23 '11 at 17:40
  • 6
    Any approach contingent on our understanding being "correct" or "complete" is brittle. FTFY. – Robert Rossney Mar 23 '11 at 18:19
  • 5
    Your argument is basically natural keys that are not, in actuality, natural keys are not good natural keys. Tautology for the win? – jdmichal Mar 23 '11 at 18:47
  • @Adam Robinson: because we cannot predict the future, we don't know if some change will render obsolete what used to be a perfectly correct and complete model – A-K Mar 23 '11 at 21:45
  • 1
    @jdmichal long term yes, "natural keys are not good natural keys" – A-K Mar 23 '11 at 21:48
  • @Alex, while I somewhat agree with you, I would be curious if there are any actual examples of truly good natural keys that were valid at one time and invalid at a later date but were also *not* accompanied by a substantial change elsewhere. SSN is not a good example; it was never intended as a ubiquitous identifier, even though many companies chose to use it as one. – Adam Robinson Mar 24 '11 at 02:43
  • @Adam Robinson: this is a very big request: which natural keys are good? Which changes are substantial? Anyway, recently we borrowed a book, wanted our own copy, and ordered in on Amazon. The book that arrived had the same ISBN, author, and title, but different contents. How does this situation rate against your criteria? – A-K Mar 24 '11 at 15:21
  • @Alex: I mean "good" an an objective, black-and-white sense: the key chosen truly is unique (at least as far as the potential data set goes) and not simply assumed to be (vis-a-vis SSN). By major changes, I mean other changes to the database along the lines of what would be required to restructure the table (and related tables) in the event that the primary key must change. As for ISBN (or any other potential candidate key), I can't speak to that because I'm not a bookseller nor am I an expert in library sciences. I would imagine that someone who is either of those things could tell you – Adam Robinson Mar 24 '11 at 15:39
  • 1
    (cont.) if a duplicate ISBN is a possibility, or perhaps if the seller or publisher of the book took the cheap route and just assigned an ISBN rather than purchasing one from the registration body that hands them out. If it's a possibility, then ISBN isn't a candidate key because duplicates can exist. If it's that the publisher took the cheap route, then ISBN isn't a candidate key because not all books have them. Either way, the failure of a particular candidate key doesn't mean that natural keys are inherently bad, just that research and due diligence is required to make sure they're legit. – Adam Robinson Mar 24 '11 at 15:40
  • @Alex: All that being said, I will definitely acknowledge that there are some cases where choosing a natural key can be very difficult for a variety of reasons and a surrogate key can be a better choice just for the sake of simplicity. – Adam Robinson Mar 24 '11 at 15:56
  • @Adam Robinson: Any amount of "research and due diligence" may not be good enough, experts can make mistakes, and then we are for very expensive refactoring. So surrogate keys are essentially insurance against such a change. Suppose we hire an expert who confidently says that ISBN is guaranteed to be unique... – A-K Mar 24 '11 at 17:45
  • 1
    @Alex: There are tradeoffs with everything. Natural keys provide value (in potential elimination of joins, potentially clearer queries, more performant databases, etc.), and surrogate keys provide insurance against the possibility that assumptions are wrong. It's up to the designer and the business to decide if the risks involved (such as using an externally provided key like ISBN) present significant enough risk to warrant dumping the advantages of a natural key for a surrogate key. – Adam Robinson Mar 24 '11 at 17:53
  • 1
    (cont.) If your contention is that natural keys are intrinsically bad and should never be used, I don't believe we're going to come to an agreement. If, rather, your contention is that natural keys have costs (in the form of risk) and benefits compared to surrogate keys, then we're just beating a dead horse here. – Adam Robinson Mar 24 '11 at 17:54
  • 1
    As to your specific example, it would seem that if you hired an expert that said that ISBN was a completely reliable and suitable key, then your problem is with the people who are screening your experts ;) – Adam Robinson Mar 24 '11 at 17:54
  • @Adam Robinson: I am with you on "natural keys have costs (in the form of risk) and benefits compared to surrogate keys". I an not with you on "your problem is with the people who are screening your experts". The reason is simple: "people who are screening your experts" can make mistakes too, we cannot assume them to be error-prone. Anyway, even the best experts in the world can be wrong. For example, two Nobel Laureates in economics have participated in leading a large hedge fund into a multi-billion disaster. Google up "long term capital management". – A-K Mar 24 '11 at 18:23
  • @Alex: That's getting pretty far afield of the original question. I think the smiley in my previous statement should make it pretty obvious that it wasn't a serious assertion. – Adam Robinson Mar 24 '11 at 18:36
  • I totaly agree with you and would mark as accepted answer if just I didn't totaly agree with you.. ;-) Life is change - make db's flexible. – Teson Mar 24 '11 at 19:48
8

The very simple answer is data integrity. If the data is to be useful and accurate then the keys are presumably required. Having an "autogenerated id" doesn't remove the requirement for other keys as well. The alternative is not to enforce uniqueness and accept that data will be duplicated and almost inevatibly contain anomalies and lead to errors as a result. Why would you want that?

nvogel
  • 24,981
  • 1
  • 44
  • 82
  • 4
    You can enforce uniqueness without making it a primary key. If you have a natural key and a surrogate key, it would be bad design to not add a unique index on the natural key. – HLGEM Mar 23 '11 at 14:14
  • 4
    @HLGEM, A key is a key. There is no difference between a "primary" key and any other candidate key. Indexes have nothing much to do with it. Keys are enforced by *constraints* whereas an index is just a way to speed up data access. – nvogel Mar 23 '11 at 14:20
  • 1
    I'd appreciate it if the person who downvoted my answer would explain why. – nvogel Mar 23 '11 at 14:47
8

In short, the purpose of composite keys is to use the database to enforce one or more business rules. In other words: protect the integrity of your data.

Ex. You have a list of parts that you buy from suppliers. You could could create your supplier and parts table like such:

SUPPLIER
SupplierId
SupplierName

PART
PartId
PartName
SupplierId

Uh oh. The parts table allows for duplicate data. Since you used a surrogate key that was autogenerated, you're not enforcing the fact that a part from a supplier should only be entered once. Instead, you should create the PART table like such:

PART
SupplierId
SupplierPartId
PartName

In this example, your parts come from specific suppliers and you want to enforce the rule: "A single supplier can only supply a single part once" in the PARTS table. Hence, the composite key. Your composite key prevents accidental duplicate entry of a part.

You can always leave business rules out of your database and leave them to your application, but by keeping the rule in the database (via a composite key), you ensure that the business rule is enforced everywhere, especially if you should ever decide to allow multiple applications to access the data.

John
  • 3,332
  • 5
  • 33
  • 55
  • This is a great example but it ignores the ability to build constraints on fields that are not the primary key. Do you see a problem with keeping the original structure and defining a unique constraint on `PartId + SupplierId` ? – HackSlash Nov 05 '18 at 21:04
4

Just as functions encapsulate a set of instructions, or database views abstract base table connections, so to do surrogate keys abstract the meaning of the entity they are placed on.

If, for example, you have a table that holds vehicle data, applying a surrogate VehicleId abstracts what it means to be a vehicle from a data point of view. When you reference VehicleId = 1, you are most surely talking about a vehicle of some sort, but do we know if it is a 2008 Chevy Impala, or a 1991 Ford F-150? No. Can the underlying data of whatever Vehicle #1 is change at any time? Yes.

ses011
  • 1,216
  • 1
  • 9
  • 15
4

Short answer: Multi-column foreign keys naturally refer to multi column primary keys. There can still be an autogenerated id column that is part of the primary key.

Philosophical answer: Primary key is the identity of the row. If there there is a bit of information that is an intrinsic part of the identity of the row (such as which customer the article belongs to.. in a multi customer wiki) - The information should be part of the primary key.

An example: System for organizing LAN parties

The system supports several LAN parties with the same people and organizers attending thus:

CREATE TABLE users ( users_id serial PRIMARY KEY, ... );

And there are several parties:

CREATE TABLE parties ( parties_id serial PRIMARY KEY, ... );

But most of the other stuff needs to carry the information about which party it is linked to:

CREATE TABLE ticket_types (
    ticket_types_id serial,
    parties_id integer REFERENCES parties,
    name text,
    ....
    PRIMARY KEY(ticket_types_id, parties_id)
);

...this is because we want to refer to primary keys. Foreign key on table attendances points to table ticket_types.

CREATE TABLE attendances (
    attendances_id serial,
    parties_id integer REFERENCES parties,
    ticket_types_id integer,
    PRIMARY KEY (attendances_id, parties_id),
    FOREIGN KEY (ticket_types_id, parties_id) REFERENCES parties
);
jkj
  • 2,561
  • 1
  • 17
  • 24
  • I think we're all clear on what a composite key *is*, the OP is asking why people still use them. – Adam Robinson Mar 24 '11 at 17:59
  • 2
    @Adam: Why not? On data integrity point of view they are necessary in some cases. The example tries to demonstrate that the it's not about autogenerated vs. composite. Composite primary keys are needed anyway. – jkj Mar 28 '11 at 13:30
2

While I prefer surrogate keys, I use composite cases in a few cases. The composite key may consist entirely or partially of surrogate key fields.

  • Many to many join tables. These usually require a unique key on the key pair anyway. In some cases additional columns may be included in the key.
  • Weak child tables. Things like order lines do not stand on their own. In this case I use the parent (orders) tables primary key in the composite table.

When there are multiple weak tables related to an entity, it may be possible to eliminate a table from the join set when querying child data. In the case of grandchild tables, it is possible to join the grandparent to grandchild without involving the table in the middle.

BillThor
  • 7,306
  • 1
  • 26
  • 19