81

Although one can use composite primary keys, for the case below, is it really a bad practice? The consensus on Stackoveflow seems to go both ways on this issue.

Why?


I want to store payments for the orders in a separate table. The reason is that, an order can have many items which are handled in a separate table in the form of many to many relationship. Now, if I don't use composite primary keys for my payment table, I'll lose my unique PaymentID:

[PaymentId] INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
[OrderId] INT NOT NULL PRIMARY KEY --Also a Foreign Key--

Now, if I just remove the Primary Key for the OrderId, I'll lose my one to one relationship here so Many OrderIds can be associated to many PaymentIds, and I don't want this.

This seems to be why other answers on SO have concluded (mostly) that the composite key is a bad idea. If it is bad, what's the best practice then?

ΩmegaMan
  • 29,542
  • 12
  • 100
  • 122
JAX
  • 1,540
  • 3
  • 15
  • 32
  • If I understand you correctly, in this case you can just add a separate unique constraint on `OrderId`, and have `PaymentId` as the primary key. – Laurence Sep 27 '14 at 19:53
  • I didn't understand the part: "The reason is that, an order can have many items which are handled also in a separate table in the form of many to many relationship."? if you have `order_id` in `payments` table then all you have to do is to reference it with `orders` table, How would you lose unique `PaymentID`? – Surya Sep 27 '14 at 19:55
  • @Laurence: Yeah but 1 Order in that case can have multiple payments which is bad, am I right ? – JAX Sep 27 '14 at 19:55
  • 1
    In my opinion, having multiple payments for an order isn't bad at all. – Surya Sep 27 '14 at 19:56
  • @Surya: Please look my previous comment – JAX Sep 27 '14 at 19:56
  • @Surya: In real world it is bad because it can't happen – JAX Sep 27 '14 at 19:56
  • What is the composite primary key you are talking about? (PaymentId,OrderId)? With FKs (PaymentId) and (OrderId)? It would be helpful if you gave proper SQL for your choices. Your question is unclear. – philipxy Sep 28 '14 at 02:58
  • A god answer here https://stackoverflow.com/a/2204535/3360759 – frhack Aug 15 '22 at 21:27
  • In this discussion, can we assume that the `PRIMARY KEY` is necessarily `UNIQUE` _and_ clustered with the data? That is the only option for MySQL's InnoDB; other vendors may work differently -- _It makes a difference in performance!_ – Rick James Dec 14 '22 at 00:58
  • 1/3 of the tables I have written use an auto_inc id for the PK. For the rest I use whatever is 'natural' -- sometimes that is composite. For many-to-many, it is demonstrably faster to use a composite key and no "id". – Rick James Dec 14 '22 at 01:06

6 Answers6

88

There is no conclusion that composite primary keys are bad.

The best practice is to have some column or columns that uniquely identify a row. But in some tables a single column is not enough by itself to uniquely identify a row.

SQL (and the relational model) allows a composite primary key. It is a good practice is some cases. Or, another way of looking at it is that it's not a bad practice in all cases.

Some people have the opinion that every table should have an integer column that automatically generates unique values, and that should serve as the primary key. Some people also claim that this primary key column should always be called id. But those are conventions, not necessarily best practices. Conventions have some benefit, because it simplifies certain decisions. But conventions are also restrictive.

You may have an order with multiple payments because some people purchase on layaway, or else they have multiple sources of payment (two credit cards, for instance), or two different people want to pay for a share of the order (I frequently go to a restaurant with a friend, and we each pay for our own meal, so the staff process half of the order on each of our credit cards).

I would design the system you describe as follows:

Products  : product_id (PK)

Orders    : order_id (PK)

LineItems : product_id is (FK) to Products
            order_id is (FK) to Orders
            (product_id, order_id) is (PK)

Payments  : order_id (FK)
            payment_id - ordinal for each order_id
            (order_id, payment_id) is (PK)

This is also related to the concept of identifying relationship. If it's definitional that a payment exists only because an order exist, then make the order part of the primary key.

Note the LineItems table also lacks its own auto-increment, single-column primary key. A many-to-many table is a classic example of a good use of a composite primary key.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 1
    "it's not a bad practice in all cases" . . . I can agree with that. – Gordon Linoff Sep 27 '14 at 19:59
  • What does a many-to-many relationship between orders and payments have to do with this question? – wvdz Sep 27 '14 at 20:02
  • @popovitsj: Nothing to do with this part of the question, I explained why I want things to be split up in different tables – JAX Sep 27 '14 at 20:06
  • 8
    Some of the opinion against declaring composite primary keys appears to be driven by the way some ORM tools work. Pierre may or may not be in this situation. – Walter Mitty Sep 28 '14 at 06:16
  • 9
    @WalterMitty, right, ORM frameworks like Ruby on Rails started out with the phrase "opinionated software" about PK design being `id` only, but this is like saying that you won't support functions with more than one argument. In versions after the first, RoR supports compound primary keys. All frameworks eventually come to the same conclusion. If anyone is still using an ORM that doesn't support compound PK's, you need to upgrade. – Bill Karwin Sep 28 '14 at 06:56
  • 14
    It's also worth pointing out that autoincrement guarantees the uniqueness of table rows, but not necessarily the singular identity of each of the subject matter entities. An operational error can result in duplicate entry of the same person, course, product, etc. – Walter Mitty Sep 28 '14 at 12:55
  • @WalterMitty, right but compound PK doesn't ensure that either. That's not usually a problem in the relational model, because it's implicit in the model that the value 1234 in one column doesn't mean the same thing as the value 1234 in a totally different column. – Bill Karwin Sep 28 '14 at 17:06
  • Would it not be better to invert the order of the primary key in LineItems, so that inserts into the index are sorted by order_id, product_id, and not the other way around? Unique-wise it's the same – nickdnk Jun 17 '17 at 18:37
  • @nickdnk, depends on how you plan to query the table. If you want to favor lookup by product most of the time, then it would be a good idea to define the primary key the way I show. If you want lookup by order to be more efficient, then change the column order as you suggest. – Bill Karwin Jun 17 '17 at 18:43
  • @BillKarwin - I mean for inserts, the performance should be better with the order id first, since you would not need to shift around rows randomly based on the product number used. It would already be in the order inserted, naturally. Adding another index (non-unique) to the product_id should solve the query problem, no? – nickdnk Jun 17 '17 at 18:45
  • @nickdnk, you're assuming the inserts are committed in perfect sequential order by order_id. If the inserts are coming in at a slow rate, this might be true. But if it's that slow, inserting in non-sequential order won't be a bottleneck. If the orders are coming in so fast that you have to optimize insert order, they're probably coming in from many concurrent threads, and they won't be perfectly sequential. – Bill Karwin Jun 17 '17 at 18:59
  • @BillKarwin - Okay. In a heavy insert situation I was considering that following the order_id insert incrementally would result in a lot more buffer pool cache hits, even though they weren't exactly in order. But I see your point. – nickdnk Jun 17 '17 at 19:01
  • 2
    @nickdnk Anyway, you're right that inserting in sequential order can be a benefit. See this blog for some explanation and clever graphical proof: https://www.percona.com/blog/2015/04/03/illustrating-primary-key-models-in-innodb-and-their-impact-on-disk-usage/ – Bill Karwin Jun 17 '17 at 19:09
  • some naming conventions here are really useful... calling a key column 'order_id_pk' or 'order_id_fk' is super useful. That way when you're writing queries or reading old code, you know when you're dealing with primary keys or foreign keys. – Scuba Steve Sep 27 '18 at 20:41
  • I believe it's harder to make payment_id ordinal for each order_id, rather than to simply make payment_id ordinal (serial). What's the point in making it more complex? @BillKarwin RoR supports composite primary keys? To which extent? It autoadds the `id` column to join tables, and to avoid that... I doubt that's practical. The price is probably too high, and the upsides are unclear. @ScubeSteve Calling primary keys `id` and foreign keys `something_id` achieves the same goal in a more concise/readable way ;) – x-yuri Jun 07 '21 at 01:02
  • Why not simply define Payments as : payment_id int autoincrement PK -- order_id (FK) -- payment_ord - ordinal for each order_id -- unique index (order_id, payment_ord ) – frhack Aug 15 '22 at 18:52
  • @frhack, In your suggestion, the payment_id is redundant. – Bill Karwin Aug 15 '22 at 18:56
  • @BillKarwin yes and no. Suppose you have another entity lets say PaymentLog when you store the many steps of a payment transaction, immagine an huge table with many records for each payment. In this table we need a FK to Payment. If the Payment PK is composite we have one (or more) redundant key, order_id in this case here is redundant. – frhack Aug 15 '22 at 21:08
  • @frhack, If that makes more sense to you, go ahead and implement it that way. – Bill Karwin Aug 15 '22 at 21:13
  • @BillKarwin we are investigating the tradeoffs of the two options, not what makes more sense to me or to you. Or not ? – frhack Aug 15 '22 at 21:16
  • @frhack, Well, one tradeoff is that the design you describe has two unique indexes to update on every INSERT/UPDATE/DELETE, whereas mine has one. – Bill Karwin Aug 15 '22 at 21:43
  • @wvdz - A many-to-many relationship is best handled by two 2-column indexes, one of them being `PRIMARY KEY`. Adding an autoincrement column is a performance drag with no gain. – Rick James Dec 14 '22 at 00:45
  • @WalterMitty - autoinc (at least in MySQL) does _not_ guarantee uniqueness; `PRIMARY KEY` does. They are usually used together, but they do not have to. – Rick James Dec 14 '22 at 00:47
  • @nickdnk - If you usually fetch "the orders for one product" then the order `product_id, order_id` is 'better'. And vice versa. If you fetch both ways (as in many-to-many), then have both indexes. The optimization of `INSERTs` is low priority. Anyway, if you have both indexes, then both BTrees (one for data, one for index) need updating! – Rick James Dec 14 '22 at 00:50
37

This question is dangerously close to asking for opinions, which can generate religious wars. As someone who is highly biased toward having auto-increasing integer primary keys in my tables (called something like TablenameId, not Id), there is one situation where it is optional.

I think the other answers address why you want primary keys.

One very important reason is for reference purposes. In a relational database, any entity could -- in theory -- be referenced by another entity via foreign key relationships. For foreign keys, you definitely want one column to uniquely define a row. Otherwise, you have to deal with multiple columns in different tables that align with each other. This is possible, but cumbersome.

The table you are referring to is not an "entity" table it is a "junction" table. It is a relational database construct for handling many-to-many relationships. Because it doesn't really represent an entity, it should not have foreign key relationships. Hence, a composite primary key is reasonable. There are some situations, such as when you are concerned about database size, where leaving out an artificial primary key is even desirable.

cezar
  • 11,616
  • 6
  • 48
  • 84
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • If you could answer @philipxy's comment, that would be helpful. As I'm currently in a design phase contemplating the trade-offs. – Anish Ramaswamy Mar 22 '16 at 23:02
  • 1
    @AnishRamaswamy I think he means that if you want to link two tables together, you link them by the unique identifier. The primary key of the other table becomes the foreign key in your table. And he is saying that his preference is if that primary key is not a composite key, because he might not want to have multiple columns imported into his table, rather, he might only want one. – barlop Jan 20 '18 at 02:47
  • 1
    I think it would be helpful to comment on this old answer to add an important comment: `TableNameId` is annoyingly redundant. – Alexander Oct 22 '19 at 12:41
  • Size issue... If you are talking about InnoDB, then the breakeven point is PK plus one secondary index. Only with multiple secondary indexes does the size of the PK impact disk space. – Rick James Dec 14 '22 at 01:01
13

Best practices are helpful at best, but blinding at worst. Going against a best practice isn't a sin. Just be sure you know what kind of trade-off you are making.

Database engines can be very complicated things. Without knowing what particular optimizations are made by a given engine, it will be difficult to determine what kinds of constructs will yield the best performance (because I assume that the issue we are talking about here is performance). Composite keys may be problematic for large tables in one kind of database, but not have any noticeable impact for another.

A useful practice I've learned is to always strive for having my applications as simple as possible. Do using composite keys save you from having to perform lookups before insertions, or some other nuisance? Use them. If you, however, notice that using them makes your application no longer satisfy some significant performance requirement, consider a solution without them.

Emanuel
  • 831
  • 6
  • 14
13

Disk space is cheap, so a primary key clustered on an int identity(1,1) named after a convention (like pk + table name) is a good practice. It will make queries, joins, indexes and other constraints easy to manage.

However there's one good reason to no do that (in MS SQL Server at least): if you want to manage the physical sorting of your data in the underlying storage system.

The primary key clustered determines the physical sorting order. If you do it on an identity column, the physical sorting order is basically the insert order. However, this may not be the best, especially if you always query the table the same way. On very large tables, getting the right physical sorting order makes queries a lot faster. For example you may want the clustered index on a composite of two columns.

JeromeE
  • 449
  • 4
  • 6
  • 1
    Do not include the table name in column names. When doing a single-table query, the table name is already in the query; when doing a JOIN, qualify all columns with the table name or a short alias. – Rick James Dec 14 '22 at 01:03
1

From internal database engine point of view there is no obstacle to using complex primary keys. Indexing will work fine, integrity constraints will work fine, there's no performance drawback etc.

In theory they are fine, but in practice there are some things (possible issues) that one should take into account:

  1. Foreign keys (FK) in related table must have multiple columns as PK. If one had simple PK, relational consistency would be easier to maintain.

  2. Many times in (web) application customer needs to select/pick up some data (retrieved from DB table). Behind each record is always PK which leads to some action. If these records are represented as web links - that is easier to maintain as links with multiple URL parameters. But if data should be selected from drop-down list, then it could be challenging to implement transfer of more parameters for PK since HTML form drop-down lists generally use only one key parameter.

Generally when you deal with complex PK (multiple columns) your code should be multiplied for this columns (getters/setters), parameter transfer in application, intrusion checks etc. Finally during application development you come to conclusion that maybe it is easier to have simple PK although during DB modeling that was not an issue.

sbrbot
  • 6,169
  • 6
  • 43
  • 74
-1

If your table with a composite primary key is expected to have millions of rows, the index controlling the composite key can grow up to a point where CRUD operation performance is very degraded. In that case, it is a lot better to use a simple integer ID primary key whose index will be compact enough and establish the necessary DBE constraints to maintain uniqueness.

Source:

https://www.toptal.com/database/database-design-bad-practices

Alfonso Tienda
  • 3,442
  • 1
  • 19
  • 34
  • If that PK is "clustered" with the data, it will not grow significantly. Anyway, if you need that set of columns to be indexed for fast access, then that secondary index will grow significantly! – Rick James Dec 14 '22 at 01:13
  • That link was written by someone with expertise in Oracle and SQL Server. If you are not using either of those, be cautious about applying everything, in particular, "composite PKs". – Rick James Dec 14 '22 at 01:26