0

Is there any benefit to using a table schema like this:

CREATE TABLE review (
    review_id SERIAL PRIMARY KEY,
    account_id INT REFERENCES account(account_id) NOT NULL, 
    product_id INT REFERENCES product(product_id) NOT NULL, 
    rating SMALLINT NOT NULL, 
    comment TEXT,
    UNIQUE (account_id, product_id)
);

Or should the constraint itself be the primary key, like this:

CREATE TABLE review (
    CONSTRAINT review_pkey (account_id, product_id) PRIMARY KEY,
    account_id INT REFERENCES account(account_id) NOT NULL, 
    product_id INT REFERENCES product(product_id) NOT NULL, 
    rating SMALLINT NOT NULL, 
    comment TEXT,
);
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Matt
  • 463
  • 4
  • 14
  • [Is a Unique ID column needed in a many-to-many (junction) table?](https://softwareengineering.stackexchange.com/questions/182103/is-a-unique-id-column-needed-in-a-many-to-many-junction-table) – Lukasz Szozda Nov 12 '19 at 21:09
  • Old school vs. new school. – jarlh Nov 12 '19 at 21:20
  • @MatthewLerner . . . I prefer the first method, because I like having a single column primary key. That said, each has its advantages and disadvantages, so it is not easy to say which is better. – Gordon Linoff Nov 12 '19 at 21:35
  • @GordonLinoff I like the first method too, but I was wondering if there was anything disadvantageous about it aside from the wasted disk space and anything advantageous aside from the feeling I have. Just want to see what I am missing from more experienced people with DB design. – Matt Nov 12 '19 at 22:05

2 Answers2

2

The second version is clearly preferable, because it requires one less column and one less index, and there is no down side.

The column is obvious, the indexes aren't, because you forgot to add them: You need indexes on all the foreign key columns so that deletes on the referenced tables can be fast. With the artificial primary key, you need indexes on review_id, account_id and product_id, while without you can do with the indexes on (account_id, product_id) and product_id.

The only people who will advocate the first solution are people who hold a religious belief that every table has to have an artificially generated numerical primary key, no matter what. In reality, the combination of the two artificially generated keys from the referenced tables is just as good.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • In general, I can partially agree with this statement. However, the `id primary key` on all tables may have quite a practical aspect. For example, this makes possible to easily automate the generation of forms in apps. It has nothing to do with religion. – klin Nov 12 '19 at 22:48
  • True, if you choose tools that limit you, you will be limited. – Laurenz Albe Nov 12 '19 at 22:53
  • 1
    When you create the tools, simplicity may be more important than one column surplus. It is not a limit, rather a practical choice. – klin Nov 12 '19 at 22:59
  • 1
    Sorry if this seems like its off-topic, but is there any reason why you would only place indexes on `(account_id, product_id)` and `product_id` and not `account_id` in addition? – Matt Nov 12 '19 at 23:36
  • 2
    @MatthewLerner: There is a very good reason for that indexing strategy. See: https://dba.stackexchange.com/q/27481/3684 – Erwin Brandstetter Nov 13 '19 at 00:15
1

Besides religion, habits, personal preferences and convenience with certain client tools, there are other good reasons for an additional surrogate PK as demonstrated in your first example.

If you are going to reference that table with foreign keys from other tables:

  • Referencing table(s) only need to include the single surrogate PK columns as FK reference, which is smaller, faster and simpler. If referencing table(s) have many rows and review does not, a single instance may already outweigh the additional cost to review. Else, multiple instances may.
    For small lookup tables that are referenced in many rows, even consider a smallserial surrogate PK - if that actually helps. See:

  • Typically, there will be an index on FK columns of referencing tables, too. Your example with two integer is most favorable for the multicolumn PK / FK as it keeps index size to a minimum. A B-tree index on two integer columns is no bigger than one on a single integer (8 bytes typically is the minimum "payload" for index tuples). Other, bigger data types would make additional difference.

  • If review receives many updates to one of the columns (account_id, product_id), those will cascade to all referencing tables based on those two columns. Multiplies write costs, bloats multiple tables and indexes. If it cascades to wide rows or many referencing rows, costs may increase substantially. All of this may be avoided with a surrogate PK - if the relational design is actually supposed to work that way.

If review is involved in many queries with joins, joining on two columns instead of just one is more tedious to write and slightly more expensive. Again, more so for bigger data types.

That said, if you have none of the above (or similar), look to Laurenz' answer.

Weigh actual costs, not religious beliefs.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Ok, I should have omitted religion. I buy that the two columns you need if you reference the compound primary key waste some space, but I cannot see how it would slow down processing. Can you explain? Also, I don't think that foreign keys referencing a mapping table are very common. If you update primary keys, you are doing something terribly wrong, so I don't buy that argument. My main concern is not the additional column, but the additional index. – Laurenz Albe Nov 13 '19 at 07:12
  • Doesn't have to be a "mapping table" and not claiming it's very common. `If you update primary keys, you are doing something terribly wrong.` Mostly so. So don't *make* the two columns PK in that case. Isn't that my point? The additional index *is* a concern, no argument about that. – Erwin Brandstetter Nov 13 '19 at 14:39
  • I agree. I misunderstood what you said about updating one of the IDs: I thought you meant an update that cascades from a primary key change in one of the referenced tables, but that needn't be so - it could be that (still assuming it is a mapping table) someone updates a mapping to change the object something is associated with. Probably also not a frequent occurrence. – Laurenz Albe Nov 13 '19 at 16:16