1

Suppose we have a table of jobs like so:

  • job (id, number);

Suppose we have a customer request to track job times and we create a table like so:

  • (A) job_timer (id, job_id, timestamp);

but we have a choice of how we want to tie it to job table, so we can also:

  • (B) job_timer (id, job_number, timestamp);

Suppose that job_number is UNIQUE.

I am conditioned to make foreign keys based on id, so A) job_id would be the way I have seen it done. But, customers are looking up jobs by job number, and it will save me and database some work if I do a lookup directly by B) job_number. But should I do so?

By more work when using job_id I mean, i.e. given job number I only need to use job_timer table. when given job_id I need to tie in both tables - more cognitive programming work, more database work.

Note, a similar question Foreign Key to non-primary key addresses UNIQUE-ness of non-primary key but I don't believe it addressed my issue.

The original table (job) is part of legacy codebase where both fields id and number are utilized throughout the code extensively, and in this sense I have a "split primary key" condition. Weeding that out will be prohibitive due to lack of full test coverage. Why the number field was created and also why it was made to be a varchar are good questions. I am sure there must've been a reason at some point.

I'm looking for something like "yes, go ahead it's totally fine, there is no best practice in this case", or "no, if you do this you might come across issues X, Y, Z in the future".

Community
  • 1
  • 1
Dennis
  • 7,907
  • 11
  • 65
  • 115
  • d'oh, right... multiple matching parent ids would be ambiguous... I should probably go catch up on sleep. – Marc B Feb 01 '16 at 19:43
  • 4
    If the `number` in the `job` table is unique, then you should make _that_ the primary key and get rid of the `id` column altogether (because it apparently doesn't add any value to the table). That implies that the job number also never changes. –  Feb 01 '16 at 19:44
  • @horse: good catch! `id` field is superfluous ... – Dennis Feb 01 '16 at 19:47
  • also wanted to add that in my case... the original table (`job`) is part of legacy codebase where both fields `id` and `number` are utilized throughout the code extensively, and in this sense I have a "split primary key" condition. Weeding that out will be prohibitive due to lack of full test coverage. Why the `number` field was created and also why it was made to be a varchar are good questions.. I am sure there must've been a reason at some point. – Dennis Feb 01 '16 at 20:05

1 Answers1

1

TL;DR Always declare a FOREIGN KEY constraint (chain) when values for a list of columns must appear as values for another list of PRIMARY KEY or UNIQUE NOT NULL columns. But choosing which CK (candidate key) to reference as a FK (foreign key) when there are multiple CKs is ultimately pragmatic. The criteria are essentially those for choosing a PK (primary key) since distinguishing a CK as PK is ultimately for preferred use in FKs. A typical list is familiarity, irreducibility, stability & simplicity. Here past use suggests that either CK is reasonable. Although considering number to be only for final output explains its varcharness and its uniqueness despite the presence & uniqueness of id. If you ever include both then be aware that it might be appropriate to declare FOREIGN KEY on the pair. (Requiring adding UNIQUE NOT NULL on the pair in job.)


A superkey is a set of columns that are unique not null. A CK is a superkey that contains no smaller superkey. A table can have any number of CKs. A PK is a distinguished CK.

We could say that a "foreign superkey" holds when the values for a subrow in a table are also values for some superkey subrow in a referenced table. If the superkey is a CK then the foreign superkey is a FK. We tell the DBMS about CKs and foreign superkeys so that it can prevent invalid database states.

An SQL UNIQUE NOT NULL actually declares a superkey. So SQL PRIMARY KEY actually declares a distinguished superkey. It is a PK if the superkey is a CK. An SQL FOREIGN KEY actually declares a foreign superkey. It is a FK if the referenced superkey is a CK.

Your table with "split PK" is just a table with two CKs. (That form a superkey because all supersets of CKs are superkeys.) As far as constraint declarations are concerned, primacy is irrelevant. You should just declare the constraints that hold so that the DBMS can enforce them.

Be aware that if you have a table with id and number as FKs then it is likely that pairs of values must appear in job. If so then declare the pair as a foreign superkey via FOREIGN KEY. This need to add foreign superkeys is a disadvantage of having surrogate keys when there are natural keys. On the other hand this arises whenever there are multiple CKs.

PS Any superset of a unique column set is unique. But SQL requires you to declare the target of a FK as UNIQUE NOT NULL even if it must already be unique by containing some smaller set declared unique. So when there is an id-number pair where the pair has to appear in job you should declare the compound FK and the compound superkey. PPS The point of all these declarations is integrity, not indexing for optimization. (Although that's important too.)

philipxy
  • 14,867
  • 6
  • 39
  • 83
  • thanks. a question on declaring UNIQUE on the *pair*. doesn't index work in ways where if I do UNIQUE(id, number) and then I look for `number` only, the index will not be used? I am under impression that such an index is used only when looking up `id` only or both `id`, `number` at the same time. But when looking up only `number`, the index is unused. I don't see how declaring UNIQUE on pair is better than declaring UNIQUE on `id` and UNIQUE on `number` separately. Current code does a lot of lookups on either id or number only but not both. What would you suggest going forward? – Dennis Feb 02 '16 at 21:04
  • 1
    You are right that it's redundant for uniqueness and unnecessary for indexing. But SQL just requires every FK target column set to be declared UNIQUE NOT NULL. See the new last paragraph of my answer. See also my edited first sentences. – philipxy Feb 02 '16 at 21:47
  • 1
    I believe many SQL implementations automatically create indexes for primary keys, but not necessarily for other CK's, not even when they are being referenced by some FK. So there might be CK's referenced by FK for which no index is available, and that's going to hurt performance at some point. (The key here is that the "downside" is due to the absence of the index, not to the fact that the referenced key is "secondary".) – Erwin Smout Feb 04 '16 at 12:17