How do primary keys work in junction tables for a DBMS? How can a composite key be a primary key?

Question

In a DBMS we have

Superkey - An attribute or a set of attributes that uniquely identifies a row in a table.
Candidate Key - An attribute or set of attributes that uniquely identify identifies a row in a table. The difference between the superkey and a candidate key is no subset of a candidate key can itself be a candidate key.
Primary key - A chosen candidate key that become the attribute to uniquely identify a row.

If we want to identify a many to many relation between two tables we can define a junction table such as:

Tables:

Author(AuthorID, FirstName, LastName) -- AuthorID is primary key
Book(BookID, BookTitle) -- BookID is primary key

To create the relation between both:

AuthorBook(authorID, BookID) -- together authorID and BookID are primary key

I am thinking bookID and authorID are both primary keys in their own respect.

Since a candidate key (and therefore a primary key) must not have a subset containing a candidate key, how can authorID plus BookID be a primary key? This seems to break the definition of a primary key.

I understand this may be the difference between real world an theory but as the DBMS textbooks I have read seem to define junction tables this way and define primary keys this way it seems like there is a disconnect there.

Am I misunderstanding this concept?

Yes you are misunderstanding. A table might have several Candidate keys -- that's why they're "candidates". (Each of which might consist of several attributes, as you say.) A Primary key is one of the candidates, chosen as Primary typically for some pragmatic/ergonomic reason. Your junction table is all key (as is usual for junction tables), so there's only one Candidate, so that must be the Primary. — AntC, Apr 04 '20 at 21:25
Please see how I replaced "both" with "together" & "and" with "plus" to be clear & consistent with "composite PK". Using "both" & "and" does not make it clear whether you are talking about 2 attributes that are each a PK vs 2 attributes that form one PK. — philipxy, Apr 04 '20 at 21:45
What is your reference? Follow a (good) published academic textbook on information modelling, the relational model & DB design & querying. (Manuals for languages & tools to record & use designs are not such textbooks.) (Nor are wiki articles or web posts.) Dozens of published academic information modelling & DB design textbooks are online free in pdf. — philipxy, Apr 04 '20 at 22:03

philipxy · Accepted Answer · 2020-04-12T04:46:19.687

When we use one of those terms we have to be talking about a given table (variable, value or expression). The superkeys, CKs & PKs of a table are not determined by roles its attributes play in other tables. They are determined by what valid values can arise for the table under the given business rules.

Superkey - An attribute or a set of attributes that uniquely identifies a row in a database.

A superkey of a given table can be defined as a set of attributes that "uniquely identifies a row" of the table. (Not database.) Although that quoted phrase is a kind of shorthand that isn't a very clear description if you don't already know what it means.

A superkey of a given table can be defined as a set of attributes whose subrow values can only appear once in the table. Or as a set of attributes that functionally determines every set of attributes in the table.

When a superkey has just one attribute we can sloppily talk about that attribute being a superkey.

Candidate Key - An attribute or set of attributes that uniquely identifies a row in a database.

It's true that every CK (candidate key) of a certain table is a superkey of that table. But you mean that a set of attributes is by definition a superkey when/iff that and some other condition(s) hold. But you don't clearly say that when you write this section.

The difference between the superkey and a candidate key is no subset of a candidate key can itself be a candidate key.

No. A set is a subset of itself so a CK is a subset of itself so a CK always has a subset that is a CK--itself. What you mean is, no proper/smaller subset. Then your statement is true. But also true and more important is that no proper/smaller subset of a CK is a superkey.

You don't actually define "CK" in this paragraph. A CK of a given table can be defined as a superkey of that table that contains no proper/smaller subset that is a superkey of that table.

Primary key - A chosen candidate key that becomes the attribute to uniquely identify a row.

No. The PK (primary key) of a given table is defined as the one CK of that table that you decided to call the PK. (Not attribute.)

Note that CKs & PKs are superkeys. PKs don't matter to relational theory.

To create the relation between both:
AuthorBook(authorID, BookID) -- together authorID and BookID are primary key

What the superkeys & CKs are & so what the PK can be is determined by the FDs (functional dependencies) that hold in the table. But if you are presuming that this is a many to many table then it takes an authorID-bookID pair to uniquely identify a row, so there can only be one CK, {authorID, bookID}. So that is the only possible PK. So {authorID} & {bookID} cannot be superkeys or CKs or PKs.

You can see this by looking at examples & applying the definitions.

authorID bookID
      1      a
      1      b

Here authorID does not uniquely identify a row. So it can't be a superkey. So it can't be a CK. So it can't be a PK.

textbooks I have read seem to define junction tables this way and define primary keys this way

No, they don't.

However they do say that certain sets of attributes & subsets of superkeys, CKs & PKs in the junction table are FKs (foreign keys) in the junction table referencing those other tables where they are CKs (which might be PKs) of/in those other tables.

A FK of a given table can be defined as a certain set of attributes in the table whose subrow values must appear as certain CK subrows in a certain other table.

But since you say this is a junction table, presumably {authorID} is a FK to an author table where its values appear under a CK/PK & {bookID} is a FK to a book table where its values appear under a CK/PK. So FK {authorID} in AuthorBook referencing {authorID} in Author & FK {bookID} in AuthorBook referencing {bookID} in Book.

PS PK & other terms mean something else in SQL. A declared SQL PK can have a smaller SQL UNIQUE declared within it. SQL "uniqueness" itself is defined in terms of SQL NULL. It's reasonable to say that an SQL PK is more reminiscent a relational superkey than it is reminiscent of a relational PK. Similarly a SQL FK is more reminiscent of what we could reasonably call a relational foreign superkey than a relational foreign key.

Thank you for your answer. I think what your are saying is because in my pretend junction table authorID nor bookID can uniquely identify a row. Since it is many to many and one authorID can have several bookIDs just as one bookID can have several authorIDs. So only combined do they make unique role. (I did mean table and not database above) — EFiore, Apr 04 '20 at 23:22
Yes I am saying that. When you asked your question you seemed to think that each of {authorID} & {bookID} was a junction table CK--"Since a candidate key (and therefor a primary key) must not have a subset containing a candidate key". But your question doesn't say why you thought that--it doesn't give all your reasoning clearly. Did you think each one uniquely identified junction table rows? (Why?) Or did you think that being a CK/PK in (respectively) Author & in Book implied that they were CKs/PKs in other tables? (Why?) — philipxy, Apr 04 '20 at 23:41
I was thinking if bookID and authorID are both primary keys in their own respect how can they, combined, be a primary key, if one of the requirements of a primary key is that any primary cannot have any, smaller, subset that is itself a key. But since this is a new table we cannot think of authorID and bookID as keys onto themselves. At least that is what I believe to be the case. — EFiore, Apr 05 '20 at 00:30
I edited the new stuff from your comment into your post. (Please clarify via edits not comments.) I hope that it is clear from my answer that the place you went wrong is that when we talk about a key of a table it must be with respect to a given table.--"What the superkeys, CKs & PK of a table are is not determined by roles its attributes play in other tables." — philipxy, Apr 05 '20 at 01:23

How do primary keys work in junction tables for a DBMS? How can a composite key be a primary key?

1 Answers1