How to set up Primary Keys in a Relation?

Question

I wish to know how to correctly set up Primary Keys in a Relation. E.g. we have ER-diagram which contain elements:

Key attributes
Weak key attributes
Identifying relationships
Associative entities

In order to translate it into Relational Model we should do some tricks. All elements above deal with Primary Keys of relations but they all are Natural Keys - so we can leave them as is or replace with Surrogate Keys.
Consider some cases.

Case 1

Key Attribute is a name - so it must be of type CHAR or VARCHAR. Generally names become Key Attributes.

Case 2

Two (or more) Identifying Relationships become a Composite Primary Key of a relation (which is made of Foreign Keys).

Case 3

Identifying Relationship(s) with Weak Key Attribute(s) also become a Composite Primary Key.

Case 4

Associative entities usually have two or more Identifying Relationships so they are to be Junction Relations (Junction Tables).

How to set up primary keys for Relations in order to handle all above cases (perhaps some more cases which I did not mention)?
How to avoid using surrogate keys and in which cases are they necessary?
How to set up datatypes for primary keys?
If a composite primary key has to be passed into child relation, shall it be replaced with a surrogate?

Advantages and disadvantages of using surrogate keys in my view:

Advantages

They're compact (usually of type INT) and are sometimes good replacement for Composite Keys
They're illustrative when they're in Foreign Keys
They're painlessly indexed

Disadvantages

They're numbers and meaningless. E.g. I wish to fill up Junction Table in my Interface Application - so I will be left no other choice but to relate just numbers
They're redundant
They're confusing

As for setting up datatypes - there must be more tricks as well as setting up primary keys as whole.

Update

I should have given an example initially, but I did not. So here's an example. Consider we have two main entities which interact with each other (still don't know how to illustrate such things as diagrams here - so I'll show them as tables which are to demonstrate International Space Station crew rotation system):

SpaceShip

╔════════════════╤════════════════╗
║ ShipName       │ ShipType       ║ ShipName - Primary Key
╟────────────────┼────────────────╢ ShipType - Foreign Key (but it is
║ Soyuz TMA-14   │ Soyuz          ║   not being considered here)
║ Endeavour      │ Space Shuttle  ║
║ Soyuz TMA-15M  │ Soyuz          ║
║ Atlantis       │ Space Shuttle  ║
║ Soyuz TM-31    │ Soyuz          ║
║ ...            │ ...            ║
╚════════════════╧════════════════╝

And the Crew

╔════════╤══════════╗
║ CrewId │ CallSign ║ CrewId - Primary Key (used Id 'cause crew is usually
╟────────┼──────────╢   shown as crew members - it has no particular
║ 4243   │ Astreus  ║   name)
║ 4344   │ Altair   ║ CallSign - attribute (it may not be assigned or
║ 4445   │ ...      ║   explicitly shown - i.e. it can be NULL)
║ ...    │ ...      ║
╚════════╧══════════╝

These two entities interact via Flight. Each flight delivers one crew to the ISS and returns another crew or the same one. Obviously the relationship between Flight and Crew is many-to-many, so it needs a junction relation (table). But we cannot simply relate SpaceShip and Crew, because a spaceship can be reusable (returnable), as the Space Shuttles were.

So the Flight should look like:

╔═══════════════╤════════════╤═══════════════╤═════╗
║ ShipName      │ FlightName │ ShipFlightNum │ ... ║ ShipName, FlightName
╟───────────────┼────────────┼───────────────┼─────╢   are composite PK
║ Soyuz TM-31   │ NULL       │ 1             │ ... ║ ShipFlightNum
║ Atlantis      │ STS-117    │ 28            │ ... ║   depends on whole
║ Soyuz TMA-14  │ NULL       │ 1             │ ... ║   Composite PK
║ Endeavour     │ STS-126    │ 22            │ ... ║ ... - other
║ Soyuz TMA-15M │ NULL       │ 1             │ ... ║   attributes which
║ Endeavour     │ STS-111    │ 18            │ ... ║   depend on PK
║ Atlantis      │ STS-122    │ 29            │ ... ║
║ ...           │ ...        │ ...           │ ... ║
╚═══════════════╧════════════╧═══════════════╧═════╝

So Flight has a Composite Primary Key (the flight name for a Soyuz vehicle is the same as the spacecraft's name, but it differs for reusable spacecraft such as the Space Shuttle) and it needs to be related to Crew as many-to-many. Here is part of my complex question: should this composite natural Primary Key be replaced with a Surrogate one?
And if we're going to work with Natural Keys further then new Junction Relation (Associative Entity) should look like:

Designation (Crew is Designated to the Flight)

╔═══════════════╤════════════╤════════╤══════════╗
║ ShipName      │ FlightName │ CrewId │ CrewType ║
╟───────────────┼────────────┼────────┼──────────╢
║ Soyuz TMA-15M │ NULL       │ 4243   │ Deliver  ║
║ Soyuz TMA-15M │ NULL       │ 4243   │ Return   ║
║ Soyuz TMA-15M │ NULL       │ 4445   │ Backup   ║
║ Soyuz TMA-16M │ NULL       │ 4344   │ Deliver  ║
║ Soyuz TMA-17M │ NULL       │ 4445   │ Deliver  ║
║ Soyuz TMA-18M │ NULL       │ 4344   │ Return   ║
║ Endeavour     │ STS-111    │ 55     │ Deliver  ║
║ Endeavour     │ STS-111    │ 44     │ Return   ║
║ Endeavour     │ STS-113    │ 55     │ Return   ║
║ ...           │ ...        │ ...    │ ...      ║
╚═══════════════╧════════════╧════════╧══════════╝

Here we have a four-column Composite Primary Key made up of four Foreign Key columns (CrewType also has an FK constraint). If we use Surrogates instead of Naturals then the result will be more compact but harder to fill in (in my view).

One more update

Another case for table (relation) TypeCrew:

╔══════════╗
║ CrewType ║
╟──────────╢
║ Deliver  ║
║ Return   ║
║ Backup   ║
║ ...      ║
╚══════════╝

Everything would be fine if only we did not have to use these values in our queries (WHERE CrewType LIKE 'Backup'). But these values may be replaced with equivalents in other languages, or even with symbols, e.g. >, < and ^ for Deliver, Return and Backup respectively (WHERE CrewType LIKE '^'). Adding a numerical Surrogate Key will not help much, as its values may not match TypeName consistently (WHERE TypeId=2):

╔════════╤══════════╗    ╔════════╤══════════╗    ╔════════╤══════════╗
║ TypeId │ TypeName ║    ║ TypeId │ TypeName ║    ║ TypeId │ TypeName ║
╟────────┼──────────╢    ╟────────┼──────────╢    ╟────────┼──────────╢
║ 0      │ Deliver  ║    ║ 0      │ Backup   ║    ║ 0      │ >        ║
║ 1      │ Return   ║    ║ 1      │ Deliver  ║    ║ 1      │ <        ║
║ 2      │ Backup   ║    ║ 2      │ Return   ║    ║ 2      │ ^        ║
║ ...    │ ...      ║    ║ ...    │ ...      ║    ║ ...    │ ...      ║
╚════════╧══════════╝    ╚════════╧══════════╝    ╚════════╧══════════╝

Perhaps this is not a question of Relational Model? Perhaps it's just bad design? But I could not devise better.

You didn't say this in your question, and perhaps you intended not to say it, but the difference between a candidate key and a primary key strikes me as part of the answer to your broader question. — Walter Mitty, Jun 04 '15 at 10:36
I'm not sure what you didn't understand. Is the term "candidate key" unfamiliar to you? — Walter Mitty, Jun 04 '15 at 19:55
Indeed, I know what you're talking about. But if we initially had an entity with only one identifier which was called _Key_ (one of the attributes of the entity which makes it unique among all other entities) then why shall we add some other attribute (which is unique as well) to the entity and shift original _Key_ to alternative? Alternatives, Primaries, Surrogates (Candidates) - how to choose when one of them shall be the only one? And when it is necessary? — Umbra Aeternitatis, Jun 04 '15 at 21:10
When we proceed from conceptual model (as I understood) to logical, we must define a set of _Relations_ (which is based on the set of entity types of conceptual model). After that set _Primary Keys_ according to the features of _Relational Model_. And do it right. — Umbra Aeternitatis, Jun 04 '15 at 21:17
@WalterMitty. "candidate key" is not a Relational term. It is used by non-relational theoreticians to avoid the Relational demand of Primary Key or Alternate Key, while using a surrogate (an ID, a non-key) as "primary key". Good for non-relational Filing systems. — PerformanceDBA, Jun 05 '15 at 19:11
@UmbraAeternitatis. (a) Can you please give an example, an application with some data. Otherwise the answer to your question would be a discourse. (b) We don't use *tricks*, we use database science, plus the science given in the *Relational Model* if we want a relational database. — PerformanceDBA, Jun 05 '15 at 19:13
I hope it isn't very confusing when I compare _Entities_ and _Relations_ and _Tables_ - I think of them as: _Entity_ is an object, _Relation_ is a set of _Entities_ (_Entity Type_) and _Table_ is some subset all possible _Entities_ of a _Relation_ (perhaps somewhere I mixed them - it needs to be reviewed). — Umbra Aeternitatis, Jun 05 '15 at 21:38
@PerformanceDBA, I disagree. When I learned relational modeling, I learned a definition of candidate key that agrees with the Wikipedia entry for that term, and that specifically identifies candidate key as a relational concept. — Walter Mitty, Jun 06 '15 at 11:06
@UmbraAeternitatis, It is entirely appropriate to use the concepts of entity, relation, and table in the process of thinking about a database design problem, and coming up with a satisfactory design. It can be helpful to use these concepts in separate models. In classical design (from the 1980s) conceptual modeling uses entities (and relationships), logical modeling uses relations, and physical modeling uses tables. The three concepts have a huge overlap, but are not mathematically identical. — Walter Mitty, Jun 06 '15 at 11:11
@WalterMitty. (a) The only thing that **defines** Relational is the **Relational Model**. If wiki or Date or Darwen agree with that, then it is Relational, if they don't, it isn't. There are many books that purport to be "relational" while teaching and practising the pre-1970 ISAM filing structures. (b) There is no "candidate key" in the *RM*. There is a **Primary Key** that Date, Darwen, Fagin, wiki, etc do not use. Anti-relational on two counts. — PerformanceDBA, Jun 06 '15 at 11:38
@UmbraAeternitatis. The books are intended to keep people confused, and therefore awe-struck by the authors. Buy the next book. At your level, without formal tertiary education for Conceptual, Relations, etc, forget about Entities and Conceptual. Concentrate on Tables and rows, otherwise you will be arguing for decades, without resolving anything, same as the "theoreticians". The *RM* states that we should consider all data in a tabular format (rows and columns). — PerformanceDBA, Jun 06 '15 at 11:45
@PerformanceDBA, the worst thing is that I need it for my so-called _tertiary education_ project. But I was told at university only some uncertain things about conceptual modeling (1 - it does exist, 2 - it is good, 3 - I must do my project with conceptual modeling and a database - nothing was mentioned about _logical stage_). So I had to read such books (T.M. Connolly and C.J. Date) and was trying to understand what all those things about _MultiValued Dependencies_ and _Join Dependencies_ mean. And now it looks like I don't need them to design _good_ database. — Umbra Aeternitatis, Jun 06 '15 at 22:05
As I suppose without at least a bit knowledge of _Relational Modeling_ it is not possible to design such a _good_ database (even in MS Access). But now I feel myself confused. I need to design Data Model and implement it in DBMS (build database), but on the other hand I need to substantiate it. — Umbra Aeternitatis, Jun 06 '15 at 22:11
Shall I delete this question? It seems that the question has no straightforward answer. — Umbra Aeternitatis, Jun 07 '15 at 19:33

score 5 · Accepted Answer · edited May 23 '17 at 10:34

Position

Any practice that is not based on solid theory is not worthy of consideration. I am a strict Relational Model practitioner, with a strong grounding in the theory. The Relational Model is based on solid theory, and has never been refuted¹. There is nothing solid in what passes for "relational theory", I have taken them on, and refuted their notions in their space. Further, Relational Database design is a science, not magic, not art², therefore I can provide evidence for any of the propositions or charges that I make. My answers are from that position.

^{1. There are non-science articles, and masses of opinions from those who do not understand the science, yes, but no scientific refutation. Much like pygmies arguing that man cannot fly, it is "true" for them, but not true for mankind, it is based on a complete inability to understand the principle of flight.}

^{2. There is some art in the presentations of high-end practitioners, yes, but that does not make the science an art. It is a science, and only a science, and over and above that, it can be artfully delivered, in models and databases.}

"Relational Theory"

I wish to know how to correctly set up Primary Keys in a Relation. E.g. we have ER-diagram which contain elements:

If it was an ERD, then you wouldn't be looking at "relations", you would be looking at entities (if the diagram was early) or tables (if it were progressed). "Relations" are a wonderful abstraction which have nothing to do with an implementation. An ERD or a Data Model means an implementation (non-abstract, real) is intended, the intention to the physical leaves the abstract world of theory behind, and enters the physical world, where idiotic abstractions get destroyed.

Further the "theoreticians" who allege to be serving the database space cannot differentiate between base relations and derived relations: while that might be acceptable in the abstract context, it is dead wrong in the implementation context. Eg. base relations are tables, and they need to be Normalised; derived relations are, well, derived, views, of base relations, which by definition are flattened views (not "denormalised", which means something slightly different) of base relations. As such, they need not be Normalised.

But the "theoreticians" try to "normalise" derived relations. And the most damaged two are trying to have the definition of 1NF, that we have had for forty five years, that is fundamental and rock solid, that they themselves have supported, changed, so that their derived relations, which do not need "normalisation", can be classified as "normalised". It would be hilarious if it were not so sad.

One marvellous quality of objective truth, of science, is that it does not change. Subjective "truth", non-science, changes all the time. One can be relied upon, it must be understood before a practice is undertaken, the other is not worth reading about.

Isolation

They live in a world of their own, isolated from the reality of Relational Databases, specifically the Relational Model, and the industry that they allege to serve. In forty five years since the RM came out, they have done nothing to progress the RM or Relational databases.

Mind you, they have been progressing all sorts of notions, which are outside the Relational Model.
The progress of the RM (completion of what the Neanderthals suggest was "incomplete") has happened solely due to the standardisation (R Brown and others working with Codd, resulting in the IDEF1X Standard for Modelling Relational Databases), and the efforts of high-end SQL vendors and their customers.
That is the commercial RDBMS vendors, who were already established in the 1980's, not the Non-sql freeware/shareware/vapourware groups of the last decade, who pass off their wares as "sql", which gets you good and glued to their "platform", non-portable.

The worst part is, they publish books about their non-relational concepts, and fraudulently label them as "relational". And "professors" blindly "teach" this nonsense, like parrots, without ever understanding either the nonsense, or the Relational Model that it is supposed to explore.

If you are trying to find answers to some "educational" project, sorry, I cannot provide that, because the "education", as you can see, is totally confused, and has non-relational requirements.
I can however, provide direct answers to the question, governed by science, the Relational Model, the laws of physics, etc.

The point to take from this is, while Relational Theory and Practice were very close after Dr E F Codd published his seminal work, and during the time that the SQL Platforms were developed by the vendors, in the post-Codd era, what passes for "relational theory" is completely divorced from that original Relational Theory.

I can enumerate the differences, but not here. Note that if you read my posts that touch on this subject, you can gather those particulars, and enumerate them yourself. Or else ask a new question.

The Question

I wish to know how to correctly set up Primary Keys in a Relation. E.g. we have ER-diagram which contain elements:

There is no ERD to examine. Ok, in the Update you have an example. Perfect for your questions, because it is a set of user views of the data, and the modelling can now begin. But note, that is not an ERD or a Model. We rely on understanding the data; analysing it; classifying it, not on looking at the data values with a microscope. I realise that that is what you have been taught to do.

In order to translate it into Relational Model

Yes, that is the stated goal. The word "translate" is incorrect, because the RM is not merely a flat or fixed set of criteria that one "satisfies" or fits into (as it is known to the "theoreticians"), it also provides specific Methods and Rules. Therefore, we will be Modelling, according to the Relational Model.

we should do some tricks.

We don't need tricks, we use science, and only science. The "theoreticians" and the "professors" who follow them, need tricks, and practice non-science. I can't help in that regard. Further, the tricks they use, are usually to circumvent and subvert the Relational Model, so watch out for them.

Surrogate

All elements above deal with Primary Keys of relations but they all are Natural Keys - so we can leave them as is or replace with Surrogate Keys.

Well, there it is, your "teacher's" first trick is exposed.

Surrogates are physical Record (not row) pointers, they are not logical.
There is no such thing as a "surrogate key", the two words contradict each other.
- A Key has a specific definition in the RM, it has to be made up from the data. A surrogate isn't made up from the data, it is manufactured, a meaningless number generated by the system. Therefore it is not a Key or a "key".
- A Key in the RM has a number of Relational qualities, which makes Keys very powerful. Since a surrogate is not a Key, it does not have any of those qualities, it has no Relational power.
- Therefore, "surrogate" and Key each have specific meanings, and they are quite fine as separate terms, but together, they are self-contradictory, because they are opposites.
- When people use the term "surrogate key", they naturally expect some, if not all, the qualities of a Key. But they will not obtain any of them. Therefore they are defrauded.
The Relational Model (the one that the theoreticians know nothing about) has a specific Access Path Independence Rule. As long as Relational Keys are used, this rule is maintained. It provides Relational Integrity¹.
- The use of a surrogate violates this rule. The consequence² is, Relational Integrity and Relational Navigation³ are both lost.
- The consequence of that is, many more joins are required to get at the same data (not less, as the lovers of mythology and magic keep parroting).
- Therefore surrogates are not permitted, on another, separate count.
Since you are in the modelling stage, either conceptual or logical, and Keys are Logical, and surrogates are physical, surrogates should not come into the picture. (They come into the picture, if at all, for consideration, only when the logical model is complete, and the physical model is being considered.) You are nowhere near completion of the Logical, so the introduction of a surrogate should raise a red flag.

The "teacher", and the author of the "textbook" that he is using, are frauds, on two separate counts:
- They are introducing a physical field, into the Logical exercise, which should not concern itself with physical aspects of the database.
- But in so doing, the effect they have is that they establish the surrogate, the physical thing, as a logical thing. Thus they poison the mind.

There, straight science, pure logic, uncontaminated by insane thinking, and thus immune to the frauds. No surrogates at the Logical stage.

^{1. Relational Integrity (which the Relational Model provides) is distinctly different to Referential Integrity (which SQL provides, and Record Filing Systems might have). If you do not understand this, please open a new question "What is the difference ..." and ping me.}

^{2. Breaking any rule always has undesirable consequences, beyond the act itself.}

^{3. If you do not understand this, please open a new question "What is the Relational Navigation ..." and ping me.}

So the final answer to your question:

All elements above deal with Primary Keys of relations but they all are Natural Keys - so we can leave them as is or replace with Surrogate Keys.

In the conceptual and logical exercise, we deal with Logical Keys only. Physical concepts such as a surrogate are illegal. The replacement of a Logical Key with a physical creature, in the Logical exercise is rejected. Use the Keys you have, which are from the data, and natural.

Not a "Replacement"

There is one more point. The term "replacement" is incorrect. A surrogate is never a replacement or substitute for a Natural Key.

One of the many qualities that a natural Key provides, is row uniqueness, and that too, is demanded in the Relational Model, duplicate rows are not permitted.
Since a surrogate is not a Key to a row (it is a physical pointer to a record), it cannot provide the required row uniqueness. If you do not fully understand what I am saying, please read this Answer, from the top to False Teachers. Do test the given code exercises.
Therefore, a surrogate, even if considered, at the physical modelling stage, is always an additional column and index. It is not a replacement for a natural Relational Key.
And conversely, if the surrogate is implemented as a replacement, the consequence is duplicate rows, a non-relational file, not a Relational table.
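
The duplicate-row point can be tried out with a small sketch. It assumes SQLite through Python's standard sqlite3 module, and it borrows SpaceShip data from the question. The table names ShipSurrogateOnly and ShipNaturalKey and the ShipId column are made up for the illustration, they are not in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table whose only "key" is a system-generated number (ShipId).
conn.execute("""
    CREATE TABLE ShipSurrogateOnly (
        ShipId   INTEGER PRIMARY KEY,
        ShipName TEXT NOT NULL,
        ShipType TEXT NOT NULL
    )
""")
for _ in range(2):
    # The same ship is accepted twice, because each record gets a fresh ShipId.
    conn.execute(
        "INSERT INTO ShipSurrogateOnly (ShipName, ShipType) VALUES (?, ?)",
        ("Atlantis", "Space Shuttle"),
    )
duplicate_count = conn.execute(
    "SELECT COUNT(*) FROM ShipSurrogateOnly WHERE ShipName = 'Atlantis'"
).fetchone()[0]

# Hypothetical table keyed on the data itself (ShipName, as in the question).
conn.execute("""
    CREATE TABLE ShipNaturalKey (
        ShipName TEXT NOT NULL PRIMARY KEY,
        ShipType TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO ShipNaturalKey VALUES ('Atlantis', 'Space Shuttle')")
try:
    conn.execute("INSERT INTO ShipNaturalKey VALUES ('Atlantis', 'Space Shuttle')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The Key made from the data is what refuses the second, identical row.
    duplicate_rejected = True

print(duplicate_count, duplicate_rejected)
```

In the first table nothing stops the same ship from being recorded twice. In the second, the Key made from the data is what enforces row uniqueness.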

Case 1

Key Attribute is a name - so it must be of type CHAR or VARCHAR. Generally names become Key Attributes.

Yes.

Often they are codes (users do use codes). Often Codes jump out at you (you have a very good example in your One More Update). { D | R | B } would do just as well as { > | < | ^ }. This is of course towards the end of the logical model stage, when the model is stable, and one is finalising the Keys and optimising them. For any stage earlier than that, the wide Natural Keys stand.

The idea is to keep it meaningful.

Keys have meaning (surrogates have no meaning). One of the qualities of a Relational Key is, that that meaning is carried, wherever the Key is migrated as a Foreign Key.

And as per your example, wherever it is used. Including program code. Writing:

 IF CrewType = "Backup"  -- meaningful but fixes a value
 IF CrewType = 1         -- meaningless

is just plain wrong. Because (a) that is not really a Key, and (b) the user may well change the value of that datum from Backup to Reserve, etc. Never write code that addresses a data value, a descriptor. So the fact is, Backup is the projection of the Key, the exposition, and the code is the Key. That resolves to CrewType.Name, and the Key is CrewTypeCode.

     IF CrewTypeCode = "B"   -- Key, meaningful, not fixed
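
A minimal sketch of that split, assuming SQLite through Python's standard sqlite3 module. CrewTypeCode and Name are the columns named above. The UNIQUE constraint on Name (treating it as an Alternate Key) and the helper function crew_type_name are assumptions added for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CrewType (
        CrewTypeCode TEXT NOT NULL PRIMARY KEY,  -- the Key: short, meaningful code
        Name         TEXT NOT NULL UNIQUE        -- descriptor shown to the user
    )
""")
conn.executemany(
    "INSERT INTO CrewType VALUES (?, ?)",
    [("D", "Deliver"), ("R", "Return"), ("B", "Backup")],
)


def crew_type_name(connection, code):
    """Look up the descriptor by Key; the code never mentions a descriptor value."""
    row = connection.execute(
        "SELECT Name FROM CrewType WHERE CrewTypeCode = ?", (code,)
    ).fetchone()
    return row[0]


name_before = crew_type_name(conn, "B")

# The user renames the descriptor; the Key, and any code using it, is untouched.
conn.execute("UPDATE CrewType SET Name = 'Reserve' WHERE CrewTypeCode = 'B'")
name_after = crew_type_name(conn, "B")

print(name_before, name_after)
```

The lookup by 'B' keeps working after the rename, which is the reason for addressing the Key rather than the descriptor.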

While we are on Keys, please note:

In the Relational Model, we have Primary Keys, Alternate Keys, and Foreign Keys (migrated Primary Keys).
We do not have "candidate keys", no such thing is defined in the RM. It is something manufactured outside the RM. It is therefore non-relational.

Worse, they are used by people who implement surrogates as "primary keys"^a.
A physical consideration ^b, but one that should be understood and applied throughout the exercise. When the data is understood and known, the columns will be fixed length. When they are unknown, they might be variable. For Keys, given that they will be indexed, at least on the Primary side, they should never be variable, because that requires unpacking on every access.

^{a. The use of the SQL keyword PRIMARY KEY does not magically transform a surrogate into a PK. If one follows the RM, one (a) determines the possible Keys (no surrogates), and then (b) chooses one as Primary, which (c) means the election is over, therefore (d) the nominated candidates can no longer be called "candidates", the event is history, therefore (e) the remainder, the non-primary Keys, are Alternate Keys.}

^{"Candidate key" is a refusal to conform to the RM and nominate a PK, therefore, in and of itself, it is non-relational. Separate to the fact that they have a surrogate as "primary key", which is a second non-relational item.}

^{b. For those non-technical people who believe that no technical knowledge and foresight, no physical considerations at all, should be evaluated during the logical, that's fine, evaluate them at the physical. Since I am not addressing the physical here, I am just making a note for Umbra.}

Magicians rely on their tricks, to make bunny rabbits look like lions. Scientists do not need them.

Case 2

Two (or more) Identifying Relationships become a Composite Primary Key of a relation (which is made of Foreign Keys).

I think you have the right idea, but the wording is incorrect for the generic case.

That wording is correct for an Associative Table, which has two Foreign Keys. Yes, in that case, the two FKs form the PK, which is all that is needed for row uniqueness. Nothing can better that. The addition of a Record ID is superfluous.
For the generic case, for any table:
- An Identifying Relationship¹ causes the FK (migrated parent PK) to be part of the PK in the child. Hence the name, the parent Identifies the child.
- That makes the child a Dependent¹ table, meaning that the child rows can exist only in the context of a parent row. Such tables form the intermediate and leaf nodes in the Data Hierarchies, they are the majority of tables in a Relational database.
- If the row can exist independently, the table is Independent¹. Such tables form the top of each Data Hierarchy, there are very few in a Relational database.
- A Non-identifying Relationship¹ is one where the FK (migrated parent PK), is not used to form the child PK.
- Compound or Composite Keys are simply made up of more than one column, they are standard fare in Relational databases. Every table except the top of each Data Hierarchy will have a Compound Key. If you do not have any, the database is not Relational.
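
An Identifying Relationship can be sketched as follows, assuming SQLite through Python's standard sqlite3 module and the SpaceShip and Flight tables from the question. Two things here are assumptions, not taken from the question: FlightName is declared NOT NULL (the question shows NULL for Soyuz flights), and the ship 'Unknown Ship' with flight 'X-1' is invented only to show the rejection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

# Independent table: a ship exists on its own.
conn.execute("""
    CREATE TABLE SpaceShip (
        ShipName TEXT NOT NULL PRIMARY KEY,
        ShipType TEXT NOT NULL
    )
""")
# Dependent table: the migrated parent PK (ShipName) is part of the child PK.
conn.execute("""
    CREATE TABLE Flight (
        ShipName      TEXT    NOT NULL REFERENCES SpaceShip (ShipName),
        FlightName    TEXT    NOT NULL,
        ShipFlightNum INTEGER NOT NULL,
        PRIMARY KEY (ShipName, FlightName)
    )
""")
conn.execute("INSERT INTO SpaceShip VALUES ('Endeavour', 'Space Shuttle')")
conn.execute("INSERT INTO Flight VALUES ('Endeavour', 'STS-111', 18)")
conn.execute("INSERT INTO Flight VALUES ('Endeavour', 'STS-126', 22)")

try:
    # A flight cannot exist outside the context of a parent ship.
    conn.execute("INSERT INTO Flight VALUES ('Unknown Ship', 'X-1', 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The parent Key is carried in the child, so no join is needed to filter by ship.
flights = conn.execute(
    "SELECT FlightName FROM Flight WHERE ShipName = 'Endeavour' ORDER BY FlightName"
).fetchall()

print(orphan_rejected, flights)
```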

Please read my IDEF1X Introduction carefully.

^{1. The "theoreticians" do not differentiate Identifying vs Non-identifying, or Dependent vs Independent: all their files are Independent; all their "relationships" between record pointers are Non-identifying. It is a regression to the pre-1970's ISAM Record Filing Systems, devoid of Relational Integrity, power, and speed. That is all they understand, that is all they can teach. Fraudulently labelled as "relational".}

Case 3

Identifying Relationship(s) with Weak Key Attribute(s) also become a Composite Primary Key.

The term "weak" with or without a relationship to "key" is not defined in the Relational Model. It is a fiction of the "theoreticians". Thus I cannot answer that question.

I do note that some of the "theoretical" papers present strong Keys (normal English word, describing the fact that the Key has been established previously) as "weak", and weak "keys" (normal English word, describing the fact that the "key" has not been established previously) as "strong". Such is the nature of schizophrenia.
Therefore I suspect that it is part and parcel of their evidenced attempt to confuse the science with non-science, and to undermine the Relational Model. In the old days, when such people were locked up, humanity was healthy. Now they write books and teach in colleges.

Case 4

Associative entities usually have two or more Identifying Relationships

Yes. Two is correct.

If you have more than two, then that is not fully Normalised. Codd gives an explicit method to Normalise that, such that there will be two (or more) Associative entities, of exactly two Identifying relationships each.

"... therefore, all n-ary (more than two) relations ... can be ... and should be, resolved to binary (two) relations."
(paraphrased for this context)

so they are to be Junction Relations (Junction Tables).

No. "Junction" relations and "junction" tables are not defined in the Relational Model, therefore they are non-relational.

Associative Entities in the logical become Associative Tables in the physical.
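
A minimal sketch of an Associative Table with exactly two Identifying relationships, assuming SQLite through Python's standard sqlite3 module. Flight, Crew and Designation come from the question. As simplifications for this sketch, the SpaceShip parent and the CrewType column are left out, and CrewId is used exactly as the question gives it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE Flight (
        ShipName   TEXT NOT NULL,
        FlightName TEXT NOT NULL,
        PRIMARY KEY (ShipName, FlightName)
    );
    CREATE TABLE Crew (
        CrewId   INTEGER NOT NULL PRIMARY KEY,
        CallSign TEXT
    );
    -- The two migrated parent Keys together form the PK of the Associative Table.
    CREATE TABLE Designation (
        ShipName   TEXT    NOT NULL,
        FlightName TEXT    NOT NULL,
        CrewId     INTEGER NOT NULL REFERENCES Crew (CrewId),
        PRIMARY KEY (ShipName, FlightName, CrewId),
        FOREIGN KEY (ShipName, FlightName)
            REFERENCES Flight (ShipName, FlightName)
    );
    INSERT INTO Flight VALUES ('Endeavour', 'STS-111');
    INSERT INTO Crew VALUES (55, NULL), (44, NULL);
    INSERT INTO Designation VALUES ('Endeavour', 'STS-111', 55);
    INSERT INTO Designation VALUES ('Endeavour', 'STS-111', 44);
""")

try:
    # The same crew on the same flight a second time is a duplicate row.
    conn.execute("INSERT INTO Designation VALUES ('Endeavour', 'STS-111', 55)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

crew_ids = [
    row[0]
    for row in conn.execute(
        "SELECT CrewId FROM Designation "
        "WHERE ShipName = 'Endeavour' AND FlightName = 'STS-111' "
        "ORDER BY CrewId"
    )
]

print(duplicate_rejected, crew_ids)
```

No extra Record ID is needed here: the two Foreign Keys already give row uniqueness.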

Answer Too Long

The completion of the answer exceeded the limit for SO answers. Therefore I have placed the Answer in a single document, and provided a link. Splitting the Answer at this point proved to be a sin, thus the document contains the entire answer, with consistent formatting, etc:

Complete Answer

To continue from this point (ie. the SO Answer text, above), simply scroll down to the Case 4 heading.
There is a value in retaining the above SO Answer text, not only for historical purposes, but for text searches, etc.

@UmbraAeternitatis. *As I suppose without at least a bit knowledge of Relational Modeling it is not possible to design such a good database (even in MS Access).* Yes, you need the basics. *But now I feel myself confused. I need to design Data Model and implement it in DBMS (build database), but on the other hand I need to substantiate it* Ok. I can give you unconfused science, and a complete answer. Half delivered above, the rest tomorrow. Find Codd's *Relational Model*, that plus my post, is your substantiation. Beware, the confused keep everyone else confused. — PerformanceDBA, Jun 12 '15 at 13:19
I have read some other of your answers on StackOverflow, that's why I asked myself: "Am I doing right? Perhaps, I did it wrong from the beginning...". As far as I understood, _Relational Model_ is a self-sufficient method to construct semantics of the subject area, isn't it? So ERD is another approach that has been derived from _Relational Model_ which is supposed to ease perception of the latter (because it is too abstract) with colorful objects of different shapes? And probably we (and me in particular) should use only one of the methods to construct the data semantics. — Umbra Aeternitatis, Jun 14 '15 at 00:43
I've been reading about E. F. Codd a bit and about his **"Codd's twelve rules"** which are a heavy impact on commercial DBMS manufacturers, 'cause practically all of them do not match all the 13 points of the **"Rules"**. So another question - how to implement Relational Model in Non-Relational DBMSs? — Umbra Aeternitatis, Jun 14 '15 at 00:50
I think I should really study the original work of the man, who brought the term **"Relational Model"** into the world before I ask another question. It seems that the subject requires deeper study. — Umbra Aeternitatis, Jun 14 '15 at 01:00
@Umbra Aeternitatis. (a) I have completed the Answer, refer to the notes at the end. Enjoy. (b) I will answer your comments as time permits. (c) The *Codd's Twelve Rules* doc is ancient, I should update it. I will notify you when that is done. (d) Although Codd wrote those rules in the 1980's, to counter the fraudulent declarations of DBMS vendors then, and the vendors did upgrade their wares to various degrees, it applies just as much today, due to the raft of NONsql vendors and their fraudulent declarations re (i) the *Relational Model*, and (ii) SQL. — PerformanceDBA, Jun 15 '15 at 02:04
Wow. Awesome explanation. Even more than I expected. My question looks somewhat confused, but the explanation clarifies many points of it. — Umbra Aeternitatis, Jun 17 '15 at 01:15
@PerformanceDBA. Great answer. Could you take a look at this question: http://stackoverflow.com/questions/30962601/how-to-model-multilingual-entities-in-relational-databases — dzhu, Jun 21 '15 at 11:54
@PerformanceDBA. I asked a question re **Relational Integrity**. See http://stackoverflow.com/questions/31018623/what-is-relational-integrity — dzhu, Jun 24 '15 at 05:55

score 0 · Answer 2 · answered Jun 06 '15 at 11:30

Your list of the advantages and disadvantages of using surrogate keys is a good one. As that list suggests, this topic is a complex one. And there is no uniform consensus among database designers about when surrogate keys are indicated or contraindicated. Even in this Q&A area, you will find wildly varying opinions on this subject.

I quibble with your listing of "meaningless" as a disadvantage to using surrogate keys. In many circumstances, the fact that they are meaningless is an advantage, not a disadvantage. In particular, many natural keys invented by people are not "atomic". That is, they contain multiple attributes encoded inside the key.

For example, it is possible, given the VIN (Vehicle Identification Number) of a vehicle, to determine the passenger seating capacity of the vehicle. But that's the seating capacity as originally manufactured, and not necessarily the seating capacity at the present point in time. Since the VIN ought to be immutable, it can't be changed when one seat is ripped out. And it's now misleading.

So, many teachers of database design advocate meaningless keys.

There are a couple of disadvantages of names used as natural keys that you did not mention. They are often not unique, and they are often mutable, as used by humans. For example, there could be two students named Mary Jones at a university. Or Mary Jones might change her name to Mary Smith, part way through a semester.
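The two problems with names can be reproduced in a few lines. What follows is only a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`student_by_name`, `student`, `enrollment`, `student_id`) are made up for illustration and are not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Hypothetical table that uses the student's name as its (natural) primary key.
conn.execute("CREATE TABLE student_by_name (name TEXT PRIMARY KEY)")
conn.execute("INSERT INTO student_by_name VALUES ('Mary Jones')")

try:
    # A second, different Mary Jones cannot be recorded at all.
    conn.execute("INSERT INTO student_by_name VALUES ('Mary Jones')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Hypothetical alternative: a meaningless id is the key, the name is an ordinary column.
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course     TEXT    NOT NULL,
        PRIMARY KEY (student_id, course)
    )
    """
)
conn.execute("INSERT INTO student VALUES (1, 'Mary Jones')")
conn.execute("INSERT INTO student VALUES (2, 'Mary Jones')")  # both students fit
conn.execute("INSERT INTO enrollment VALUES (1, 'Databases')")

# Mary Jones becomes Mary Smith mid-semester: one row changes, no reference does.
conn.execute("UPDATE student SET name = 'Mary Smith' WHERE student_id = 1")

names = [row[0] for row in conn.execute("SELECT name FROM student ORDER BY student_id")]
enrolled = conn.execute(
    """
    SELECT s.name
    FROM enrollment e JOIN student s ON s.student_id = e.student_id
    WHERE e.course = 'Databases'
    """
).fetchone()[0]
```

Note that the id-keyed table accepts two identical names without complaint. That is also how one person can end up recorded twice, so the sketch illustrates the trade-off rather than settling it.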

There is another disadvantage you didn't mention. It's misdirected data, including fraud. If SSN is used to identify employees, we have to guard against an employee giving us a fake SSN, and then later hiring a person that really owns that SSN. The database is in real trouble at that point.

This answer only touches on a few aspects of a very large topic. I suggest further reading, from authors like CJ Date.

Indeed a _Natural Key_ may be non-atomic, but if we choose some attribute of the _Entity_ as the _Key Attribute_ then it **has to be** unique in order to identify the Entity within the whole _set of Entities_ (_Entity Type_). As for the names of the space vehicles - they can be used as unique identifiers because they are specially made to uniquely identify, e.g. _Soyuz TMA-15M_ is made up of the ship class name (Soyuz), the series type (TMA-M) and the number of the vehicle in the series (15). Names of persons can't be used as unique identifiers in the first place. E.g. I can devise any name I wish within an IT forum... — Umbra Aeternitatis, Jun 06 '15 at 15:24
But I will still be unique (or my account Id will). In this case we suppose from the start that Ids are used to identify the _Entity_ (even in the ER-Diagram), because we can't rely on UserName (which can vary). _Surrogate Keys_, as I think of them, should appear in the _logical model_ (not in the conceptual one) due to the differences between the _Concept_ and the _Logic_. — Umbra Aeternitatis, Jun 06 '15 at 15:32
Still it is not clear to me when it is reasonable to replace _Keys_ that are unique with other _Keys_ that are unique too. Perhaps I'm getting it wrong and this is not a question of _Logic_ but a question of the physical implementation in the particular DBMS? But it is not refuted yet and I need to know how to design and implement in a better way. — Umbra Aeternitatis, Jun 06 '15 at 15:43
I guess the problem I was raising was not uniqueness but immutability. If a primary key is mutable, then all references to it have to mutate at the same time, or else the linkage will be broken. In some cases, it's possible to specify that updates or deletes cascade. But in other cases, that doesn't find all the references. For example, if there are references that have been copied to extracts, then you may be left with orphans. — Walter Mitty, Jun 06 '15 at 16:40
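The cascade point in the comment above can be sketched as follows, again with Python's built-in `sqlite3` module. The tables (`student`, `enrollment`) and the in-memory `extract` list are hypothetical stand-ins. The list plays the role of any copy of the key taken outside the database, such as an export file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite applies cascades only when enabled

# Hypothetical schema with a mutable natural key (the student's name).
conn.execute("CREATE TABLE student (name TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE enrollment (
        student_name TEXT NOT NULL
            REFERENCES student(name) ON UPDATE CASCADE,
        course       TEXT NOT NULL,
        PRIMARY KEY (student_name, course)
    )
    """
)
conn.execute("INSERT INTO student VALUES ('Mary Jones')")
conn.execute("INSERT INTO enrollment VALUES ('Mary Jones', 'Databases')")

# An extract: key values copied out of the database before the change.
extract = [row[0] for row in conn.execute("SELECT student_name FROM enrollment")]

# The key mutates; the declared foreign key follows it automatically.
conn.execute("UPDATE student SET name = 'Mary Smith' WHERE name = 'Mary Jones'")

in_db = [row[0] for row in conn.execute("SELECT student_name FROM enrollment")]
known = {row[0] for row in conn.execute("SELECT name FROM student")}

# Values in the extract that no longer match any student are the orphans.
orphans = [name for name in extract if name not in known]
```

Inside the database the reference follows the rename. The copy in `extract` still says `Mary Jones`, and nothing the DBMS does can reach it.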
The question of when to substitute another key when two candidate keys (yes, candidate keys) are equally logically correct is a subtle one. There are physical aspects to the decision, and "real world" considerations, and everything in between. — Walter Mitty, Jun 06 '15 at 16:42
