Position
Any practice that is not based on solid theory is not worthy of consideration. I am a strict Relational Model practitioner, with a strong grounding in the theory. The Relational Model is based on solid theory, and has never been refuted 1. There is nothing solid in what passes for "relational theory", I have taken them on, and refuted their notions in their space. Further, Relational Database design is a science, not magic, not art 2, therefore I can provide evidence for any of the propositions or charges that I make. My answers are from that position.
1. The are non-science articles, and masses of opinions from those who do not understand the science, yes, but no scientific refutation. Much like pygmies arguing that man cannot fly, it is "true" for them, but not true for mankind, it is based on a complete inability to understand the principle of flight.
2. There is some art in the presentations of high-end practitioners, yes, but that does not make the science an art. It is a science, and only a science, and over and above that, it can be artfully delivered, in models and databases.
"Relational Theory"
I wish to know how to correctly set up Primary Keys in a Relation. E.g. we have ER-diagram which contain elements:
If it was an ERD, then you wouldn't be looking at "relations", you would be looking at entities (if the diagram was early) or tables (if it were progressed). "Relations" are a wonderful abstraction which have nothing to do with an implementation. An ERD or a Data Model means an implementation (non-abstract, real) is intended, the intention to the physical leaves the abstract world of theory behind, and enters the physical world, where idiotic abstractions get destroyed.
Further the "theoreticians" who allege to be serving the database space cannot differentiate between base relations and derived relations: while that might be acceptable in the abstract context, it is dead wrong in the implementation context. Eg. base relations are tables, and they need to be Normalised; derived relations are, well, derived, views, of base relations, which by definition are flattened views (not "denormalised", which means something slightly different) of base relations. As such, they need not be Normalised.
- But the "theoreticians" try to "normalise" derived relations. And the most damaged two are trying to have the definition of 1NF, that we have had for forty five years, that is fundamental and rock solid, that they themselves have supported, changed, so that their derived relations, which do not need "normalisation", can be classified as "normalised". It would be hilarious if it were not so sad.
One marvellous quality of objective truth, of science, is that it does not change. Subjective "truth", non-science, changes all the time. One can be relied upon, it must be understood before a practice is undertaken, the other is not worth reading about.
Isolation
They live in a world of their own, isolated from the reality of Relational Databases, specifically the Relational Model, and the industry that they allege to serve. In forty five years since the RM came out, they have done nothing to progress the RM or Relational databases.
Mind you, they have been progressing all sorts of notions, which are outside the Relational Model.
The progress of the RM (completion of what the Neanderthals suggest was "incomplete") has happened solely due to the standardisation (R Brown and others working with Codd, resulting in the IDEF1X Standard for Modelling Relational Databases), and the efforts of high-end SQL vendors and their customers.
That is the commercial RDBMS vendors, who were already established in the 1980's, not the Non-sql freeware/shareware/vapourware groups of the last decade, who pass off their wares as "sql", which gets you good and glued to their "platform", non-portable.
The worst part is, they publish books about their non-relational concepts, and fraudulently label them as "relational". And "professors" blindly "teach" this nonsense, like parrots, without ever understanding either the nonsense, or the Relational Model that it is supposed to explore.
If you are trying to find answers to some "educational" project, sorry, I cannot provide that, because the "education", as you can see, is totally confused, and has non-relational requirements.
I can however, provide direct answers to the question, governed by science, the Relational Model, the laws of physics, etc.
The point to take from this is, while Relational Theory and Practice were very close after Dr E F Codd published his seminal work, and during the time that the SQL Platforms were developed by the vendors, in the post-Codd era, what passes for "relational theory" is completely divorced from that original Relational Theory.
- I can enumerate the differences, but not here. Note that if you read my posts that touch on this subject, you can gather those particulars, and enumerate them yourself. Or else ask a new question.
The Question
I wish to know how to correctly set up Primary Keys in a Relation. E.g. we have ER-diagram which contain elements:
There is no ERD to examine. Ok, in the Update you have an example. Perfect for your questions, because it is a set of user views of the data, and the modelling can now begin. But note, that is not an ERD or a Model. We rely on understanding the data; analysing it; classifying it, not on looking at the data values with a microscope. I realise that that is what you have been taught to do.
In order to translate it into Relational Model
Yes, that is the stated goal. The word "translate" is incorrect, because the RM is not merely a flat or fixed set of criteria that one "satisfies" or fits into (as it is known to the "theoreticians"), it also provides specific Methods and Rules. Therefore, we will be Modelling, according to the Relational Model.
we should do some tricks.
We don't need tricks, we use science, and only science. The "theoreticians" and the "professors" who follow them, need tricks, and practice non-science. I can't help in that regard. Further, the tricks they use, are usually to circumvent and subvert the Relational Model, so watch out for them.
Surrogate
All elements above deal with Primary Keys of relations but they all are Natural Keys - so we can leave them as is or replace with Surrogate Keys.
Well, there it is, your "teacher's" first trick is exposed.
Surrogates are physical Record (not row) pointers, they are not logical.
There is no such thing as a "surrogate key", the two words contradict each other.
A Key has a specific definition in the RM, it has to be made up from the data. A surrogate isn't made up from the data, it is manufactured, a meaningless number generated by the system. Therefore it is not a Key or a "key".
A Key in the RM has has a number of Relational qualities, which makes Keys very powerful. Since a surrogate is not a Key, it does not have any of those qualities, it has no Relational power.
Therefore, "surrogate" and Key each have specific meanings, and they are quite fine as separate terms, but together, they are self-contradictory, because they are opposites.
When people use them term "surrogate key", they naturally expect some, if not all, the qualities of a Key. But they will not obtain any of them. Therefore they are defrauded.
The Relational Model (the one that the theoreticians know nothing about) has a specific Access Path Independence Rule. As long as Relational Keys are used, this rule is maintained. It provides Relational Integrity 1.
The use of a surrogate violates this rule. The consequence 2 is, Relational Integrity and Relational Navigation 3 are both lost.
The consequence of that is, many more joins are required to get at the same data (not less, as the lovers of mythology and magic keep parroting).
Therefore surrogates are not permitted, on another, separate count.
Since you are in the modelling stage, either conceptual or logical, and Keys are Logical, and surrogates are physical, surrogates should not come into the picture. (They come into the picture, if at all, for consideration, only when the logical model is complete, and the physical model is being considered.) You are nowhere near completion of the Logical, so the introduction of a surrogate should raise a red flag.
The "teacher", and the author of the "textbook" that he is using, are frauds, on two separate counts:
They are introducing a physical field, into the Logical exercise, which should not concern itself with physical aspects of the database.
But in so doing, the effect they have is that they establish the surrogate, the physical thing, as a logical thing. Thus they poison the mind.
There, straight science, pure logic, uncontaminated by insane thinking, and thus immune to the frauds. No surrogates at the Logical stage.
1. Relational Integrity (which the Relational Model provides) is distinctly different to Referential Integrity (which SQL provides, and Record Filing Systems might have). If you do not understand this, please open a new question "What is the difference ..." and ping me.
2. Breaking any rule has always has undesirable consequences, beyond the act itself.
3. If you do not understand this, please open a new question "What is the Relational Navigation ..." and ping me.
So the final answer to your question:
All elements above deal with Primary Keys of relations but they all are Natural Keys - so we can leave them as is or replace with Surrogate Keys.
In the conceptual and logical exercise, we deal with Logical Keys only. Physical concepts such as a surrogate are illegal. The replacement of a Logical Key with a physical creature, in the Logical exercise is rejected. Use the Keys you have, which are from the data, and natural.
Not a "Replacement"
There is one more point. The term "replacement" is incorrect. A surrogate is never a replacement or substitute for a Natural Key.
One of the many qualities that a natural Key provides, is row uniqueness, and that too, is demanded in the Relational Model, duplicate rows are not permitted.
Since a surrogate is not a Key to a row (it is a physical pointer to a record), it cannot provide the required row uniqueness. If you do not fully understand what I am saying, please read this Answer, from the top to False Teachers. Do test the given code exercises.
Therefore, a surrogate, even if considered, at the physical modelling stage, is always an additional column and index. It is not a replacement for a natural Relational Key.
And conversely, if the surrogate is implemented as a replacement, the consequence is duplicate rows, a non-relational file, not a Relational table.
Case 1
Key Attribute is a name - so it must be of type CHAR or VARCHAR. Generally names become Key Attributes.
Yes.
Often they are codes (users do use codes). Often Codes jump out at you (you have a very good example in your One More Update). { D | R | B } would do just as well { < | ^ | > }. This is of course towards the end of the logical model stage, when the model is stable, and one is finalising the Keys and optimising them. For any stage earlier than that, the wide Natural Keys stand.
The idea is to keep it meaningful.
Keys have meaning (surrogates have no meaning). One of the qualities of a Relational Key is, that that meaning is carried, wherever the Key is migrated as a Foreign Key.
And as per your example, wherever it is used. Including program code. Writing:
IF CrewType = "Backup" -- meaningful but fixes a value
IF CrewType = 1 -- meaningless
is just plain wrong. Because (a) that is not really a Key, and (b) the user may well change the value of that datum from Backup
to Reserve
, etc. Never write code that addresses a data value, a descriptor. So the fact is, Backup
is the projection of the Key, the exposition, and the code is the Key. That resolves to CrewType.Name, and the Key is CrewTypeCode.
IF CrewTypeCode = "B" -- Key, meaningful, not fixed
While we are on Keys, please note:
In the Relational Model, we have Primary Keys, Alternate Keys, and Foreign Keys (migrated Primary Keys).
We do not have "candidate keys", no such thing is defined in the RM. It is something manufactured outside the RM. It is therefore non-relational.
Worse, they are used by people who implement surrogates as "primary keys" a.
A physical consideration b, but one that should be understood and applied throughout the exercise. When the data is understood and known, the columns will be fixed length. When they are unknown, they might be variable. For Keys, given that they will be indexed, at least on the Primary side, they should never be variable, because that requires unpacking on every access.
a. The use the SQL keyword PRIMARY KEY
does not magically transform a surrogate into a PK. If one follows the RM, one (a) determines the possible Keys (no surrogates), and then (b) chooses one as Primary, which (c) means the election is over, therefore (d) the nominated candidates can no longer be called "candidates", the event is history, therefore (e) the remainder, the non-primary Keys, are Alternate Keys.
"Candidate key" is a refusal to conform to the RM and nominate a PK, therefore, in and of itself, it is non-relational. Separate to the fact that they have a surrogate as "primary key", which is a second non-relational item.
b. For those non-technical people who believe that no technical knowledge and foresight, no physical considerations at all, should be evaluated during the logical, that's fine, evaluate them at the physical. Since I am not addressing the physical here, I am just making a note for Umbra.
Magicians rely on their tricks, to make bunny rabbits look like lions. Scientists do not need them.
Case 2
Two (or more) Identifying Relationships become a Composite Primary Key of a relation (which is made of Foreign Keys).
I think you have the right idea, but the wording is incorrect for the generic case.
That wording is correct for an Associative Table, which has two Foreign Keys. Yes, in that case, the two FKs form the PK, which is all that is needed for row uniqueness. Nothing can better that. The addition of a Record ID is superfluous.
For the generic case, for any table:
An Identifying Relationship 1 causes the FK (migrated parent PK) to be part of the PK in the child. Hence the name, the parent Identifies the child.
That makes the child a Dependent 1 table, meaning that the child rows can exist only in the context of a parent row. Such tables form the intermediate and leaf nodes in the Data Hierarchies, they are the majority of tables in a Relational database.
If the row can exist independently, the table is Independent 1. Such tables form the top of each Data Hierarchy, there are very few in a Relational database.
A Non-identifying Relationship 1 is one where the FK (migrated parent PK), is not used to form the child PK.
Compound or Composite Keys are simply made up of more than one column, they are standard fare in Relational databases. Every table except the top of each Data Hierarchy will have a Compound Key. If you do not have any, the database is not Relational.
Please read my IDEF1X Introduction carefully.
1. The "theoreticians" do not differentiate Identifying vs Non-identifying, or Dependent vs Independent: all their files are Independent; all their "relationships" between record pointers are Non-identifying. It is a regression to the pre-1970's ISAM Record Filing Systems, devoid of Relational Integrity, power, and speed. That is all they understand, that is all they can teach. Fraudulently labelled as "relational".
Case 3
Identifying Relationship(s) with Weak Key Attribute(s) also become a Composite Primary Key.
The term "weak" with or without a relationship to "key" is not defined in the Relational Model. It is a fiction of the "theoreticians". Thus I cannot answer that question.
I do note that some of the "theoretical" papers present strong Keys (normal English word, describing the fact that the Key has been established previously) as "weak", and weak "keys" (normal English word, describing the fact that the "key" has not been established previously) as "strong". Such is the nature of schizophrenia.
Therefore I suspect that it is part and parcel of their evidenced attempt to confuse the science with non-science, and to undermine the Relational Model. In the old days, when such people were locked up, humanity was healthly. Now they write books and teach in colleges.
Case 4
Associative entities usually have two or more Identifying Relationships
Yes. Two is correct.
If you have more than two, then that is not fully Normalised. Codd gives an explicit method to Normalise that, such that there will be two (or more) Associative entities, of two exactly Identifying relationships each.
- "... therefore, all n-ary (more than two) relations ... can be ... and should be, resolved to binary (two) relations."
(paraphrased for this context)
so they are to be Junction Relations (Junction Tables).
No. "Junction" relations and "junction" tables are not defined in the Relational Model, therefore they are non-relational.
Associative Entities in the logical become Associative Tables in the physical.
Answer Too Long
The completion of the answer exceeded the limit for SO answers. Therefore I have placed the Answer in a single document, and provided a link. Splitting the Answer at this point proved to be a sin, thus the document contains the entire answer, with consistent formatting, etc:
Complete Answer
To continue from this point (ie. the SO Answer text, above), simply scroll down to the Case 4 heading.
There is a value in retaining the above SO Answer text, not only for historical purposes, but for text searches, etc.