Does adding "ID" field to database tables according to the third normal form considered a mistake?

Question

Is it OK to add "ID" field as primary key to all my database tables also to use it to make relations ships between tables? Is this design would consider as 3NF (third normal form) design? If yes, is this thing recommended theoretically or not?

score 7 · Accepted Answer · edited May 23 '17 at 12:31

The problem is the question is a bit isolated. But since you are concerned about whether the issue has a theoretical (and possibly a standards-compliance) basis, the answer has to not be so isolated.

If yes, is this thing recommended theoretically or not?

No. It has no academic or theoretical basis whatsoever. It breaks basic Relational Database design rules, and therefore (a) will not produce a Relational database and (b) whatever is produced, will not have the power of relational capability that the users (without having to go through the app) will expect via the many simple Relational database report tools out there.

In fact, it is the unfortunately very common, knee-jerk, quick-and-dirty method of making spreadsheets (which the app developer has identified for their app) fit into a "database" container such as MS SQL. Without doing any of the genuine database design or modelling work that is required for the contents of the container to qualify as a Relational database. Good for getting a prototype or proof of concept going but not ready for any form of development (SQL coding).

Is it OK to add "ID" field as primary key to all my database tables also to use it to make relationships between tables?

Hang on. They cannot be "database tables", by definition. Database tables are arrived at by a formal modelling process, and as a result will have strong Identifiers. And Relations already defined. In which case the question would not be asked. Therefore, since it is being asked, the things you are asking about are nowhere near "database tables". They are just one app developers scratchpad for one app.

Adding an FK constraint to one spreadsheet, to "relate" it to another spreadsheet, and adding an "ID" PK, does not make a "relational" "database". No, it merely uses the capability of SQL to relate otherwise unrelated spreadsheets in the container. They remain unrelated spreadsheets, "linked" by an added "ID" column.

The result is, substantial duplication of the data; update anomalies; many more indices; larger "relational" data sets; poor performance; massive over-use of temp tables; complex SQL, all of which can be avoided by genuine database design.

Is this design would consider as 3NF (third normal form) design?

Normalisation is part of (not all of) the database design process. 3NF is arrived at through that process. 3NF or whatever NF, is not a label that can be placed on the set of spreadsheets or partially designed contents of the container, without going through the process, and thus earning the badge. One does not "consider" a bunch of spreadsheets or partly designed contents 3NF; one evaluates if the rules of Normalisation have been followed, and if the rules are not violated, then it is fairly labelled 3NF. Since the Normalisation process has not been followed, there is no basis to suggest that it could possibly relate to any Normal Form.

Likewise, over and above Normalisation, if Relational Database rules have been followed during the process, and not violated, one achieves compliance with Relational Database standards. Since Relational Database methodology has not been followed, there is no basis to suggest that it could possibly relate to any Relational Database Standard, or that any Relational capability can be expected from it.

Understanding the Whole Issue

"IDs" are surrogate keys. Surrogate keys are always (you are right) an additional key and Index, additional to the pre-existing PK, which is about to be usurped. Of course, that has a considerable performance cost, on every access.

Some questioners have the idea that Surrogate keys can be used in substitution of the PK. Which of course, is false, and you realise that, so gratefully that does not have to be addressed here.

The notion of "all surrogate keys" or "no surrogate keys" is the kind of black-or-white, all-or-nothing, nonsense that is normal for children but unacceptable in full grown adults, specially anyone doing IT work, which requires precision and understanding. It is quie normal for a small child to think "if daddy doesn't let me do what I want, he doesn't love me", and therefore "if he doesn't love me, he hates me". Most of us realise that life is a tiny bit more complex than that by the age of primary school. Developers who "like" to see" IDs" on every table and "dislike" the lack of them on some tables, are by definition incapable of considering the database as a whole, and the needs of other developers and users; they are only thinking about simplistic, one-table-at-a-time code.

It is also not about shades of grey, or blurred definitions. No, the definitions have not changed in 30 years (they have been extended and made more precise, but they have not changed). Shades of grey allows developers to avoid compliance and standards. So that too, is not recommended.

What is a genuine Relational Database ?

The truth is, if a database was honestly modelled and designed, by a qualified data modeller, using the methodologies that have been available for 30 years, they would end up with a genuinely Relational database. And if they did not follow the process, it would neither Relational nor a Database. The Identifiers and Relations would already be defined, and the meaning, the context, of that would be carried through to various tables. The data would be Normalised, to 3NF or BCNF or 5NF, and there would be no update Anomalies. In the last step, as part of the formal process, and not outside it, when translating the logical to the physical, the modeller may improve the performance of some Identifiers by adding Surrogate keys, and avoiding carrying large (wide) keys into the related child tables (1). That proves again, from another approach, why the notion of zero surrogates, or all surrogates, is childish and completely removed from the genuine process.

The genuine Relational database will have full Relational capability, honest achievement of 3NF, use natural Relational keys, with a definite few being thoughtfully switched to Surrogates.

Easily Proved

Of course, everything I have stated can be proved easily: simply post DDL of 5 to 10 of your spreadsheets, I need a "depth" of at least four (great.grand.parent⇢grand.parent⇢parent⇢child).

You may be interested, I have recently posted information re your question, in a related question, which I am not repeating here.

Note

This is only required because the current SQL offerings do not support the full Relational Model, and to eliminate known performance obstacles that they have. And it will not be necessary if and when suppliers provide Relational databases in which wide Relational keys perform as well as narrow ones.
I agree with Erwin's statements re keys and Identifiers, and thus I have not repeated them in my answer.

score 6 · Answer 2 · edited Dec 22 '10 at 13:54

6

"Is it OK to add "ID" field as primary key to all my database tables also to use it to make relations ships between tables?"

You clearly intend to add surrogate ID's everywhere, blindly and without any thinking. To think that is OK is as silly as doing it. "Good" identifiers have the properties of uniqueness (otherwise it would not be much of an identifier, of obviously), stability (their values change only infrequently), and of familiarity (their values denote something meaningful in the user's world - the world outside the IT system).

Note that I used the word "identifiers" instead of "keys" very deliberately. Keys have the property of uniqueness by definition. Therefore, all keys are a valid candidate for acting as an identifier. Which key you actually choose for acting as an identifier, should depend on how much or how little some particular key also satisfies the criteria of stability and familiarity.

Natural keys might not satisfy the stability criterion enough (but the extent to which they do is usually vastly over-stressed, typically by the kind of developers who think too little about the user's side and too much about their own side of the problem). System-generated ID's violate the "familiarity" criterion with absolute certainty.

These considerations should be sufficient to prove which way the balance should mostly go when trading off one for the other.

"Is this design would consider as 3NF (third normal form) design? If yes, is this thing recommended theoretically or not?"

If you "add an ID column to an existing design", then this will not influence the NF. Whatever NF your existing design was in, the design with the ID's added will have the same NF.

edited Dec 22 '10 at 13:54

Tomislav Nakic-Alfirevic

10,017
5
38
51

answered Nov 08 '10 at 10:14

Erwin Smout

18,113
4
33
52

"and of familiarity (their values denote something meaningful in the user's world - the world outside the IT system)." - why do you think this is an important feature of an identifier? – Tomislav Nakic-Alfirevic Dec 22 '10 at 13:53
1

@Tomislav, if he still gets to see this. Keeping natural keys as the main identifiers in a database is important because doing so has the beneficial effect on the source code that an extra "translation" step is eliminated (translation between the user's "natural" system of identifier values, such as SSN, passport ID, car chassis nbr, ... and the DB's system of surrogate ID's). Eliminating this has two beneficial effects : (a) improves readability, mostly because less joins will be needed (what gets joined to what is often hard to track in complex queries) and (b) reduces the error potential. – Erwin Smout Nov 06 '11 at 22:42
3FN states that no transitive dependencies to the key through a non key value should exist. if ID->NATURALKEY->OTHER_COLUMN then you are violating 3NF. It's OK to denormalize knowingly here and there for performance or simplicity, but not everywhere as a blind rule. – Tulains Córdova Aug 21 '14 at 21:33
@user1598390 No, in 3nf every non-prime attribute is non-transitively (ie directly) dependent on a key. Transitive (ie indirect) dependencies are irrelevant. – philipxy Sep 01 '14 at 11:42
@philipxy Isn't that the same I said ? – Tulains Córdova Sep 01 '14 at 14:36
@user1598390 No, because a there being a non-transitive (direct) dependency is different from there not being a transitive (indirect) dependency. A given sequence of FDs is either non-transitive or transitive. But when there is a non-transitive one (which matters to 3nf) there can be nevertheless be transitive ones (which don't). See a textbook for the definition of 3nf. – philipxy Sep 01 '14 at 22:46

score 2 · Answer 3 · answered Nov 08 '10 at 08:29

Normal forms are concerned with dependencies between attributes. Without knowing what dependencies you intended to represent in your table we can't say whether it would satisfy any particular normal form.

If you are talking about a surrogate key (a key that has no meaning in the business domain) then for most purposes the important point is that such a key should not be the ONLY key of any table. You should normally have a natural key (AKA business key) as well to ensure that data is not duplicated.

score 0 · Answer 4 · answered Nov 07 '10 at 23:25

If I understand you correctly,

Yes, adding a serial ID to a table and letting that ID be the primary key to which you refer to rows of that table is in general a good design question. Whether or not this violates 3NF: it does not violate 3NF but it does not guarantee it either.

Practically, adding a serial ID and using that value internally can have its advantages. One is that you are in control of the ID, whereas an externally generated key might be changed suddenly by another party. On the other hand, exporting an ID to other parties "binds" that key since changing it on your side may have impact on others use of that key. Also a serial number is often easy to forge and may collide with other peoples use of the number.

Also in practical DB design, 3NF or Boyce-Codd tends to be theoretical ideas you aspire to rather than blindly follow. Selective denormalization is a known trick to speed up some queries by making data sit closer.

I see I was talking about a different scenario than what the original poster had. Blindly adding an Id column benefits no-one if there is already a perfectly good PK in the data. I was merely trying to suggest that you evaluate the existing PKs for their "roboustness". — I GIVE CRAP ANSWERS, Nov 10 '10 at 13:33

score -3 · Answer 5 · answered Nov 08 '10 at 08:00

-3

I fully agree with @jlouis that

3NF or Boyce-Codd tends to be theoretical ideas

From my practice I can say that using natural key is a good choice only in reference-like lookup tables, provided the key field in the real world is unique and not null and doesn't change in time. I other cases using surrogate key is much more preferable (from my point of view): it's just more convenient way to design tables in spite of all the thing 3NF or Boyce-Codd tell us.

answered Nov 08 '10 at 08:00

andr

1,626
8
10

5

-1 Technically incorrect. Convenience for an app coder is never a justifiable reason to break Relational rules and standards. The resulting database has lost the power of "relational" and "database". If you are going to design databases that last, it might help to learn the theoretical and practice it until you appreciate the value of it. Dismissing something that you do not understand is, well, very limited mindset. – PerformanceDBA Nov 10 '10 at 13:16
Unless you have a *measured* bottleneck where you know denormalization is beneficial you should never ever design the system in a denormalized fashion. And even when the bottleneck is there, you should be extremely cautious. Denormalization also tends to put more pressure on the application programmer to maintain it. From the outset, you should go for 3NF/BC simply because it keeps your future as flexible as possible. – I GIVE CRAP ANSWERS Nov 10 '10 at 13:39
Well, I am familiar with rational algebra and didn't mean that denormalization is good and that developer should not tend to use theory designing tables structure. The thing I tried to say is that there's nothing bad adding new property (surrogate key) to the table, provided you can't extract property (-ies) from real life objects that can be used as PK (either composite or not). And in my practice I found that these properties can be easily extracted in "reference-like lookup tables". Please correct me if I'm wrong. – andr Nov 10 '10 at 14:57

Does adding "ID" field to database tables according to the third normal form considered a mistake?

5 Answers5