Normal forms: 4 vs 5

Question

I am struggling to understand 4NF, 5NF, and their difference.

Here is the way I would describe 4/5NF (or, how I would describe the steps to achieve it) to someone who doesn't know. I am stating this because this will show what I have really understood.

Typically, a N:N entity relationship should be implemented by having a join table for their possible combinations. If there are 3 or more entities connected with N:N relationships, one should consider carefully:

The more general(including) solution would be to implement a join table containing all entities as fields, and all the combinations of them as values(rows)
However, if the relationship of these entities is not realy a per-full-tuple case, but rather the (cartesian) product of (some of) their dyadic N:N relationships, then carefully think the minimum amount of two-field tables needed.
Generalizing 2, always prefer (if it is correct, of course) to have join tables with as few fields as possible. And oviously, do not create a join table if there is no use for it.
A helpful tip to distinguish the above is, when an insert is done, if your heart(!) tells you you are doing redundunt, or invalid, things, then you should choose one of the later methods.

E1) Example of Wikipedia's page on 4NF: https://en.wikipedia.org/wiki/Fourth_normal_form

We have entities Restaurant, Pizza Variety, and Delivery Area. We could implement their many-to-many relationships with one join table including all three. However, if one thinks of the data correctly, these triplets are a product of only 2 N:N relationships: Restaurant:Pizza and Restaurant:Delivery Area. If the "A1 Pizza" Restaurant decided to include "Thin Crust" Pizza Variety to its repertoire, then I'd have to either insert one row with the same restaurant/pizza variety to all delivery areas of "A1 Pizza" which would feel 4.redundant, or only insert for a specific delivery area, which would feel 4.invalid, because no shop would offer less variety to a delivery area (or at least, let's say our specification says so).

E2) Example of Wikipedia's page on 5NF: https://en.wikipedia.org/wiki/Fifth_normal_form

We have entities Salesman, Brand, and Type. We could implement their many-to-many relationships with one join table including all three. However, because of the "the following rule applies" part, the triplets are actually a (cartesian) product of the 3 N:N relationships available, and as such, the correct method is to have 3 join tables for it. The "Note how this setup helps to remove redundancy." part is much like my 4.th point.

That case is made even more confusing by the fact that while the article states "Also note that the table is in 4NF", the truth is that if the table had all the rows it should so as to cover the "following rule", then it would not cover 4NF! Right?

So.. What is the difference between E1 and E2 which makes one of them a 4NF and the other a 5NF example?

Read my answer to https://stackoverflow.com/questions/7083699/how-to-model-a-database-with-many-mn-relations-on-a-table/ for examples of how violating 4NF leads to problems. — Bill Karwin, May 23 '18 at 14:39
@philipxy First, this is no homework. The formulation part is just me trying to ask for an example-based answer. Second, I think that the number of hours I have used in trying to understand the difference does warrant me a right to ask something. Every question can be answered by reading enough books, why have SO around then? About the multiple questions, Q1 and Q3 are actually more of a statement that I know I might not have understood but am just showing my effort. I'll make an edit then, so thanks for that part. — George Menoutis, May 24 '18 at 06:52
The homeworks links are nevertheless appropriate. Not all questions are on-topic for SO. Few things are explained by examples; an example illustrates an explanation. You cannot understand the NFs without FDs, CKs, projection & join, and algorithms to put into the NFs use those. Unfortunately as I said you are trying to understand via irrelevant concepts, that are not used in the textboooks you should be reading. Eg the wiki articles though poor are nevertheless correct. Reference some academic presentation of a NF definition or algorithm & ask re where you are stuck. — philipxy, May 24 '18 at 16:16
Please edit to be more clear. Eg: What does "feel" mean? Or "cover" a rule or "not cover" a NF? What is "confusing" about the examples? Also: Those aren't "products"; although they're (natural) joins. You mean "triplets are" *elements of*. Should "no shop" be "some shop"? You confuse relation(ship) cardinality with arity. You confuse the notion of n-ary relation(ship) with the notion of multiple binary relation(ship)s. [PS](https://stackoverflow.com/questions/49099262/unable-to-understand-wikipedia-examples-of-a-not-4nf-table-a-4nf-table#comment85210291_49099262) — philipxy, May 24 '18 at 20:57

philipxy · Accepted Answer · 2018-06-02T22:58:36.397

What is the difference between E1 and E2 which makes one of them a 4NF and the other a 5NF example?

Non-4NF & non-5NF relations both exhibit update anomalies due to JDs; 4NF means no anomalies from binary JDs & 5NF means no anomalies from JDs of any arity. The Wikipedia example normalization to 4NF got rid of a binary JD--the relation was a problematic 2-way join. The normalization to 5NF got rid of a 3-way JD--the relation was a problematic 3-way join. (Since it started in 4NF, it couldn't have had any problematic binary JDs.)

A relation (value or variable) is in 5NF when for every way it can be losslessly decomposed (ie into projections that join back to it) (ie a corresponding JD (join dependency) holds) the components can be joined back in some order where the common columns of each join are a superkey of the original. (Fagin's PJ/NF paper's membership algorithm.) The definition of 4NF is the same except that only the ways that it can be losslessly decomposed into two projections matters (ie the corresponding JD is binary) (ie a corresponding MVD (multi-valued dependency) holds).

(Such a permissible JD having such a sequence of joins is said to be "implied by the CKs (candidate keys)".)

The idea is that if we can decompose into projections that join back to an original then we should, except that a join on a superkey does not cause any problems/anomalies.

When a FD (functional dependency) S -> A holds in a relation with attribute set R, the relation is losslessly decomposable on S U {A} & R - {A}. So JD {S U {A}, R - {A}} holds & MVD S ->> {A} holds.

From Which highest normal form is this table in?

Relation Meanings/Predicates

On the other hand, suppose you knew the relation's meaning to the extent that you knew that it holds tuples that make a true statement from a (characteristic) predicate expressible as the conjunction of others, say
    ticket Ticket was submitted by a person with first name Vname
AND there is a person with name Vname Nname
AND ticket Ticket was submitted by a person with last name Nname
Join is designed so that the predicate of its output is the AND of the predicates of its inputs. So you would know to check for whether any corresponding decompositions of the original satisfy the JD (ie whether the relations from the conjuncts are projections of the original) and so to check whether the JD is implied by the original's CKs.

The point of normalization to higher NFs is that a JD holds when a relation's predicate can be expressed as the conjunction of others and their relations are projections of the original, so we can use the simpler separate relations instead, except we might as well JOIN/AND the relations/predicates on pairwise shared CKs because there are still no update anomalies. (If FD {x, ...} -> a holds then a certain MVD holds & a certain binary JD holds and the predicate of the relation can be expressed as ... AND a = f(x, ...).)

I am still trying to understand, but at least there IS something to. Your other link is very helpful as well! When I grasp the "join dependency" concept, I think it'll all be made clear. Thanks. — George Menoutis, May 25 '18 at 07:32

score 2 · Answer 2 · answered May 24 '18 at 19:30

The difference is not very important because 4NF itself is not important unless you are interested in the history of database design theory.

5NF requires that every join dependency (JD) satisfied by a table must be implied by the superkeys of that table. 4NF is concerned only with the concept of multi-valued dependency (MVD) but since a MVD always implies the existence of a corresponding JD there is no need to be concerned with 4NF at all. The historical reason for the existence of 4NF is that it happened to be invented first and was then effectively superseded by 5NF - just as 3NF was superseded by EKNF/BCNF.

Damir Sudarevic · Answer 3 · 2020-04-06T16:16:01.170

Both Wikipedia examples:

Start with a table in BCNF (whole heading is the key).
The table has redundancy.
Introduce a rule, which actually spells-out that it is possible to decompose the table.
Decomposing the table removes redundancy.
Once decomposed, the projections (new tables) are all in 6NF (and therefore in 5, 4, .., 1)

That's why it is hard to see the difference. The only real difference is that in the first case (E1) the table is decomposed into two projections, and in the second case (E2) into three.

This actually follows DB-history. The following may not exactly describe historical events, but is close enough:
Once upon a time, a bunch of DB-mathematicians get together for a drink (conference) and one of the guys says: "Hey, look at this: I have a table in BCNF, it has redundancy, it can be decomposed into two projections and the redundancy is gone". And they: think, think, drink, math, more math, and eventually someone defines something called multivalued dependency and the 4NF. They have few more drinks, high-five, and go home.
Time passes, snow falls and melts. They get together again, and guess what? "Hey guys, a table in BCNF, but this time it can be decomposed into three projections." Ditto redundancy. And they think and drink again. That last 4NF definition does not cut it. Math, math, think, drink, think. Eureka, so someone comes up with the concept of join dependency and the 5NF is born. The party continues late into the night. The rest is history, they've been confusing the s@!t out of everybody ever since.

What is a join dependency? Informally: if you decompose a relation R into N projections (X1 ... XN) and then join these projections back, you must get the same relation. There should be no extra, nor missing tuples (same for columns). And btw, what is a multivalued dependency? Well, it is a special case of a join dependency -- a join dependency with exactly two components (N=2).

To understand how a rule spells-out a join dependency take a look at this SO Q/A.

Normal forms: 4 vs 5

3 Answers3

Linked