I am struggling to understand 4NF, 5NF, and their difference.
Here is the way I would describe 4/5NF (or, how I would describe the steps to achieve it) to someone who doesn't know. I am stating this because this will show what I have really understood.
Typically, a N:N entity relationship should be implemented by having a join table for their possible combinations. If there are 3 or more entities connected with N:N relationships, one should consider carefully:
- The more general(including) solution would be to implement a join table containing all entities as fields, and all the combinations of them as values(rows)
- However, if the relationship of these entities is not realy a per-full-tuple case, but rather the (cartesian) product of (some of) their dyadic N:N relationships, then carefully think the minimum amount of two-field tables needed.
- Generalizing 2, always prefer (if it is correct, of course) to have join tables with as few fields as possible. And oviously, do not create a join table if there is no use for it.
- A helpful tip to distinguish the above is, when an insert is done, if your heart(!) tells you you are doing redundunt, or invalid, things, then you should choose one of the later methods.
E1) Example of Wikipedia's page on 4NF: https://en.wikipedia.org/wiki/Fourth_normal_form
We have entities Restaurant, Pizza Variety, and Delivery Area. We could implement their many-to-many relationships with one join table including all three. However, if one thinks of the data correctly, these triplets are a product of only 2 N:N relationships: Restaurant:Pizza and Restaurant:Delivery Area. If the "A1 Pizza" Restaurant decided to include "Thin Crust" Pizza Variety to its repertoire, then I'd have to either insert one row with the same restaurant/pizza variety to all delivery areas of "A1 Pizza" which would feel 4.redundant, or only insert for a specific delivery area, which would feel 4.invalid, because no shop would offer less variety to a delivery area (or at least, let's say our specification says so).
E2) Example of Wikipedia's page on 5NF: https://en.wikipedia.org/wiki/Fifth_normal_form
We have entities Salesman, Brand, and Type. We could implement their many-to-many relationships with one join table including all three. However, because of the "the following rule applies" part, the triplets are actually a (cartesian) product of the 3 N:N relationships available, and as such, the correct method is to have 3 join tables for it. The "Note how this setup helps to remove redundancy." part is much like my 4.th point.
That case is made even more confusing by the fact that while the article states "Also note that the table is in 4NF", the truth is that if the table had all the rows it should so as to cover the "following rule", then it would not cover 4NF! Right?
So.. What is the difference between E1 and E2 which makes one of them a 4NF and the other a 5NF example?