-1

I'm in the process of designing my first database and I'm trying to follow the normalization rules up to 3NF. I am using the following definitions for normalization:

  • 1NF: No repeating groups and a primary key is identified.
  • 2NF: Partial key dependencies are removed. All non-key columns are fully dependent on the primary key.
  • 3NF: Transitive dependencies and non-key dependencies are removed.

The table is for storing information about pet food. Here is a basic example of the table before any normalization:

╔════════════╦═════════╦═════════════╦════════════╦═══════════╦═══════════════╦═══════╗
║ Brand      ║ Flavor  ║ Animal Type ║ Breed Size ║ Age Group ║ Ingredients   ║ Price ║
╠════════════╬═════════╬═════════════╬════════════╬═══════════╬═══════════════╬═══════╣
║ Purina     ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ Chicken, Salt ║ 18.99 ║
╠════════════╬═════════╬═════════════╬════════════╬═══════════╬═══════════════╬═══════╣
║ BlueWilder ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ Chicken, Salt ║ 18.99 ║
╚════════════╩═════════╩═════════════╩════════════╩═══════════╩═══════════════╩═══════╝

Following the 1NF definition above, I added a primary key column. In this case, the only issue was in the Ingredients column containing multiple entries per record:

╔═════════╦════════════╦═════════╦═════════════╦════════════╦═══════════╦═════════════╦═══════╗
║ Food ID ║ Brand      ║ Flavor  ║ Animal Type ║ Breed Size ║ Age Group ║ Ingredients ║ Price ║
╠═════════╬════════════╬═════════╬═════════════╬════════════╬═══════════╬═════════════╬═══════╣
║ 1       ║ Purina     ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ Chicken     ║ 18.99 ║
╠═════════╬════════════╬═════════╬═════════════╬════════════╬═══════════╬═════════════╬═══════╣
║ 1       ║ Purina     ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ Salt        ║ 18.99 ║
╠═════════╬════════════╬═════════╬═════════════╬════════════╬═══════════╬═════════════╬═══════╣
║ 2       ║ BlueWilder ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ Chicken     ║ 18.99 ║
╠═════════╬════════════╬═════════╬═════════════╬════════════╬═══════════╬═════════════╬═══════╣
║ 2       ║ BlueWilder ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ Salt        ║ 18.99 ║
╚═════════╩════════════╩═════════╩═════════════╩════════════╩═══════════╩═════════════╩═══════╝

There is a repeating group now because of the Ingredients column. To fix this, I created two separate tables; An ingredients table, and an associative table that connects the ingredients table and food table together:

╔═══════════════╦════════════╗
║ Ingredient ID ║ Ingredient ║
╠═══════════════╬════════════╣
║ 1             ║ Chicken    ║
╠═══════════════╬════════════╣
║ 2             ║ Salt       ║
╚═══════════════╩════════════╝

╔═════════╦═══════════════╗
║ Food ID ║ Ingredient ID ║
╠═════════╬═══════════════╣
║ 1       ║ 1             ║
╠═════════╬═══════════════╣
║ 1       ║ 2             ║
╠═════════╬═══════════════╣
║ 2       ║ 1             ║
╠═════════╬═══════════════╣
║ 2       ║ 2             ║
╚═════════╩═══════════════╝

Now, I can remove the Ingredients column from the food table:

╔═════════╦════════════╦═════════╦═════════════╦════════════╦═══════════╦═══════╗
║ Food ID ║ Brand      ║ Flavor  ║ Animal Type ║ Breed Size ║ Age Group ║ Price ║
╠═════════╬════════════╬═════════╬═════════════╬════════════╬═══════════╬═══════╣
║ 1       ║ Purina     ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ 18.99 ║
╠═════════╬════════════╬═════════╬═════════════╬════════════╬═══════════╬═══════╣
║ 2       ║ BlueWilder ║ Chicken ║ Dog         ║ Small      ║ Adult     ║ 18.99 ║
╚═════════╩════════════╩═════════╩═════════════╩════════════╩═══════════╩═══════╝

At this point, I am unsure of how to proceed.

My line of thinking is this: If any of the non-primary key columns change its value, the Food ID must be different. For example, the brand, flavor, breed size, age group, ingredients, and price can all be the same. But if the animal type is a dog, and then changes to a cat, it must be a different record. This applies to all the non-primary key columns. Following this thinking, all the non-primary key columns ARE fully dependent on the primary key, and there are no transitive dependencies.

Is my line of thinking correct? Is 2NF and 3NF, by the definitions I listed above, already satisfied?

pfinferno
  • 1,779
  • 3
  • 34
  • 62
  • 1
    ["1NF" has no single meaning.](https://stackoverflow.com/a/40640962/3404097) Nor does "unnormalized" or "UNF" or "0NF" or for that matter "relation". So you need to tell us your definitions & preferably also textbook name & edition. PS "columns have repeated values now because of [..]. So" That is not a reason to do anything in particular. "none of the non-primary key columns are dependent on other non-primary key columns" That is not a correct description of a NF. Normalization does not involve adding ids. Normalizing to higher NFs is not per se a matter of 1st normalizing to lower NFs. – philipxy Nov 25 '20 at 19:38
  • 1
    You don't clearly say what you are doing with each step clearly justifed by reference to quoted given definitions, theorems, algorithms & heuristics. So we can't point out anything wrong, we can only say that you aren't clearly saying what you are doing or why & rewrite your textbook & give a bespoke tutorial doing your (home)work. Show the steps of your work following your reference/textbook, with clear referenced justification. If you're not sure it's right, ask 1 specific researched non-duplicate question re where you got stuck. All steps are SO faqs. PS Please make examples minimal. – philipxy Nov 25 '20 at 19:57
  • I really don't see how my question isn't clear. Every source I see has the same basic steps: eliminate repeating groups for 1NF, remove dependencies on parts of a key for 2NF, and remove transitive dependencies for 3NF. I'm not looking at a textbook, just simple google searches. In my mind, I don't have transitive dependencies, so I'm asking here for people that have worked more with databases if it looks correct. I stated I was new to this, so if there are multiple meanings for normalization it should be obvious that I wouldn't know them all. – pfinferno Nov 25 '20 at 20:19
  • 1
    My 1st comment says "1NF" is not clear then points out a number of things that are wrong or don't make sense in your post. You don't say what the justifcation is for transformations you make. Things that are relevant are missing. You really need to follow a (good) published textbook. Dozens are free online. Or even SO posts, though answers are generally very poor. Anyway, nothing is stopping you from giving the definitions etc & reasoning that you are using in this post, so we can address them and/or you use of them. I told you why. Please clarify via edits, not comments. Good luck. – philipxy Nov 25 '20 at 20:46
  • 1
    @pfinferno, regarding "steps 1NF, 2NF..." take a look at this Q/A https://dba.stackexchange.com/questions/65211/rules-for-decomposition-to-2-nf/65247#65247 – Damir Sudarevic Nov 26 '20 at 13:59
  • Those are not correct definitions of NFs or correct processes for putting relations into NFs, and they're not clear either. PS "repeating groups" doesn't mean anything in particular. (Already explained in my 1NF link.) – philipxy Nov 26 '20 at 19:45
  • Can you please tell me what exactly you think is the correct definition and process then? I have 23 tabs open all saying the same exact thing for normalization. I can link them all if you'd like. Here's two referenced from stackoverflow: http://graphdatamodeling.com/resources/rettigNormalizationPoster.pdf and https://opentextbc.ca/dbdesign01/chapter/chapter-12-normalization – pfinferno Nov 26 '20 at 20:03
  • 1
    @pfinferno https://stackoverflow.com/questions/23194292/normalization-what-does-repeating-groups-mean – Damir Sudarevic Nov 26 '20 at 20:40
  • 1
    I said to do some things, but you haven't done them. Right now I don't want to repeat my comments & my posts. – philipxy Nov 26 '20 at 22:41
  • Thank you for that first link! It got me thinking about this a little differently. I also took a step back and looked at my data. I kept trying to push it into a mold that didn't really fit my way of thinking about it. I decomposed all my tables to a point where there weren't any redundancies and realized I was already at 3NF. I thought there HAD to be a step from 2NF to 3NF that I was missing. – pfinferno Nov 28 '20 at 20:26

1 Answers1

1

To save on screen space I'll rename attributes:

  • Brand: BRD
  • Flavor: FLA
  • Animal: ANM
  • Breed size: SIZ
  • Age group: AGP
  • Ingredient: ING
  • Price: PRI

Method 1

Normalization, as in textbooks.

Step 1.1

Due to repeating groups in ingredients, the value in the table is not a relation, hence the table does not represent a relational variable, hence it is not in 1NF. The solution is to make sure that the ingredient column (ING) has exactly one ingredient per row-- as you have done. However, no new attributes are added (no new ID). Now we have (in 1NF):

R {BRD, FLA, ANM, SIZ, AGP, ING, PRI}

The whole heading is the key.

Step 1.2

From FDs:

  • {BRD, FLA} -> {ANM}
  • {BRD, FLA} -> {SIZ}
  • {BRD, FLA} -> {AGP}
  • {BRD, FLA} -> {PRI}

by applying union rule for FDs and Heath's theorem:

 R1 {BRD, FLA, ANM, SIZ, AGP, PRI}
KEY {BRD, FLA}

 R2 {BRD, FLA, ING}
KEY {BRD, FLA, ING}

Done. That's it, if I got the FDs correctly.
Both are in BCNF, with confidence I would say that R1 is in 5NF, R2 in 6NF.

Method 2

Database design is predicate design.

Not a formal normalization method as found in most textbooks, but a design method which results in tables being in high NF (5NF, 6NF).

In the first step, the problem (business domain) is verbalized using simple predicates and associated constraints. The problem can be then reasoned about in natural language using logic.

A simple predicate can not be decomposed without losing information, its matching relvar is in 6NF.

In the second step, these simple predicates (and matching relvars) can be combined, making sure not to introduce redundancy and possibility of logical errors, namely contradiction. Once done, relvars (tables) are expected to be in 5NF.

Step 2.1

Describing the problem using simple predicates and matching 6NF relvars. Not verbalizing constraints (it gets long-winded), just stating them.

-- Brand BRD exists.
--
brand {BRD}
   PK {BRD}
-- Flavor FLA exists.
--
flavor {FLA}
    PK {FLA}
-- Animal type ANM exists.
--
animal {ANM}
    PK {ANM}
-- Breed size SIZ exists.
--
bsize {SIZ}
   PK {SIZ}
-- Age group AGP exists.
--
age {AGP}
 PK {AGP}
-- Ingredient ING exists.
--
ingredient {ING}
        PK {ING}
-- Pet food with flavor FLA made by brand BRD
-- is for animal type ANM.
--
food {BRD, FLA, ANM}
  PK {BRD, FLA}

 FK1 {BRD} REFERENCES brand  {BRD}
 FK2 {FLA} REFERENCES flavor {FLA}
 FK3 {ANM} REFERENCES animal {ANM}
-- Pet food with flavor FLA made by brand BRD
-- is recommended for breed size SIZ.
--
food_bsize {BRD, FLA, SIZ}
        PK {BRD, FLA}

       FK1 {BRD, FLA} REFERENCES
      food {BRD, FLA}

       FK2 {SIZ} REFERENCES bsize {SIZ}
-- Pet food with flavor FLA made by brand BRD
-- is recommended for breed age group AGP.
--
food_age {BRD, FLA, AGP}
      PK {BRD, FLA}

       FK1 {BRD, FLA} REFERENCES
      food {BRD, FLA}

       FK2 {AGP} REFERENCES age {AGP}
-- Pet food with flavor FLA made by brand BRD
-- is priced at PRI Euros per unit.
--
price {BRD, FLA, PRI}
   PK {BRD, FLA}

   FK {BRD, FLA} REFERENCES
 food {BRD, FLA}
-- Pet food with flavor FLA made by brand BRD
-- contains ingredient ING.
--
recipe {BRD, FLA, ING}
    PK {BRD, FLA, ING}

   FK1 {BRD, FLA} REFERENCES
  food {BRD, FLA}

   FK2 {ING} REFERENCES ingredient {ING}

Step 2.2

Just by looking at the keys, we can see that food, food_bsize, food_age, and price tables can be combined.

-- Pet food with flavor FLA made by brand BRD
-- is for animal type ANM, recommended for
-- breed size SIZ, breed age group AGP; priced
-- at PRI Euros per unit.
--
food_ {BRD, FLA, ANM, SIZ, AGP, PRI}
   PK {BRD, FLA}

 FK1 {BRD} REFERENCES brand  {BRD}
 FK2 {FLA} REFERENCES flavor {FLA}
 FK3 {ANM} REFERENCES animal {ANM}
 FK4 {SIZ} REFERENCES bsize  {SIZ}
 FK5 {AGP} REFERENCES age    {AGP}

If we decide not to keep the first six tables defining domains, then the final result is as in the first method:

food_ {BRD, FLA, ANM, SIZ, AGP, PRI}
   PK {BRD, FLA}


recipe {BRD, FLA, ING}
    PK {BRD, FLA, ING}

    FK {BRD, FLA} REFERENCES food_
       {BRD, FLA}

However, in a real-world project, you will likely need brand, flavor, animal, bsize, and age tables to constrain domains. There is no rule in normalization stating that you should have them.
Also, not all attributes may be known at the sime time, so it is not likely that you'll combine all of food, food_bsize, food_age, and price tables into food_. This will depend on the business process and optionality of attributes.

Adding IDs

Adding surrogate keys (IDs) has nothing to do with normalization. You may need to add them for other reasons, take a look at this example.


Note:

All attributes (columns) NOT NULL

KEY = PK or AK

PK = Primary Key
AK = Alternate Key   (Unique)
SK = Proper Superkey (Unique)
FK = Foreign Key

I always use the second method for design. Have never seen the first one used outside of a formal DB course of some kind; and people asking for help on SO. For some reason, beyond my understanding, method 1 is taught in DB courses under topic "DB design" before method 2. Most schools do not teach method 2 at all. Go figure.

Damir Sudarevic
  • 21,891
  • 3
  • 47
  • 71