292

In relational database design, there is a concept of database normalization or simply normalization, which is a process of organizing columns (attributes) and tables (relations) to reduce data redundancy and improve data integrity. Wikipedia

What is an explanation based on examples about what 1NF, 2NF, 3NF, and 3.5NF (Boyce-Codd BCNF) mean?

philipxy
  • 14,867
  • 6
  • 39
  • 83
barfoon
  • 27,481
  • 26
  • 92
  • 138

4 Answers4

454

1NF is the most basic of normal forms - each cell in a table must contain only one piece of information, and there can be no duplicate rows.

2NF and 3NF are all about being dependent on the primary key. Recall that a primary key can be made up of multiple columns. As Chris said in his response:

The data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF] (so help me Codd).

2NF

Say you have a table containing courses that are taken in a certain semester, and you have the following data:

|-----Primary Key----|               uh oh |
                                           V
CourseID | SemesterID | #Places  | Course Name  |
------------------------------------------------|
IT101    |   2009-1   | 100      | Programming  |
IT101    |   2009-2   | 100      | Programming  |
IT102    |   2009-1   | 200      | Databases    |
IT102    |   2010-1   | 150      | Databases    |
IT103    |   2009-2   | 120      | Web Design   |

This is not in 2NF, because the fourth column does not rely upon the entire key - but only a part of it. The course name is dependent on the Course's ID, but has nothing to do with which semester it's taken in. Thus, as you can see, we have duplicate information - several rows telling us that IT101 is programming, and IT102 is Databases. So we fix that by moving the course name into another table, where CourseID is the ENTIRE key.

Primary Key |

CourseID    |  Course Name |
---------------------------|
IT101       | Programming  |
IT102       | Databases    |
IT103       | Web Design   |

No redundancy!

3NF

Okay, so let's say we also add the name of the teacher of the course, and some details about them, into the RDBMS:

|-----Primary Key----|                           uh oh |
                                                       V
Course  |  Semester  |  #Places   |  TeacherID  | TeacherName  |
---------------------------------------------------------------|
IT101   |   2009-1   |  100       |  332        |  Mr Jones    |
IT101   |   2009-2   |  100       |  332        |  Mr Jones    |
IT102   |   2009-1   |  200       |  495        |  Mr Bentley  |
IT102   |   2010-1   |  150       |  332        |  Mr Jones    |
IT103   |   2009-2   |  120       |  242        |  Mrs Smith   |

Now hopefully it should be obvious that TeacherName is dependent on TeacherID which is not part of the primary key - so this is not in 3NF. To fix this, we do much the same as we did in 2NF - take the TeacherName field out of this table, and put it in its own, which has TeacherID as the key.

 Primary Key |

 TeacherID   | TeacherName  |
 ---------------------------|
 332         |  Mr Jones    |
 495         |  Mr Bentley  |
 242         |  Mrs Smith   |

No redundancy!!

One important thing to remember is that if something is not in 1NF, it is not in 2NF or 3NF either. So each additional Normal Form requires everything that the lower normal forms had, plus some extra conditions, which must all be fulfilled.

Nate Anderson
  • 18,334
  • 18
  • 100
  • 135
Smashery
  • 57,848
  • 30
  • 97
  • 128
  • 32
    @instantsetsuna - Full explanation: In some courts, a witness is asked if they will tell "The truth, the whole truth, and nothing but the truth, so help me God"; as God is considered to be the one with authority when it comes to knowing if you're telling the truth. In the case of databases, we can say "The data depends on the key, the whole key and nothing but the key, so help me Codd". Ted Codd is the one who came up with the idea of relational databases - things relying on keys, etc., so he would be the authority to go by in the case of relational databases. – Smashery Aug 12 '10 at 02:48
130

I've never had a good memory for exact wording, but in my database class I think the professor always said something like:

The data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF].

Chris Shaffer
  • 32,199
  • 5
  • 49
  • 61
  • 7
    So what is difference between `The data depends on the key [1NF], nothing but the key [3NF]` ? Please dont confuse us, as 1 sentense answer doesnt clarify the answer but confuse visitors! – Pratik Joshi Oct 04 '15 at 09:34
  • 2
    *"each cell in a table must contain only one piece of information, and there can be no duplicate rows."* - I don't see how "The data depends on the key" matches all of that. – Simon Forsberg Jul 14 '17 at 13:15
  • @PratikJoshi Your objection would've been more legitimate had the answer started with "The data depends ***only*** on the key [1NF]" – Mehdi Charife May 01 '23 at 13:55
51

Here's a quick, admittedly butchered response, but in a sentence:

1NF : Your table is organized as an unordered set of data, and there are no repeating columns.

2NF: You don't repeat data in one column of your table because of another column.

3NF: Every column in your table relates only to your table's key -- you wouldn't have a column in a table that describes another column in your table which isn't the key.

For more detail, see wikipedia...

Dave Markle
  • 95,573
  • 20
  • 147
  • 170
  • 3
    As far as I can tell, 1NF's eschewing of _repeating **groups**_ does not refer to repeating columns, but rather single _columns_ that represent an arbitrary number of repeated values for the same attribute, i.e. are not atomic. I base this on e.g. (1) http://stackoverflow.com/questions/23194292/normalization-what-does-repeating-groups-mean / (2) http://stackoverflow.com/questions/26357276/1nf-repeating-groups-what-are-they – underscore_d Mar 30 '17 at 15:54
39

1NF: Only one value per column

2NF: All the non primary key columns in the table should depend on the entire primary key.

3NF: All the non primary key columns in the table should depend DIRECTLY on the entire primary key.

I have written an article in more detail over here

Arcturus
  • 2,902
  • 2
  • 19
  • 11
  • To be more precise, that 3NF rule is actually BCNF. 3NF carries an additional constraint, where if there is a functional dependency X --> A, then X is not a proper subset of any key, but A is part of some key. (Source: Database Management Systems by Raghu Ramakrishnan & Gehrke). – Dheeru Mundluru Sep 13 '17 at 06:58