Uncertain on data duplication rules for normalisation (specifically 1NF and 2NF)

Question

Due to struggling to recall the specifics of a lesson I had a few weeks ago, I'm trying to use a TutorialPoint series to clarify the process of normalisation. In it's page for 2NF, it gives the example of:

CREATE TABLE CUSTOMERS(
       CUST_ID    INT              NOT NULL,
       CUST_NAME VARCHAR (20)      NOT NULL,
       ORDER_ID   INT              NOT NULL,
       ORDER_DETAIL VARCHAR (20)  NOT NULL,
       SALE_DATE  DATETIME,
       PRIMARY KEY (CUST_ID, ORDER_ID)
);

It states that this is in 1NF, however in it's page for 1NF it states that in order to be in 1NF "there are no repeating groups of data". In the 2NF example, there would be repeated groups of data. For example, if a customer orders two items, their name would be repeated. Doesn't this break 1NF and if not why?

score 0 · Accepted Answer · answered Oct 03 '15 at 16:44

"Repeating groups" in descriptions of 1NF refers to what spreadsheet users would call "vertically merged cells". Another way of viewing it is that rows may not contain nested tables. To convert repeating groups to 1NF, each row in the the table is given its own copy of the relevant values so that we have a set rather than a hierarchy of rows.

Multiple instances of the same values are perfectly acceptable in all the normal forms. It's a common misconception that normalization is meant to reduce duplication of values. Rather, it's meant to reduce duplication of associations between values.

score 0 · Answer 2 · edited May 23 '17 at 11:51

To answer your question, there won't be any repeating groups of data. The primary key is (CUST_ID, ORDER_ID) and for every new order, at least the ORDER_ID would be unique. Now, there would also exist a separate Order table. When a customer orders two items, we will insert one entry in this table and another entry in the Order table. Both entries would have same ORDER_ID. However only the Order table will contain the information that there are two items included in this order.

If we would insert the information for two items into this table itself, then we would have repeating groups with same (CUST_ID, ORDER_ID) and that would violate 2NF. But that is not how we are updating tables.

The rules of first three NFs are as follows -

1NF: simply states that all columns must have atomic values. If a column needs more than one value, create another table.

2NF: requires 1NF qualification and that any non-key field should be dependent on the entire primary key.

3NF: requires 2NF qualification and that no non-key field should depend upon any other non-key field. This means that there should be no dependencies between table's columns except on the primary keys.

I have written another such answer which is also about Normalization in Database. If you get the above written three rules, you can stop caring about anything else you have learned regarding those three rules anywhere else. The definitions are complete.

In your set of rules, you haven't stated that there is a need for a primary key to be in 1NF. I've seen a bit of discussion about this. Can you explain why you haven't specified that? — Scott Page, Oct 03 '15 at 21:35
@ScottPage: It is totally by mistake. Primary key is essential for differentiating between tuples. Otherwise whole purpose of redundancy removal will be lost. — displayName, Oct 03 '15 at 21:59

Uncertain on data duplication rules for normalisation (specifically 1NF and 2NF)

2 Answers2