Explain Like I am Five -> How a Primary Key Satisfies First Normal Form

Question

Thank you for your knowledge in advance. I am studying for the Microsoft Technology Exam and one of the practice questions is :

Creating a primary key satisfies the first normal form. True or False?

I personally think it is False because the first normal form is to get rid of duplicate groups. But there is a sentence in the text (Database Fundamentals, Exam 98-364 by Microsoft Press) that says the following:

"The first normalized form (1NF) means the data is in an entity format, which basically means that the following three conditions must be met: • The table must have no duplicate records. Once you have defined a primary key for the table, you have met the first normalized form criterion."

Please help me understand this, please explain like I am five. Thanks.

[1NF](http://en.wikipedia.org/wiki/First_normal_form) also requires that no repeating groups must be present which a primary key in itself does not guarantee. So I think the claim "* Once you have defined a primary key for the table, you have met the first normalized form criterion*" is wrong. — , Feb 13 '15 at 09:14
You didn't quote all three criteria. See my answer. There is a link to a pdf of the book [here](https://archive.org/details/pdfy-R5IDXHXRRZ-GTl2g). — philipxy, Feb 14 '15 at 20:04

score 7 · Answer 1 · answered Feb 13 '15 at 14:55

I can't explain this stuff to a five year old. I've tried. But I may be able to shed a little light on the subject. The first thing you need to know is that there have been multiple definitions of 1NF over the years,and these definitions sometimes conflict with each other. This may well be the source of your confusion, or at least some of it.

A useful thing to know is what purpose Ed Codd had in mind when he first defined it. Ed Codd defined First Normal Form, which he called Normal Form, back in the paper he published in 1970. His purpose in that paper was to demonstrate that a database built along relational lines would have all the expressive power that existing databases had. Existing databases often dealt with a parent that owns a set of children. For example, if the parent data item contains data about a student, each child might contain data about one course the student is taking.

You can actually define such a structure in terms of mathematical relations by allowing one of the attributes of a relation to be itself a relation. I'm going to call that "nesting" relations, although I don't recall what Ed Codd called it. In defining the relational data model, which is closely patterned after mathematical relations, Ed Codd wanted, for a variety of reasons, to forbid such a structure. his reasons were mostly practical, to make it more feasible to build the first relational database.

So he devoted some of his paper to proving that you could limit attributes to "simple" values without reducing the expressive power of the relational data model. I'm going to sidestep what "simple" means for the moment, although it's worth coming back to. He called this limitation "normal form". Once a second normal form was discovered, normal form got renamed to first normal form.

When it came time to build a relational database the engineers decided on a data structure called a "table". (I don't know the actual history, but this is approximate). A table is a logical structure made up of rows and columns. It can be thought of as an array of records, where each record represents a row, and all the records have the same header.

Now, if you want such a structure to represent a relation, you have to throw in a restriction that will prevent two rows with exactly the same values. If you had such duplicates, this would not represent a relation. A relation, by definition, has distinct elements. This is where primary keys come in. A table with a primary key can't have duplicate rows, because it can't have duplicate keys.

But I'm not done yet. You didn't ask this, but it has come up a thousand times in stack overflow, so it's worth putting in here. A designer can defeat Ed Codd's original intent by creating a column that contains text that, in turn contains comma separated values. In Codd's original formulation, a list of values is not "simple".

This is enormously appealing to the neophyte because it looks simpler and more efficient, to store a table with comma separated values than to create two tables one for parent records and the other for child records, and to join them when they are both needed for one query. Joins are not simple to the neophyte, and they do take some computer resources.

The CSV in a column design turns out to be an unfortunate design in nearly every case. The reason is that certain queries that could have been done real fast via an index now require a full table scan. This can turn seconds into minutes or minutes into hours. It's much more expensive than a join.

So you have to teach the newbies why keyed access to all data is a good thing, and this means you have to teach them what 1NF is really all about. And this can be as hard as teaching a five year old. Newbies are typically less ignorant than five year olds, but they tend to be more stubborn.

score 6 · Answer 2 · answered Feb 13 '15 at 12:00

First Normal Form is mostly a matter of definition rather than design. In a relational system, the data structures are relation variables. Since a relation always consists of unique tuples a relation variable will always have at least one candidate key. By convention we call one key per relation a "primary" key so in a relational database the primary key requirement is always satisfied.

Similarly, in a relational database all attributes contain values which are identifiable by name, not by positional index and so the issue of "repeating groups" does not apply. The concept of a "repeating group" exists in some non-relational systems and that was what Codd was referring to when he originally defined 1NF.

However, problems of the interpretation of 1NF arise because most modern DBMSs are not truly relational even though people try to use them like relational systems. Since SQL DBMSs are not relational, how are we to interpret relational concepts like 1NF in a SQL DBMS?

The essense of 1NF is that each table must have a key and that tuples consist of single values for each attribute. Most SQL-based systems don't support the concept of "repeating groups" (multiple values in a single attribute position) so it is usually safe to say that if a SQL table has a key and does not permit nulls in any attribute position then it is "relational" and satisfies the spirit of 1NF.

What you say is politically(1) correct, but are you confident it will also get him the points on his exam ? (1) according to the politics of the real relational model. — Erwin Smout, Feb 13 '15 at 15:05
@Erwin A good question. My answer may not be the one the examiner is looking for. I've never taken any of these vendor certification exams and possibly I wouldn't score very well at this one. — nvogel, Feb 14 '15 at 13:52

score 2 · Accepted Answer · answered Feb 13 '15 at 09:07

2

A primary key must be completely unique. So once this is part of a record, it is distinct from any other record.

eg.

Record 1
---------
KEY = 1
Name = Fred Boggs
Age = 84


Record 2
--------
KEY = 2
Name = Fred Boggs
Age = 84

These 2 records are different because the field KEY is different. Therefore although the rest of the data is the same, it meets the requirements for 1NF.

answered Feb 13 '15 at 09:07

Peterp

100
13

1

That's not true, 1NF also requires to remove repeating groups. So a table e.g. `create table person (id integer primary key, hobby_1 varchar, hobby_2 varchar)` has a primary key, but is not 1NF – Feb 13 '15 at 09:12
3

@a_horse_with_no_name in 1NF every column should contain atomic values, but I never saw the requirement to remove `hobby_2` or second column for a phone number – mucio Feb 13 '15 at 09:18
2

Well then just picture a column `hobby` with the contents `'hiking,mountainbike'`. It's not atomic and thus it's not 1NF even when a PK is present. "Numbered" columns are just a different way of doing the same thing. – Feb 13 '15 at 09:21
3

Putting a comma in a string doesn't make the string any less of an atomic value. 1NF has nothing to with the content of attribute values (much less the names of attributes); it is purely a matter of the table structure. – nvogel Feb 13 '15 at 12:03
2

@sqlvogel: for all practical purposes storing delimited values is considered as "multiple values in a single column". _Technically_ that might be just a single string, but logically those are multiple values - and that *does* violate normal form. It's the same thing as with "atomic values". The value `'Ford Prefect'` isn't logically an atomic value because it consists of a firstname and a lastname but for the database it's just a single character literal. – Feb 13 '15 at 12:14
3

Sure, it generally isn't a great idea to do that if it means you have to do lots of string manipulation to extract values from a string. That isn't what 1NF is about though. If I store a string in a table and then tomorrow I decide to use the SUBSTRING function on it that doesn't mean the table suddenly is no longer 1NF. – nvogel Feb 13 '15 at 12:25
I agree with the horse. A string with multiple values separated by commas is not "atomic" at the conceptual level. It pulls an end run around the limitation Codd intended in the 1970 paper. And it prevents keyed access to all data. Consider finding all people with the "hiking" hobby. It's full table scan time, folks. – Walter Mitty Feb 14 '15 at 12:08
Note that uniqueness of rows does not guarantee entity integrity at the conceptual level. The given example could be that there is only one person named Fred Boggs, but he got entered into the database twice by accident. And the autonumber feature dutifully assigned unique keys to the two records. – Walter Mitty Feb 14 '15 at 12:10

philipxy · Answer 4 · 2017-12-15T23:43:33.473

You are only quoting a fragment of the text Database Administration Fundamentals. A more complete quote is:

The first normalized form (INF) means the data is in an entity format, which basically means that the following three conditions must be met:
• The table must have no duplicate records. [...]
• The table also must not have multivalued attributes, meaning that you can't combine in a single column multiple values that are considered valid for a column. [...]
• The entries in the column or attribute must be of the same data type.

(The history of the term "1NF" is full of confusions, vagueness and changes. But here's what this text says.)

I like this answer the best because it comes from the textbook Microsoft has endorsed for this particular test that the OP is working to. — Sonny Childs, May 18 '15 at 18:09

Branko Dimitrijevic · Answer 5 · 2015-02-15T22:32:29.110

Let me join the party ;)

For a question "is this relation in 1NF" to have a meaning, you first need a relation. And for your table to be a relation, you need a key. A table without any keys is not a relation.

Why? Because relation is a set (of tuples/rows) and a set cannot contain same element more than once (otherwise it would be multiset), which is ensured by a key.

Once you have a relation by having a key, you can see if all your attributes are atomic, and if they are, you have yourself a 1NF.

So the answer to...

Creating a primary key satisfies the first normal form. True or False?

...is False. You do need a key, but you also need atomicity.

Explain Like I am Five -> How a Primary Key Satisfies First Normal Form

5 Answers5

Linked