How do I normalize data in a database?

Question

I'm in an intro to database management course and we're learning about normalizing data (1NF, 2NF, 3NF, etc.) and I'm super confused on how to actually go about and do it. I've read up on this, consulted various sites and youtube videos and I still can't seem to get it to click. I am using Microsoft Access 2013 if that's of any help.

This is the data I'm working with.

Thanks.

Edit1: Alright, I think I have the tables set up correctly. But now I'm having trouble actually inputting data to go from one table to the next. Here's my relationship table.

enter image description here

Can you explain a little specifically more what aspects of normalization aren't clicking? I can, for example, explain how, in the example you've posted, there are columns that are dependent on other columns, and don't belong in that table. But is that the sort of thing you're asking? — Paul Richter, Oct 11 '14 at 04:51
Take a look into this... http://www.studytonight.com/dbms/database-normalization.php — Jayakarthik Appasamy, Oct 11 '14 at 04:52
There's a great description of normalization here: http://stackoverflow.com/questions/246701 — VoiceOfSoftware, Oct 11 '14 at 04:53
possible duplicate of [What is Normalisation (or Normalization)?](http://stackoverflow.com/questions/246701/what-is-normalisation-or-normalization) — Barmar, Oct 11 '14 at 05:06
Sorry for the confusion, Paul. What I'm confused with at the moment is how to actually break up the tables into separate tables, like for the patients, dentists, and appointments. — Onikouzou, Oct 11 '14 at 05:10

Mike Chamberlain · Accepted Answer · 2014-10-11T06:26:48.430

On a very basic level, any repeating values in a table are candidates for normalization. Duplicated data is usually a bad idea. Say you needed to update a patient's surname - you now have to update all the occurrences in this table, and possibly many others throughout the rest of the database. Much better to store each patient's details in one place only.

This is where normalization comes in. Looking down the columns, you can see that there are repeating values for data about dentists, patients and surgeries, so we should normalize towards having tables for each of these entities, as well as the original table that contains appointments, giving you four tables in total.

Extract the entities out into their own tables, and give each row a primary (unique) key - just use an incrementing integer for now. (Edit: as suggested in the comment we could use the natural keys of PatientNo, StaffNo and SurgeryNo instead of creating surrogates.)

Then, instead of each patient's name and number appearing multiple times in the appointments table, we just reference the key of the master record in the Patient table. This is called a foreign key.

Then, do the same for Dentist and Surgery.

You will end up with tables looking something like this:

APPOINTMENT

AppointmentID  DentistID  PatientID  AppointmentTime   SurgeryID
----------------------------------------------------------------
1              1          1          12 Aug 03 10:00   1
2              1          2          ...               2
3              2          3          ...               1
4              2          3          ...               1
5              3          2          ...               2
6              3          4          ...               3


DENTIST

DentistID  Name             StaffNo
--------------------------------------
1          Tony Smith       S1011
2          Helen Pearson    S1024
3          Robin Plevin     S1032


PATIENT

PatientID  Name              PatientNo
---------------------------------------
1          Gillian White     P100
2          Jill Bell         P105
3          Ian MackKay       P108
4          John Walker       P110


SURGERY

SurgeryID  SurgeryNo
-------------------------
1          S10
2          S15
3          S13

Okay cool, that makes sense. I didn't think to make a separate table for Surgery. Where do you actually make it so they reference one another, though? Because I have to use staffNo and patientNo, how does that apply instead of the basic 1,2,3? I need to use S1011, P100, etc. Does that make sense? — Onikouzou, Oct 11 '14 at 05:19
Yes, if you want you could use the staff and patient numbers as your primary key and form you relationships using those instead. This is a http://en.wikipedia.org/wiki/Natural_key. In the real world, however, we tend to favour using a http://en.wikipedia.org/wiki/Surrogate_key, as I did above. Have a read of those articles for the pros and cons of each. — Mike Chamberlain, Oct 11 '14 at 06:17
How do I go about actually linking them together? I know I have to go in through the relationships thing in Access. — Onikouzou, Oct 11 '14 at 15:10
Sorry for so many comments, but is the master table going to be appointments or Patients? I'd assume it would be patients because that's where you're putting all the information into? — Onikouzou, Oct 11 '14 at 15:58
I haven't used Access in a long time but a couple of seconds googling found this: http://office.microsoft.com/en-au/access-help/create-edit-or-delete-a-relationship-HA010072597.aspx. I'm not really sure what you mean by the "master" table. — Mike Chamberlain, Oct 11 '14 at 22:10

score 1 · Answer 2 · answered Oct 11 '14 at 05:14

The first step is to data modelling and denormalization is to understand your data. Study it an understand the domain "objects" or tables that exist within your model. That will give you an idea of how to start. Sometimes a single table or query sample is not enough to fully understand the database, but in your case, we can use the sample data and make some assumptions.

Secondly, look for repeated / redundant data. If you see copies of names, there is a good chance that is a candidate for a foreign key. Our assumption tells us that STAFF_NO is a primary key candidate for DENTIST because each unique STAFF_NO correlates to a unique DENTIST_NAME, so I see a good candidate DENTIST table (STAFF_NO, DENTIST_NAME)

Example in some table of SURGERY:

ID     STAFF_NO    DENTIST_NAME
1      1           Fred Sanford
2      1           Fred Sanford
3      3           Lamont Sanford
4      3           Lamont Sanford

Why store these over and over? What happens when Fred says "But my correct name is Fred G Sanford", so you have to update your database. In the current table, you have to update the name is many rows. If you had normalized it, you'd have a single location for the name, in the DENTIST table.

So I can take the unique dentists and store them in DENTIST

create table DENTIST(staff_no integer primary key, dentist_name varchar(100));

-- One possible way to populate our dentist table is to use a distinct query from surgery

insert into DENTIST
    select distinct staff_no, dentist_name from surgery;

STAFF_NO    DENTIST_NAME
1           Fred Sanford
3           Lamont Sanford

SURGERY table now points to DENTIST table

ID     STAFF_NO
1      1       
2      1       
3      3       
4      3

And you can now create a view, VIEW_SURGERY to join the DENTIST_NAME back in to satisfy the needs of typical queries.

select s.id, d.staff_no, d.dentist_name
  from surgery s join dentist d
    on s.staff_no = d.staff_no   -- join here

So now a unique update to DENTIST, by the dentist primary key will update a single row.

update dentist set name = 'Fred G Sanford' where staff_no = 1;

Add query view will show the updated name for N rows:

select * from view_surgery

ID     STAFF_NO    DENTIST_NAME
1      1           Fred G Sanford
2      1           Fred G Sanford
3      3           Lamont Sanford
4      3           Lamont Sanford

In short, you are removing redundancy.

This is just a sample, and one way to do it. Manual normalization like this is not as common when you have modelling tools, but the point is, we can look at data, spot redundancies and factor those redundancies into new tables, and relate those new tables by foreign keys and joins, then build views to represent the original data.

How do I normalize data in a database?

2 Answers2