Normalization: partial dependency & transitive dependency attribute collision

Question

When I was practicing normalization I encountered this question:

Normalize the following

AB (a, b, c, d, e, f, g)
b --> c, e
c --> e, g
a --> d

where a, b is the composite primary key.

I see this is already in 1NF; when I try to normalize it to 2NF, I see that e is partially dependent on b; at the same time it is transitively dependent on c, so I'm confused; how to proceed?

What is a real-life example showing this collision case?

You are effectively asking for us to rewrite a textbook with a bespoke tutorial & do your (home)work & you show no research or other effort. [ask] [Help] [How do I ask and answer homework questions?](https://meta.stackoverflow.com/q/334822/3404097) Show the steps of your work following a textbook/reference with justification & ask 1 specific researched non-duplicate question re the 1st place you are stuck/unsure. Quote definitions, theorems, algorithms & heuristics you rely on. All the steps are also SO faqs. Google with & without 'site:stackoverflow.com'. PS Please read the edit help. — philipxy, Jun 06 '22 at 01:56
"normalize" without other clarification doesn't mean anything in particular. What is yours exactly? ["1NF"](https://stackoverflow.com/a/40640962/3404097) has many meanings. Give yours. {a,b} cannot be a CK if the given FDs are a cover. PKs don't matter, CKs do. "e is partially dependent on b" Can't be true since it would require {}->e. "is transitively dependent on c" Partial & transitive FDs are not a problem per se; only certain ones violate NFs. "I'm confused" Why? How are you stopped from following what process? "What is a" Ask exactly 1 (specific researched non-duplicate) question. — philipxy, Jun 06 '22 at 02:37

Rex · Answer 1 · 2022-06-25T08:27:11.813

Far be it from me to clarify normal forms, but in the real world it usually works like this: (for this example the purpose is contact management.)

each table has a primary key field. I call it pk (primary key). Important: try to avoid using multiple columns to create the pk. Why? Because if the column data changes (and it will) your pk has to be deconstructed - not a good idea as this will upend your database! The pk is the id (for all intents and purposes) for the data in each row.

Important: Most databases allow you to specify an incremental value for a field. You do this for the pk field/column, which means it will add 1 to the last pk value whenever you add a new record (or row if you like). Most databases will also allow you to set one field as the primary key (hence the reason why I call name that field "pk"), which means the database won't allow any duplicate values in that column, which is very important.

each column is atomic meaning it can't be broken down any further, for example an address can be broken down into: street, city, state and zip, so you wouldn't create a table like this:

People

pk | name | address

You could instead create a table like this: (but keep reading)

People
--------------------
pk | first name | last name | street | city | state | zip

Note: suppose you were searching for a person by name. Well once you had that person the pk (id field) allows you identify any other data on that row, plus any related data (read on). That's the main reason why the pk exists.

Ask yourself if there is more than one of each?

Q: Does the person have more than one first name? Legally the answer is no, but they might have a nick name! Will there be more than one nick name? Probably Not or highly unlikely. Do you want to store the nick name? If yes, then add nick name to your table.

People
--------------------
pk | first name | nick name | last name | street | city | state | zip

Q: could the person have more than one address? Yes! Home address, Work address, mailing address, PO BOX, etc. So this is where the relational in RDMS (relational database management systems) comes into play. You need two tables: one for the people and one for addresses.

Table 1:

People
--------------------
pk | first name | nick name | last name 
10 | William    | Bill      | Smith

Table 2:

Addresses
--------------------
pk | fk | street               | city  | state | zip   | identifier
1  | 10 | 3110 Franklin Street | Ogden | UT    | 84041 | Home
2  | 10 | 2100 Washington blvd | Ogden | UT    | 84104 | Work

I call this a parent child relationship (in SQL this is a one-to-many relationship). The children are the addresses of the parent in this case. One person to many addresses.

Note: each row has a unique pk (primary key) and each pk represents the data for that row.

fk = foreign key The foreign key is the same value as the primary key in the People table 1. This is how the addresses are linked to a person in the People table 1.

Suppose you realized that each person needs a phone number.
Q: can each person have more than one phone number? Yes! Then this requires another table.

And now your database looks like this:

Table 1:

People
--------------------
pk | first name | nick name | last name 
10 | William    | Bill      | Smith

Table 2:

Addresses
-------------------
pk | fk | street          | city  | state | zip   | identifier
1  | 10 | 3110 Franklin   | Ogden | UT    | 84041 | Home
2  | 10 | 2100 Washington | Ogden | UT    | 84104 | Work

Table 3:

Phones
-------------------------
pk | fk | phone_number | identifier
1  | 10 | 801-555-1212 | Home
2  | 10 | 801-555-1213 | Work

This is how it's done in the real world by database professionals, but I've seen these rules broken a bunch of times by some large companies.

I've seen hundreds of databases and few if any followed normal forms or reality. So, I would say, "Yes! Definitely understand DB normal forms, but use them in conformity with reality in the real world. It's called a Relational Data (base) for a reason."

Beyond this there are times when you need a many-to-many relationship, which I'd be happy to diagram if anyone requests it.

Normalization: partial dependency & transitive dependency attribute collision

1 Answers1

People

Linked