What is wrong with a transitive dependency?

Question

I have some transitive dependencies in my database design. I have been told by my superiors that these can cause bugs. I am finding it difficult to find resources that will tell me how having these dependencies will cause bugs. What kind of problems will they cause?

I am not disputing the fact, just eager to learn what kind of problems they can cause.

Edit for more details:

From wikipedia :

Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X→Z only by virtue of X→Y and Y→Z.

Could you provide an example? I hardly remember that database course cause it was totally irrelevant. — usr, Mar 30 '12 at 21:04
it depends if you persist the relation. suppose you do persist X->Z then delete Y. now you broke something. — Randy, Mar 30 '12 at 21:08
Transitive dependencies lead to redundant data, redundant data leads to update anomalies, and update anomalies lead to the dark side! — Tim Lehner, Apr 02 '12 at 14:01
That definition of transitive dependency is wrong. Also, it is transitive dependencies on CKs (candidate keys) that are a problem. The answers here are hopelessly vague for actually doing design/normalization. — philipxy, Oct 20 '17 at 21:45

score 65 · Accepted Answer · edited Jun 10 '20 at 21:51

I'll explain by an example:

-------------------------------------------------------------------
|  Course  |    Field     |   Instructor   |  Instructor Phone    |
-------------------------------------------------------------------
|  English |  Languages   |  John Doe      |     0123456789       |
|  French  |  Languages   |  John Doe      |     0123456789       |
|  Drawing |  Art         |  Alan Smith    |     9856321158       |
|  PHP     |  Programming |  Camella Ford  |     2225558887       |
|  C++     |  Programming |  Camella Ford  |     2225558887       |
-------------------------------------------------------------------

If you have a Course you can easily get its Instructor so Course->Instructor.
If you have an Instructor you can't get his Course as he might be teaching different courses.
If you have an Instructor you can easily get his Phone so Instructor->Phone.

That means that if you have a Course you can also get the Instructor Phone, so Course->Instructor Phone (i.e. a transitive dependency).
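The three statements above can be checked mechanically against the sample rows. This is only a rough sketch: the `holds` helper and the column names are made up for illustration, and a check on sample data can only show that a dependency is consistent with those rows, not that it holds for the design in general.

```python
# Rows copied from the table above: (course, field, instructor, phone).
ROWS = [
    ("English", "Languages", "John Doe", "0123456789"),
    ("French", "Languages", "John Doe", "0123456789"),
    ("Drawing", "Art", "Alan Smith", "9856321158"),
    ("PHP", "Programming", "Camella Ford", "2225558887"),
    ("C++", "Programming", "Camella Ford", "2225558887"),
]
# Hypothetical column names, used only to index into the tuples.
COLUMNS = {"course": 0, "field": 1, "instructor": 2, "phone": 3}


def holds(rows, lhs, rhs):
    """Return True if every lhs value is paired with exactly one rhs value."""
    seen = {}
    for row in rows:
        key, value = row[COLUMNS[lhs]], row[COLUMNS[rhs]]
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


print(holds(ROWS, "course", "instructor"))  # Course -> Instructor
print(holds(ROWS, "instructor", "phone"))   # Instructor -> Phone
print(holds(ROWS, "instructor", "course"))  # Instructor -> Course
print(holds(ROWS, "course", "phone"))       # Course -> Instructor Phone
```

Only the third check fails, because John Doe and Camella Ford each teach two courses.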

Now for the problems:

If you delete both the French and English courses then you will delete their instructor John Doe as well and his phone number will be lost forever.
There is no way to add a new Instructor to your database unless you add a Course for him first, or you can duplicate the data in an Instructors table, which is even worse.
If Instructor John Doe changes his phone number then you will have to update every Course he teaches with the new info, which is very error-prone.
You can't delete an Instructor from your database unless you delete all the courses he teaches or set all his fields to null.
What if you decide to keep the birth date of your instructors? You will have to add a Birth Date field to the Courses table. Does this even sound logical? Why keep instructor information in the courses table in the first place?
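The first problem (losing the phone number along with the courses) is easy to reproduce. Below is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`courses`, `instructors`, `courses_3nf`) are assumptions made for this example, and the split shown in the second half is just one possible way to normalize the table, where the instructor data is moved rather than duplicated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Single table, as in the example above (hypothetical column names).
conn.execute(
    "CREATE TABLE courses ("
    " course TEXT PRIMARY KEY, field TEXT,"
    " instructor TEXT, instructor_phone TEXT)"
)
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?, ?)",
    [
        ("English", "Languages", "John Doe", "0123456789"),
        ("French", "Languages", "John Doe", "0123456789"),
        ("Drawing", "Art", "Alan Smith", "9856321158"),
    ],
)

# Deletion anomaly: removing John Doe's courses removes his phone number too.
conn.execute("DELETE FROM courses WHERE course IN ('English', 'French')")
phone_after_delete = conn.execute(
    "SELECT instructor_phone FROM courses WHERE instructor = 'John Doe'"
).fetchall()
print(phone_after_delete)  # no rows left for John Doe

# One possible split: instructor facts live in their own table.
conn.execute("CREATE TABLE instructors (name TEXT PRIMARY KEY, phone TEXT)")
conn.execute(
    "CREATE TABLE courses_3nf ("
    " course TEXT PRIMARY KEY, field TEXT,"
    " instructor TEXT REFERENCES instructors(name))"
)
conn.execute("INSERT INTO instructors VALUES ('John Doe', '0123456789')")
conn.executemany(
    "INSERT INTO courses_3nf VALUES (?, ?, ?)",
    [("English", "Languages", "John Doe"), ("French", "Languages", "John Doe")],
)

# The same deletion no longer touches the instructor's phone number.
conn.execute("DELETE FROM courses_3nf WHERE course IN ('English', 'French')")
phone_3nf = conn.execute(
    "SELECT phone FROM instructors WHERE name = 'John Doe'"
).fetchall()
print(phone_3nf)
```

With the split, an instructor can also be inserted before any of his courses exist, and a phone change touches exactly one row.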

So how do you solve such a transitive dependency, especially in the example you showed? — HQuser, Feb 04 '16 at 19:49
@user2015669 basically you use "Database Normalization". Aiming for 3NF is usually a good start, but it really depends on your application. — Songo, Feb 04 '16 at 20:56

Branko Dimitrijevic · Answer 2 · 2012-04-03T14:37:33.107

One way to express the 3NF is:

All attributes should depend on the key, the whole key and nothing but the key.

The transitive dependency X->Y->Z violates that principle, leading to data redundancy and potential modification anomalies.

Let us break this down:

By definition, for the chain of functional dependencies X->Y->Z to be a transitive dependency, X<-Y must not hold.
If Y were a key, X<-Y would hold, so Y cannot be a key. (FOOTNOTE1)
Since Y is not a key, any given Y can be repeated in multiple rows.
The Y->Z implies that all rows holding the same Y must also hold the same Z. (FOOTNOTE2)
Repeating the same (Y, Z) tuple in several rows does not contribute any useful information to the system. It is redundant.

In short, since Y is not a key and Y->Z, we have violated the 3NF.

Redundancies lead to modification anomalies (e.g. updating some but not all of the Zs "connected" to the same Y essentially corrupts the data, since you no longer know which copy is correct). This is typically resolved by splitting the original table into two tables, one containing {X, Y} and the other containing {Y, Z}. This way, Y can be a key in the second table and Z is not repeated.
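To make the split into {X, Y} and {Y, Z} concrete, here is a small sketch in plain Python. The `x1`/`y1`/`z1` values are hypothetical placeholders, chosen so that X->Y and Y->Z hold while X<-Y does not. It shows that the (Y, Z) pairs are stored once after the split, and that joining the two projections on Y gives back the original rows.

```python
# Hypothetical rows of (x, y, z): x -> y and y -> z hold, y -> x does not.
rows = {
    ("x1", "y1", "z1"),
    ("x2", "y1", "z1"),  # same y as the row above, so z is repeated
    ("x3", "y2", "z2"),
}

# Project onto {X, Y} and {Y, Z}; sets drop the repeated (y, z) pairs.
xy = {(x, y) for x, y, _ in rows}
yz = {(y, z) for _, y, z in rows}

# Natural join on Y rebuilds the original table from the two projections.
rejoined = {(x, y, z) for x, y in xy for y2, z in yz if y == y2}

print(len(rows), len(yz))  # rows in the original vs. rows in {Y, Z}
print(rejoined == rows)    # the split loses no information
```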

On the other hand, if the X<-Y does hold (i.e. X->Y->Z is not transitive), then we can retain a single table, where both X and Y are keys. Z won't be unnecessarily repeated in this scenario.

(FOOTNOTE1) A key is a (minimal) set of attributes that functionally determine all of the attributes in a relation. Rationale: If K is a key, there cannot be multiple rows with the same value of K, so any given value of K is always associated to precisely one value of every other attribute (assuming 1NF). By definition (see FOOTNOTE2), "being associated to precisely one" is the same thing as "being in a functional dependency".

(FOOTNOTE2) By definition, Y->Z if, and only if, each Y value is associated with precisely one Z value.

Example:

Assuming each message has exactly one author and each author has exactly one primary e-mail, attempting to represent messages and users in the same table would lead to repeating e-mails:

MESSAGE                         USER    EMAIL
-------                         ----    -----
Hello.                          Jon     jon@gmail.com
Hi, how are you?                Rob     rob@gmail.com
Doing fine, thanks for asking.  Jon     jon@gmail.com

(In reality, these would be MESSAGE_IDs, but let us keep things simple here.)

Now, what happens if Jon decides to change his e-mail to, say, "jon2@gmail.com"? We would need to update both of Jon's rows. If we only update one, then we have the following situation...

MESSAGE                         USER    EMAIL
-------                         ----    -----
Hello.                          Jon     jon2@gmail.com
Hi, how are you?                Rob     rob@gmail.com
Doing fine, thanks for asking.  Jon     jon@gmail.com

...and we no longer know which of Jon's e-mails is correct. We have essentially lost the data!

The situation is especially bad since there is no declarative constraint we could use to coerce the DBMS into enforcing both updates for us. Client code can have bugs, and is often written without much regard for the complex interactions that can happen in a concurrent environment.

However, if you split the table...

MESSAGE                         USER
-------                         ----
Hello.                          Jon 
Hi, how are you?                Rob 
Doing fine, thanks for asking.  Jon 

USER    EMAIL
----    -----
Jon     jon@gmail.com
Rob     rob@gmail.com

...there is now only one row that knows about Jon's e-mail, so ambiguity is impossible.
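As a rough illustration of the split tables, here is a sketch with Python's built-in `sqlite3` module. The table and column names (`users.name`, `messages.author`) are assumptions made for this example, and using the message text as a key only mirrors the simplification above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, email TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE messages ("
    " message TEXT PRIMARY KEY,"
    " author TEXT NOT NULL REFERENCES users(name))"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Jon", "jon@gmail.com"), ("Rob", "rob@gmail.com")],
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [
        ("Hello.", "Jon"),
        ("Hi, how are you?", "Rob"),
        ("Doing fine, thanks for asking.", "Jon"),
    ],
)

# Jon changes his e-mail: exactly one row has to be updated.
cursor = conn.execute("UPDATE users SET email = 'jon2@gmail.com' WHERE name = 'Jon'")
updated_rows = cursor.rowcount
print(updated_rows)

# Every one of Jon's messages now resolves to the same address.
jon_emails = conn.execute(
    "SELECT DISTINCT u.email"
    " FROM messages AS m JOIN users AS u ON u.name = m.author"
    " WHERE m.author = 'Jon'"
).fetchall()
print(jon_emails)
```

The e-mail is looked up through the join instead of being copied into each message row, so a partial update simply cannot happen.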

BTW, all this can be viewed as just another expression of the DRY principle.

Points 3 and 4 do not follow because Y could be a key (as you already implied in 2). Crucially, 3NF is violated *only* if Z is not also a key or part of a key. This is a "flaw" of 3NF. 3NF only eliminates transitive dependencies of non-prime attributes. — nvogel, Apr 03 '12 at 05:44
@sqlvogel I don't understand. Why points 3 and 4 don't follow? — Branko Dimitrijevic, Apr 03 '12 at 09:32
Because you say "... Y cannot be key... " and "Since Y is not key...". Those are assumptions that are not implied by anything you said in 1 and 2. Y might well be a key. — nvogel, Apr 03 '12 at 10:06
@sqlvogel They are implied by: "there are potentially multiple Xs for any given Y". These Xs must be represented by separate rows (otherwise you break 1NF). Since each of these rows contains the same Y, Y is not unique. That alone is not a problem, until you introduce Z in the picture. — Branko Dimitrijevic, Apr 03 '12 at 10:39
*Potentially*, yes. But you can't draw the conclusions in 3 and 4 except for the subset of cases where Y is not a key. If that was all you intended then you could state at the outset that you are considering only cases where Y is not a key - but then you aren't defining 3NF any more, you are only dealing with the special case where the relation has only one key. — nvogel, Apr 03 '12 at 11:05
@sqlvogel It is correct that I cannot conclude 3 and 4 _unless_ Y is not a key. However, if Y were a key, this would imply X<-Y, so [by definition](http://en.wikipedia.org/wiki/Transitive_dependency) this would no longer be a transitive dependency. Since the "starting condition" for our discussion is the existence of transitive dependency, we _know_ X<-Y does not hold, so we _know_ Y cannot be a key. — Branko Dimitrijevic, Apr 03 '12 at 11:55
I get it now. You're right but maybe you could have worded your first sentences to make it clear what you are talking about. Saying "unless there is also X<-Y" seems to imply that X<-Y might in fact be the case. — nvogel, Apr 03 '12 at 12:42
@sqlvogel This is a fair criticism. I have re-worded my answer to be both more precise and (hopefully) easier to understand. This is the kind of discussion that forces a person to see things more clearly and I thank you for it :) — Branko Dimitrijevic, Apr 03 '12 at 14:22

score 6 · Answer 3 · answered Mar 30 '12 at 21:10

If there are transitive dependencies in your table, then it is not compliant with 3NF, so there is a high probability that the table contains redundant data. Check this to clarify this concept.

Carlos Gavidia-Calderon

score 3 · Answer 4 · answered Mar 30 '12 at 21:11

Take a look at this link:

http://en.wikipedia.org/wiki/Transitive_dependency

Using the example, what would happen if I update the nationality of Jules Verne on one row, but not the other? Author nationality is determined by author alone, not the combination of book and author. So with the example data structure, I could potentially ask the database the nationality of Jules Verne. If I ran the following SQL command

SELECT TOP 1 author_nationality FROM books WHERE author='Jules Verne'

I could get a different answer depending on how the database selects TOP 1.
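Here is a rough way to see that with Python's built-in `sqlite3` module. The two rows are stand-ins for the rows in the linked example, `'English'` is a deliberately wrong hypothetical value, and the `book` column name is an assumption. SQLite has no `TOP`, so `LIMIT 1` is used as its counterpart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books ("
    " book TEXT PRIMARY KEY, author TEXT, author_nationality TEXT)"
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        ("Twenty Thousand Leagues Under the Sea", "Jules Verne", "French"),
        ("Journey to the Center of the Earth", "Jules Verne", "French"),
    ],
)

# Update the nationality on one row but not the other (hypothetical wrong value).
conn.execute(
    "UPDATE books SET author_nationality = 'English'"
    " WHERE book = 'Journey to the Center of the Earth'"
)

# The table now holds two different answers for the same author.
answers = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT author_nationality FROM books"
        " WHERE author = 'Jules Verne'"
    )
]
print(len(answers))

# Without an ORDER BY, which of them LIMIT 1 returns is up to the database.
one_answer = conn.execute(
    "SELECT author_nationality FROM books WHERE author = 'Jules Verne' LIMIT 1"
).fetchone()[0]
print(one_answer in answers)
```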
