Identifying functional dependencies (FDs)

Question

I am working with a table that has a composite primary key composed of two attributes (with a total of 10) in 1NF form.

In my situation a fully functional dependency involves the dependent relying on both attributes in my primary key.
A partial dependency relies on either one of the attributes from the primary key.
A transitive dependency involves two or more non-key attributes in a functional dependence where one of the non-key attributes is dependent on a key attribute from my primary key.

Pulling the transitive dependencies out of the table, seems do this after normalization, but my assignment requires us to identify all functional dependencies before we draw the dependency diagram (after which we normalize the tables). Parenthesis identify the primary key attributes:

(Student ID), Student Name, Student Address, Student Major, (Course ID), Course Title, Instructor ID, Instructor Name, Instructor Office, Student_course_grade

Only one class is taught for each course ID.
Students may take up to 4 courses.
Each course may have a maximum of 25 students.
Each course is taught by only one Instructor.
Each student may have only one major.

philipxy · Accepted Answer · 2022-10-23T22:56:07.517

From your question it seems that you do not have a clear understanding of basics.

Application relationships & situations

First you have to take what you were told about your application (including business rules) and identify the application relationships (aka associations) (aka relations, in the math sense of association). Each gets a (base) table (aka relation, in the math sense of associated tuples) variable. Such an application relationship can be characterized by a row membership criterion (aka meaning) (aka predicate) that is a statement template. Eg suppose criterion student [si] takes course [ct] has table variable TAKES. The parameters of the criterion are the columns of its table. We can use a table name with columns (like an SQL declaration) as a shorthand for the criterion. Eg TAKES(si,ct). A criterion plus a row makes a statement (aka proposition) about a situation. Eg row (17,'CS101') gives student 17 takes course 'CS101' ie TAKES(17,'CS101'). Rows that give a true statement go in the table and rows that make a false one stay out.

If we can rephrase a criterion as the AND/conjunction of two others then we only need the tables with those other criteria. This is because NATURAL JOIN is defined so that the NATURAL JOIN of two tables containing the rows making their criteria true returns the rows that make the AND/conjunction of their criteria true. So we can NATURAL JOIN the two tables to get back the original. (This is what normalization is doing by decomposing tables into components.)

/* rows where
    student with id [si] has name [sn] and address [sa] and major [sm]
        and takes course [ci] with title [ct]
        from instructor with id [ii] and name [in] and office [io]
        with grade [scg]
*/
T(si,sn,sa,sm,ci,ct,ii,in,io,scg)

/* rows where
    student with id [si] has name [sn] and address [sa] and major [sm]
        and takes course [ci] with grade [scg]
*/
SG(si,sn,sa,sm,ci,scg)

/* rows where
    course [ci] with title [ct]
        is taught by instructor with id [ii] and name [in] and office [io]
*/
CI(ci,ct,ii,in,io,scg)

Now by the definition of NATURAL JOIN, the rows where SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io,scg) are the rows in SG NATURAL JOIN CI.

And since T(si,sn,sa,sm,ci,ct,ii,in,io,scg) when/iff SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io,scg), ie since the rows where T(si,sn,sa,sm,ci,ct,ii,in,io,scg) are the rows where SG(si,sn,sa,sm,ci,scg) AND CI(ci,ct,ii,in,io,scg), we have T = SG NATURAL JOIN CI.

Together the application relationships and situations that can arise determine both the rules and constraints! They are just things that are true of every application situation or every database state (ie values of one or more base tables) (which are are a function of the criteria and the possible application situations.)

Then we normalize to reduce redundancy. Normalization replaces a table variable by others whose predicates AND/conjoin together to the original's when this is beneficial.

The only time a rule can tell you something that you don't know already know from the (putative) criteria and (putative) situations is when you don't really understand the criteria or what situations can turn up, and the a priori rules are clarifying something about that. A person giving you rules is already using application relationships that they assume you understand and they can only have determined that a rule holds by using them and all the application situations that can arise (albeit informally)!

(Unfortunately, many presentations of information modeling don't even mention application relationships. Eg: If someone says "there is a X:Y relationship" then they must already have in mind a particular binary application relationship between entities; knowing it and what application situations can arise, they are reporting that it has a certain cardinality in a certain direction. This will correspond to some application relationship, represented by (a projection of) a table using column sets that identify entities. Plus some presentations/methods call FKs "relationships"--confusing them with those relationships.)

Check out "fact-based" information modeling methods Object-Role Modeling or (its predecessor) NIAM.

FDs & CKs

Given the criterion for putting rows into or leaving them out of a table and all possible situations that can arise, only some values (sets of rows) can ever be in a table variable.

For every subset of columns you need to decide which other columns can only have one value for a given subrow value for those columns. When it can only have one we say that the subset of columns functionally determines that column. We say that there is a FD (functional dependency) columns->column. This is when we can express the table's predicate as "... AND column=F(columns)" for some function F. (F is represented by the projection of the table on the column & columns.) But every superset of that subset will also functionally determine it, so that cuts down on cases. Conversely, if a given set does not determine a column then no subset of the set does. Applying Armstrong's axioms gives all the FDs that hold when given FDs hold. (Algorithms & software are available to apply them & determine FD closures & covers.) Also, you may think in terms of column sets being unique; then all other columns are functionally dependent on that set. Such a set is called a superkey.

Only after you have determined the FDs can you determine the CKs (candidate keys)! A CK is a superkey that contains no smaller superkey. (That a CK and/or superkey is present is also a constraint.) We can pick a CK as PK (primary key). PKs have no other role in relational theory.

A partial dependency relies on either one of the attributes from the Primary key.

Don't use "involve" or "relies on" to give a definition. Say, "when" or "iff" ("if and only if").

Read a definition. A FD that holds is partial when/iff using a proper subset of the determinant gives a FD that holds with the same determined column; otherwise it is full. Note that this does not involve CKs. A relation is in 2NF when all non-prime attributes are fully functionally dependent on every CK.

A transitive dependency involves two or more non-key attributes in a functional dependence where one of the non-key attributes is dependent on a key attribute (from my PK).

Read a definition. S -> T is transitive when/iff there is an X where S -> X and X -> T and not (X -> S) and not (X = T). Note that this does not involve CKs. A relation is in 3NF when all non-prime attributes are non-transitively dependent on every CK.

"1NF" has no single meaning.

Thanks for this explanation, it was quite helpful. I needed a solid breakdown like this - I was receiving too much 'half-information' from different sources. Is it true that all dependencies must be identified to create these application relationships (thus, transitive dependencies)? — Ephexx, Dec 10 '14 at 23:58
So, from the business rules I can identify my 'tables' that I will place the corresponding columns inside of? For example, 1 class taught for each course ID: I would have a 'Course' table, with the two columns 'Course ID' and 'Course Title' inside of it. Do I establish dependencies before this, or will doing this enable me to establish dependencies? (That is just an example, rather than list all of the columns) — Ephexx, Dec 11 '14 at 01:10
Thank you so much. You've helped me greatly. I do have another question - so I now have two tables: Course, and Student. Course contains all course & instructor related information (given that they are all in 1:1 relationships with each other), but in my student table I have 'Stu_Crse_Grade'. In the table we are given to start with (non-normalized table), the student ID's are listed multiple times for their different courses and different grades. How can I implement this into the table? Stu_Crse Grade is messing everything up. — Ephexx, Dec 11 '14 at 03:26
I do not know which subset of columns the column 'Stu_Crse Grade' relates with. The 'Student' table would produce subrow values that would cause duplicates in other columns inside the table. It doesn't make sense to put the attribute into the 'Course' table, because it is dependent on Student ID and Course ID combined. Should I create another subset of columns to satisfy this attribute? That would involve using columns more than once... — Ephexx, Dec 11 '14 at 03:58
If I understand correctly, predicates identify the application relationship between two objects. Objects become the base tables. Each application relationship is characterized by a row membership criterion - the predicate. I took your advice, and I came up with: "Students are graded in every course." and "Each student is graded by one or more Instructors." Is this a proper way of splitting the criterion? — Ephexx, Dec 11 '14 at 04:37
When I am checking a column with each other column, should I be viewing the Student subset and Course subset as two different tables? Or are they merely subsets of columns? — Ephexx, Dec 12 '14 at 01:31
In another question your assignment says to determine FDs for this one big table, doesn't it? Then normalization will decompose. Then you can determine the predicates for the components. PS: Maybe "checking a column with each other column" means checking whether a subset of columns functionally determines a column?? *Please* put in the effort to be *clear*. — philipxy, Dec 12 '14 at 03:41
I was trying to be as clear as possible. You're using terminology that I don't understand and you've been very helpful in describing everything to me, however this is my first database course. Anyways, thank you very much for the help. I've managed to normalize the tables to 3NF, and I am finishing up the dependency diagrams & actual database programming tonight. Cheers to you! — Ephexx, Dec 12 '14 at 18:54

score 0 · Answer 2 · answered Dec 10 '14 at 12:57

I am inferring a functional dependency that was not listed in your business rules. Namely that instructor ID determines instructor name.

If this is true, and if you have both instructor ID and instructor name in the Course table, then this is not in 3NF, because there is a transitive dependency between Course ID, Instructor ID, and Instructor Name.

Why is this harmful? Because duplicating the instructor name in each course an instructor teaches makes updating an instructor name difficult, and possible to do in an inconsistent manner. Inconsistent instructor name is just another bug you have to watch out for, and 3NF obviates the problem. The same argument could be made for Instructor office.

Identifying functional dependencies (FDs)

2 Answers2

Linked