What is the difference between primary key, candidate key and superkey in terms of functional dependency?
I know the difference, but I don't know their difference in terms of functional dependencies.
What is the difference between primary key, candidate key and superkey in terms of functional dependency?
I know the difference, but I don't know their difference in terms of functional dependencies.
Given a relation R
with a set of attributes T
and a set of functional depedencies F
, any set of attributes K
, subset of T
, that determines T
is called a superkey of R
. In other words, if the dependency K -> T
belongs to F+
(that is, can be derived from F
), then K
is a superkey of R
. From this definition, since it is always true that T -> T
, we know at least a superkey exists (T
).
A candidate key CK
is a superkey from which we cannot remove any attribute without causing it to lose the property of being a superkey. In other words, if CK = A1,...,An
, then there is no proper subset of it which is a superkey. Of course from this definition we can infer that each superkey with a single attribute is automatically a candidate key.
A primary key has nothing to do with the relation theory, but it is a concept used in practice, inside relational database management systems. In general, it is either a candidate key chosen among those availables, or it is an "artificial" unique identifier (surrogate key) introduced without any consideration of candidate keys already present in the relation.
Finally note that null values are not defined in the relational theory, but only in practical relational database management systems, and usually in those systems the primary keys cannot have a null value.
I know the difference, but I don't know their difference in terms of functional dependencies.
If you do not memorize & apply definitions then you don't know what things are or what is the difference between ones that are in some sense similar. You just have some vague notions. Also if you don't force yourself to think & write precisely (and require the same of writers you read).
From this answer:
All the following terms/concepts are defined in parallel for table values and variables. A table variable has an instance of a FD (functional dependency), determinant, superkey, CK (candidate key) or PK (primary key) (in the variable sense) when every table value that can arise for it in the given business/application has that instance (in the table sense).
For sets of columns X and Y we can write X -> Y. We say that X is the determinant/determining set and Y is the determined set of/in functional dependency (FD) X -> Y.
We say X -> Y holds in table T when each subrow value for X only appears with the one particular subrow value for Y. Or we say X -> Y is a FD of/in T.
A superkey of a table T is a set of columns that functionally determines every column. A candidate key (CK) is a superkey that contains no smaller superkey. We can pick one CK as primary key (PK) and then call the other CKs alternate keys (AKs).
SQL tables are not relations and SQL operators are not their relational/mathematical counterparts. Among other things, SQL has duplicate rows, nulls & a kind of 3-valued logic. But although you can borrow terms and give them SQL meanings, you can't just substitute those meanings into other RM definitions or theorems and get something sensible or true. So we must convert an SQL design to a relational design, apply relational notions, then convert back to SQL. There are special cases where we can do certain things directly in SQL because we know what would happen if we did convert, apply & convert back.