Identifying Functional Dependencies II

Question

Here is an example which should clear things up for the last post.

Enter image description here

hireDate & carReg are the primary keys. Are there extra functional dependencies (FDs) other than the ones I have identified below? Modifications also welcome:

fd1 carReg -> make, model, outletNo, outletLoc
fd2 custNo -> custName
fd3 outletNo -> outletLoc
fd4 model -> make (only if we assume a model name is unique to a make)
fd5 carReg, hireDate -> make, model, custNo, custName, outletNo, outletLoc

I'm not sure if the above are correct and I am sure there are more.

Based on Mike Sherrill Cat Recall's answer... My question is this: How is custName -> custNo a valid FD? For the above relation, sure, a customer name maps onto exactly one customer number, but by intuition, we know more than one J SMith could be added to the table. If this is the case, this FD is void as it forms a 1..* relationship. Can we really say that custName -> custNo knowing this fact? Do we merely base FDs on the sample data? Or do we take into account the possible values that can be added?

[Why should I not upload images of code/data/errors when asking a question?](https://meta.stackoverflow.com/q/285551/3404097) [Why are images of text, code and mathematical expressions discouraged?](https://meta.stackexchange.com/q/320052/266284) — philipxy, Aug 13 '22 at 07:22
It is not clear what "the last post" refers to. It is unlikely to be [this one](https://stackoverflow.com/questions/5779229/set-membership-search-explanation) (chronologically the last one). There are [five intervening posts](https://stackoverflow.com/users/559142/user559142?tab=questions&sort=newest&page=8) between this one and *[Identifying Functional Dependencies](https://stackoverflow.com/questions/5773204/identifying-functional-dependencies)* (more likely) - less than 24 hours apart! — Peter Mortensen, Oct 11 '22 at 16:11

Mike Sherrill 'Cat Recall' · Accepted Answer · 2011-04-25T22:23:16.753

8

At a glance . . .

custName -> custNo
model -> make
outletLoc -> outletNo
carReg, custNo -> hireDate
carReg, custName -> hireDate

And I'm sure there are others. The sample data isn't representative, and that's a problem when you try to determine functional dependencies from data. Let's say your sample data had only one row.

carReg    hireDate make  model  custNo  custName  outletNo  outletLoc
--
MS34 0GD  14/5/03  Ford  Focus  C100    Smith, J  01        Bearsden

FDs answer the question, "Given one value for 'x', do I know one and only one value for 'y'?" Based on that one-row set of sample data, every attribute determines every other attribute. custNo determines hireDate. hireDate determines outletLoc. custName determines model.

When sample data isn't representative, it's easy to turn up FDs that aren't valid. You need more representative sample data to weed out some invalid functional dependencies.

custName -> custNo isn't valid ('C101', 'Hen, P')
carReg, custNo -> hireDate isn't valid ('MS34 0GD', 'C100', '15/7/04')
carReg, custName -> hireDate isn't valid ('MS34 0GD', 'Hen, P', '15/8/03')

You can investigate functional dependencies in sample data by using SQL.

create table reg (
  CarReg char(8) not null,
  hireDate date not null,
  Make varchar(10) not null,
  model varchar(10) not null,
  custNo char(4) not null,
  custName varchar(10) not null,
  outletNo char(2) not null,
  outletLoc varchar(15) not null
);

insert into reg values
('MS34 OGD', '2003-05-14', 'Ford', 'Focus', 'C100', 'Smith, J', '01', 'Bearsden'),
('MS34 OGD', '2003-05-15', 'Ford', 'Focus', 'C201', 'Hen, P', '01', 'Bearsden'),
('NS34 TPR', '2003-05-16', 'Nissan', 'Sunny', 'C100', 'Smith, J', '01', 'Bearsden'),
('MH34 BRP', '2003-05-14', 'Ford', 'Ka', 'C313', 'Blatt, O', '02', 'Kelvinbridge'),
('MH34 BRP', '2003-05-20', 'Ford', 'Ka', 'C100', 'Smith, J', '02', 'Kelvinbridge'),
('MD51 OPQ', '2003-05-20', 'Nissan', 'Sunny', 'C295', 'Pen, T', '02', 'Kelvinbridge');

Does model determine make?

select distinct model 
from reg
order by model;

model
--
Focus
Ka
Sunny

Three distinct models . . .

select model, make
from reg
group by model, make
order by model;

model   make
--
Focus   Ford
Ka      Ford
Sunny   Nissan

Yup. One make for each model. Based on the sample data, model -> make.

Does carReg, custName -> hireDate?

select distinct carReg, custName
from reg
order by custName;

carReg
--
MH34 BRP  Blatt, O
MS34 OGD  Hen, P
MD51 OPQ  Pen, T
MS34 OGD  Smith, J
NS34 TPR  Smith, J
MH34 BRP  Smith, J

Six distinct combinations of carReg and custName.

select carReg, custName, hireDate
from reg
group by carReg, custName, hireDate
order by custName;

carReg  custName  hireDate
--
MH34 BRP  Blatt, O  2003-05-14
MS34 OGD  Hen, P    2003-05-15
MD51 OPQ  Pen, T    2003-05-20
MH34 BRP  Smith, J  2003-05-20
NS34 TPR  Smith, J  2003-05-16
MS34 OGD  Smith, J  2003-05-14

Yup. One hireDate for each combination of carReg and custName. So based on the sample data, {carReg, custName} -> hireDate.

edited Apr 25 '11 at 22:23

answered Apr 25 '11 at 19:45

Mike Sherrill 'Cat Recall'

91,602
17
122
185

Hi thanks for your reply. I don't agree with a few of the FD's you mentioned..1) custName -> custNo cannot be right as several J Smiths can exist. 2) make -> model cannot be right as Ford make several models of car. 3) outletLoc -> outLetNo cannot be right as there could be many outlets within a location. 4) carReg, custNo -> hireDate AND carRegs, custName -> hireDate cannot be right as J Smith for example could have rented the same car out twice on separate days... – user559142 Apr 25 '11 at 20:13
I recognice your FD's are for that particular instance shown of the relation, but don't FD's have to hold for all possible values withing a domain...? – user559142 Apr 25 '11 at 20:14
The sample data you posted supports every FD I identified. Maybe you should read what I wrote again, especially the part that starts "The sample data isn't representative, and that's a problem . . ." and the part that starts "When sample data isn't representative . . ." – Mike Sherrill 'Cat Recall' Apr 25 '11 at 20:18
@catcall - By definition, a functional dependency must hold for ALL time. So surely, when mentioning FD's you must take into account any future additions to the table...Maybe I am wrong, I don't know?!?! Please don't get offended by me contesting your answer, it is the only way for me to learn. – user559142 Apr 25 '11 at 20:29
1

When you discover your sample data isn't representative, fix the sample data. See, for example, the sample data I posted after "custName -> custNo isn't valid". When your sample data is representative, automated tools can generate every possible 5NF schema. When your sample data isn't representative, all bets are off. – Mike Sherrill 'Cat Recall' Apr 25 '11 at 20:43
outletLoc -> outletNo would surely be wrong too as it doesn't say anywhere that there can only be one outlet within a location, right? I guess my question is, isn't the true definition of a functional dependency one that holds for any state of the above relation... – user559142 Apr 25 '11 at 21:04
Your sample data says there's only one outlet within a location. If I know outletLoc, I also know the one and only one value for outletNo. (And vice-versa.) FDs hold for all time, but sample data shows concretely what you understand about the problem space. If I were your client, and you wanted to fix my database by demonstrating that outletLoc doesn't acutally determine outletNo, you'd need to prove that by selecting sample data from my records that shows something like {01, London} as well as {01, Bearsden}. As a professional, you don't get to say, "Oh, that's wrong." You have to prove it. – Mike Sherrill 'Cat Recall' Apr 25 '11 at 21:29
So generally, you produce FD's based on the representative data, without considering further additions to the table? Thus, custName -> custNo is a valid FD as the data above does not prove otherwise? Furthermore, if another J Smith was added to the table, that would render custName -> custNo an invalid FD? After these questions are answered I think I will understand the concept. Thanks a lot for your time! – user559142 Apr 25 '11 at 21:56
My question is this: How is custName -> custNo a valid FD? For the above relation, sure, a customer name maps onto exactly one customer number, but by intuition, we know more than one J SMith could be added to the table. Can we really say that custName -> custNo knowing this fact? Do we merely base FD's on the sample data? Or do we take into account the possible values that can be added? – user559142 Apr 25 '11 at 22:58
1

Let me point out something I wrote, which you seem to have missed. "custName -> custNo isn't valid". Search this web page for that string. When you post sample data, and you ask us to derive functional dependencies from sample data, you should expect us to derive functional dependencies from sample data. (As a consultant, that's something I do very, very often.) And when someone turns up dependencies in the sample data that are incorrect, you fix the sample data. You don't appeal to intuition. Intuition doesn't scale well. Nobody else can see your intuition. Everybody can see sample data. – Mike Sherrill 'Cat Recall' Apr 26 '11 at 00:19
Ok. So if asked "identify the functional dependencies in these data" custName -> custNo is valid. However to prove that this is wrong, and to prove my intuition right, I would have to add data (such as another J Smith with a new custNo)? – user559142 Apr 26 '11 at 08:24
1

Yes. The sample data becomes the foundation for your test data. The more representative of the real world it is, the better. It's also a concrete representation of a shared understanding between you and your client. (That is, your client can sign off on it. Which doesn't mean they won't blame you if they change a constraint and things go pear shaped, but they probably won't be able to win a lawsuit about that.) – Mike Sherrill 'Cat Recall' Apr 27 '11 at 00:07

score 6 · Answer 2 · answered Apr 25 '11 at 21:08

6

Well, since you asked for a second opinion, I'll give you one.

The second opinion is that the first (CatCall's) is entirely correct.

Sample data do not suffice to identify/determine functional dependencies in the data. What is needed to identify/determine functional dependencies in the data, are user requirements, descriptions/definitions of the business environment the database is intended to support, ...

Only your users can tell you, one way or another, what functional dependencies apply. (Don't interpret this as meaning that you should be telling your users that they should be telling you "what the applicable FDs are", because your users will typically not know what the term means. However, what the applicable FDs are, can still be derived from nothing else than the business specs the user provides you with.)

(PS sample data may on the contrary indeed suffice to demonstrate that a certain given FD certainly will NOT apply. But that's not your question.)

answered Apr 25 '11 at 21:08

Erwin Smout

18,113
4
33
52

"sample data may on the contrary indeed suffice to demonstrate that a certain given FD certainly will NOT apply" -> ok so as the sample data isn't representative enough to state custName -> custNo is wrong, is it correct to say that for this paerticualr set of data, it is a valid FD? – user559142 Apr 26 '11 at 09:05
3

Not entirely correct. It would be correct to say that this particular set of data SATISFIES the FD. To say that an FD is "valid", is just sloppy and imprecise use of language. To say that an FD _applies_ ("upholds" or something like that) is to say that a certain rule (a uniqueness rule) is in effect _in the real world_. To say that the FD Name->ID _applies_, is to say that Name is a unique property of the persons that you deal with in your business (which is probably a falsehood). To say that that FD is "valid" is (sorry if this seems offensive) being verbose without saying anything. – Erwin Smout Apr 26 '11 at 11:57
1

If my comment appears to be overly nitpicky, consider this : (a) Are computers precise machines or not, in the sense that they always do _exactly as they're told_ ? (b) and if the answer to that question is 'yes', then how important is it for the people who program those computers _to be precise_ ? – Erwin Smout Apr 26 '11 at 12:00

philipxy · Answer 3 · 2022-10-11T23:28:21.250

A FD (functional dependency) expresses a certain property of a relation value or variable. We can say that it holds for or doesn't hold for (is satisfied by or isn't satisfied by) (is true of or is not true of) a given relation value. When we say it holds or doesn't hold for a relation variable we mean it holds or doesn't hold for every possible value for the variable that can arise in an application.

Also if we are given a value and we are told that the FDs it satisfies are the FDs that a variable that could hold it satisfies then by that assumption the variable's FDs are the value's FDs. (This is sometimes called "representative data" for the variable.) But if we are just given a value that might arise for a variable then we only know that

the FDs that don't hold in the value also don't in the variable
the trivial FDs of both hold
(the ones of the form S -> subset of S)
(the ones that must hold regardless of the value, based only on the attributes)
(which must be the same for the value & the variable)

From my answer to What did I do wrong? (Find FD from table):

We say that a FD (functional dependency) expression S -> T has a "determinant" set of attributes S and a "determined" set of attributes T. It says that a given subtuple value for S appears in a given relation value or variable/schema always with the same subtuple value for T. For S -> {A} we can say S -> A. For {A} -> T we can say A -> T.

Given a relation, we say that a FD "holds in" it or "is satisfied by" it or "is true" in it or (sloppily) "is in" it or (sloppily) it "has" a FD when what the FD says is true about it. Every FD that can be expressed using attributes of a relation value/variable/schema will either hold or not hold.

We can find all the FDs S -> T that hold in a relation by checking every subset of the set of attributes as S with every subset of attributes as T. There are also algorithms. FDs where S is a superset of T must hold and are called "trivial".

We can find all the FDs S -> A that hold in a relation by checking every subset of the set of attributes as S with every attribute as A. There are also algorithms. (Then to find all FDs that hold: FDs S -> {} hold trivially & whether S -> T for T with multiple elements can be found from the FDs S -> A.)

Here are some shortcuts: A set determines itself. If S -> T then every superset of S determines every subset of T. If S doesn't determine T then no subset of S determines any superset of T. If a set has a different subtuple of values in every tuple (ie it is "unique", ie it is a superkey) (including if it is a candidate key) then it determines every set. {} -> T when/iff every tuple has the same T subtuple value.

Given some FDs that hold, Armstrong's axioms generate all FDs that must also hold. The latter is called the "closure" of the former. A set of FDs that generates a certain closure is called a "cover". A cover is "minimal" or "irreducible" when removing any FD from it gives a set that is not a cover. A minimal/irreducible cover with every determinant unique is "canonical".

Usually we are not asked to give a closure for all FDs that hold in a schema, we are asked to give a canonical cover for them. In general if we only know some FDs that hold in a schema then we don't know that its closure is all the FDs that hold.

Assuming not every possible table value for a table variable is given, determining FDs for a table variable requires its meaning/predicate & the business rules to be given.

See my answer to Identifying functional dependencies (FDs).

score -1 · Answer 4 · answered Apr 27 '11 at 11:24

-1

Here's my attempt at relationships:

enter image description here

answered Apr 27 '11 at 11:24

Raj More

47,048
33
131
198

1

These many:1 "relationships" are *foreign keys* in some decomposition of the original, not *functional dependencies* that hold in the original (that would justify decompositions). – philipxy Apr 23 '18 at 04:58

Identifying Functional Dependencies II

4 Answers4

Linked