File1 has Make & Model of car . . .
So the data of interest looks like this.
make model
--
HONDA ACCORD
HONDA CIVIC
FORD FOCUS
FORD ESCORT
The column "make" is clearly not a candidate key. As far as you can tell from this sample of data, "model" looks like a candidate key. I actually had to research this issue several years ago, and I found only a couple of models that were built by more than one manufacturer, and none of those were current. But that doesn't really matter.
Whether the candidate key here is {make, model} or {model}, this table is in 6NF.1 If we assume that the only candidate key is {make, model}, I might implement it like this in standard SQL.
create table car_models (
make varchar(15) not null,
model varchar(15) not null,
primary key (make, model)
);
File2 has Make, Model, Year of car.
So the data of interest looks like this.
make model year
--
HONDA ACCORD 2002
HONDA ACCORD 2001
HONDA ACCORD 2004
HONDA CIVIC 1998
FORD FOCUS 1998
FORD ESCORT 2001
FORD ESCORT 2002
Following the assumptions about the key in the previous table, this table has only one candidate key, and it has only one additional attribute. It, too, is in 6NF. A SQL version might look like this.
create table car_model_years (
make varchar(15) not null,
model varchar(15) not null,
model_year integer not null
check (model_year between 1886 and 2099),
primary key (make, model, model_year),
foreign key (make, model) references car_models (make, model)
);
These tables have no redundant data. You can't remove any columns without breaking the semantics or compromising the integrity of the data. Foreign keys are repeated down the rows of "car_model_years", but that's not redundant--that's exactly what foreign keys are for.
Is there any reason why I shouldn't use Make, Model a composite primary key?
As a theoretical (relational) matter, no, there isn't. If you start in 6NF, adding a surrogate ID number denormalizes that table. (6NF requires a single candidate key.) Even if you do add a surrogate ID number, you still have to declare {make, model} as not null unique
. Failure to declare that constraint makes a table liable to end up looking like this.
model_id make model
--
1 Honda Accord
2 Honda Accord
3 Honda Accord
As a practical matter, not a theoretical (relational) matter, these 6NF tables will probably perform better than denormalizations of them using surrogate ID numbers. For example, queries on "car_model_years" that are based on make and model will generally use an index-only scan--they won't have to read the base table at all.
As another practical matter, some application frameworks deal poorly with any key besides an id number. IMHO, this justifies using a better framework, though, not compromising the structure of your database.
1. "... a 'regular' relvar is in 6NF if and only if it consists of a single key, plus at most one additional attribute." Date, CJ,
Database in Depth: Relational Theory for Practitioners, p 147. A regular relvar is a nontemporal relvar.