Composite primary key or primary surrogate key with following scenario?

Question

I know it's been asked many times, but I wanted to explain my scenario, and see if there are any benefits of using Identity column as primary key instead of using a composite primary key.

I'm currently reading two text files: File1 has Make & Model of car, while File2 has Make, Model, Year of car. Make, Model combination in File2 will always be in File1.

So I created table [Car] composed of columns of MakeId (identity), Make and Model. Data for table [Car] looks like this. The data in [Car] is an exact replica of File1:

[MakeId]       [Make]      [Model]
1              HONDA       ACCORD
2              HONDA       CIVIC
3              FORD        FOCUS
4              FORD        ESCORT

For File2, I created table [CarYear] with columns CarYearId (identity), Make, Model, Year. The data in [CarYear] is an exact replica of File2:

[CarYearId] [Make]      [Model]     [Year]
1           HONDA       ACCORD      2002
2           HONDA       ACCORD      2001
3           HONDA       ACCORD      2004
4           HONDA       CIVIC       1998
5           FORD        FOCUS       1998
6           FORD        ESCORT      2001
7           FORD        ESCORT      2002

Is there any reason why I shouldn't use Make, Model a composite primary key? Given my case, since I have Make & Model in both tables, I can easily just search the 2nd table directly instead of having to do inner joins.

This has been asked a lot (http://stackoverflow.com/questions/1383062/composite-primary-key and http://stackoverflow.com/questions/337503/whats-the-best-practice-for-primary-keys-in-tables). Honestly, I'd recommend doing it how it works best for your situation. You should perhaps consider normalizing the data -- makes and models could be in their own table with a relationship table if needed. — sgeddes, Feb 27 '15 at 21:03
it depends. lets assume, your make/model combination will never get any attributes since cars tend to need a year to identify the model. in that case i wont introduce a generated MakeId. Otherwise i Would. — A ツ, Feb 27 '15 at 21:27

Mike Sherrill 'Cat Recall' · Accepted Answer · 2015-02-28T00:39:50.513

File1 has Make & Model of car . . .

So the data of interest looks like this.

make        model
--
HONDA       ACCORD
HONDA       CIVIC
FORD        FOCUS
FORD        ESCORT

The column "make" is clearly not a candidate key. As far as you can tell from this sample of data, "model" looks like a candidate key. I actually had to research this issue several years ago, and I found only a couple of models that were built by more than one manufacturer, and none of those were current. But that doesn't really matter.

Whether the candidate key here is {make, model} or {model}, this table is in 6NF.¹ If we assume that the only candidate key is {make, model}, I might implement it like this in standard SQL.

create table car_models (
  make varchar(15) not null,
  model varchar(15) not null,
  primary key (make, model)
);

File2 has Make, Model, Year of car.

So the data of interest looks like this.

make        model       year
--
HONDA       ACCORD      2002
HONDA       ACCORD      2001
HONDA       ACCORD      2004
HONDA       CIVIC       1998
FORD        FOCUS       1998
FORD        ESCORT      2001
FORD        ESCORT      2002

Following the assumptions about the key in the previous table, this table has only one candidate key, and it has only one additional attribute. It, too, is in 6NF. A SQL version might look like this.

create table car_model_years (
  make varchar(15) not null,
  model varchar(15) not null,
  model_year integer not null
    check (model_year between 1886 and 2099),
  primary key (make, model, model_year),
  foreign key (make, model) references car_models (make, model)
);

These tables have no redundant data. You can't remove any columns without breaking the semantics or compromising the integrity of the data. Foreign keys are repeated down the rows of "car_model_years", but that's not redundant--that's exactly what foreign keys are for.

Is there any reason why I shouldn't use Make, Model a composite primary key?

As a theoretical (relational) matter, no, there isn't. If you start in 6NF, adding a surrogate ID number denormalizes that table. (6NF requires a single candidate key.) Even if you do add a surrogate ID number, you still have to declare {make, model} as not null unique. Failure to declare that constraint makes a table liable to end up looking like this.

model_id  make   model
--
1         Honda  Accord
2         Honda  Accord
3         Honda  Accord

As a practical matter, not a theoretical (relational) matter, these 6NF tables will probably perform better than denormalizations of them using surrogate ID numbers. For example, queries on "car_model_years" that are based on make and model will generally use an index-only scan--they won't have to read the base table at all.

As another practical matter, some application frameworks deal poorly with any key besides an id number. IMHO, this justifies using a better framework, though, not compromising the structure of your database.

1. "... a 'regular' relvar is in 6NF if and only if it consists of a single key, plus at most one additional attribute." Date, CJ, Database in Depth: Relational Theory for Practitioners, p 147. A regular relvar is a nontemporal relvar.

Thanks for the info. The two reasons for not wanting to use a surrogate key were: 1) I would need to search the primary table for every insert to retrieve the PK ID, 2) I tend to do many ad-hoc queries, so removing the unnecessary joins makes my job easier. I understand there's redundant data, but the pros outweigh the cons. — fdkgfosfskjdlsjdlkfsf, Mar 02 '15 at 17:16
@rbhatup: There's no redundant data there. Search this page for "no redundant data". — Mike Sherrill 'Cat Recall', Mar 02 '15 at 17:46

Composite primary key or primary surrogate key with following scenario?

1 Answers1

Linked