Which column(s) for Primary Key(s)

Question

Sample Data for Nuts and Bolts Thread Series. I'm trying to determine the Primary Key(s) for this data set. Here is a sample of a few valid combinations.

size     form     tpi
1/4      UNC      20
1/4      UNF      28
1/4      UNEF     32
5/16     UNC      18
5/16     UN       20
5/16     UNF      24
5/16     UN       28
5/16     UNEF     32
3/8      UNC      16
3/8      UN       20
3/4      UNC      10

Ultimately I'm trying to build pull down menus in a web app which allows the user to select a valid bolt such as 3/4-10UNC-2A HEX Head. A 3/4-6UNC-2A HEX Head is not valid because they don't manufacture a 3/4" bolt with 6 threads per inch (it wouldn't be loaded in this table).

The first pull down selects the bolt type, the second pull down provides bolt sizes that are available based on the bolt type (defined in another db table)

The third pull down would offer the final portion of the bolt definition. So if a user selects HEX Head, then 5/16" they would see choices for UNC-18, UN-20, UNF-24, UN-28, and UNEF-32.

My options for PK(s) could be:

surrogate and create unique constraints.
use size and form as composite PK, which determines tpi
use size and tpi as composite PK, which determines form
use form and tpi as PK, which determines size
all three fields as PK (probably wrong!)

It seems like it doesn't matter which PK option I choose in regards to the query because I would be querying based on bolt diameter to get the other two values. Something I left out is the 2A values, which relate to the UN Forms in another table, and a "doozey" of a requirement that further limits which ThreadSeries can be used with a bolt type.

I'm doing this in Java EE, using JSF, and JPA Entities if that matters

A primary key (or composite key, as you will likely be moving towards) is determined by uniqueness. From what I've seen, the only things that are unique across all columns are all three values. But, the question I suppose I'd have for you is, which of the columns are guaranteed to be unique? Which two columns are guaranteed to be unique with respect to the last column? — Makoto, Jun 04 '15 at 19:23
ah yes, I see it now after your comment and closer inspection the 5/16 UN 28 fails uniqueness if I only use two of the three columns for PK. — jeff, Jun 04 '15 at 19:35
Should you be building a parent child relationship with a surrogate key instead? table (ID, ParentID, FormType, TPI) This allows you to build a model allowing a user to filter on form, or size. Why limit the user to start with just size when they may be interested in all sizes for a form? this hierachy allows you to handle your 2A values as a parent to a given size/form. — xQbert, Jun 05 '15 at 20:33

score 2 · Accepted Answer · edited May 23 '17 at 12:06

use size and form as composite PK, which determines tpi
use size and tpi as composite PK, which determines form
use form and tpi as PK, which determines size

All of them are false assumptions. As the database grows, there is nothing to stop one of those assumptions being made false. Eg. a 1/4 UNC 28 being added to the stock.

Thus if you had created a PK out of any of them, that would break the database, and you would have to change the entire set of tables that are dependent on this PK.

You cannot determine Keys via FDs (except as a theoretical classroom exercise, using a's and b's and cs). This is not a classroom. The columns are real, they have meaning (a's and b's and c's have no meaning). You quite correctly described the issues, the meaning.

Key determination is a straight-forward logical exercise, part and parcel of data modelling.

Each of your example rows are a Fact. That must be stored. That must be unique.

all three fields as PK (probably wrong!)

That is the only correct PK. That is the only combination that provides row uniqueness, to ensure there are no duplicate rows in the table.

surrogate and create unique constraints.

Since you cannot avoid the PK that gives row uniqueness, that would be superfluous, an additional manufactured column plus index that does nothing.

A surrogate does not provide the logical row uniqueness that is demanded in the Relational Model (they provide physical record ID uniqueness).

If you do not understand what I am saying, please read this Answer, from the top to False Teachers

Something I left out is the 2A values, which relate to the UN Forms in another table, and a "doozey" of a requirement that further limits which ThreadSeries can be used with a bolt type.

The Relational Model is based on First Order Predicate Calculus, commonly known as First Order Logic (FOL). There is nothing that cannot be stated in FOL. Therefore, there is nothing that cannot be modelled in a Relational database.

The "theoreticians* who write books alleging to be about the Relational Model or Relational databases, do not understand these fundaments, they write that various things cannot be done; they use surrogates (anti-relational) for all tables, which results in a Record Filing System, with none of the Relational Integrity (as distinct from Referential integrity), power or speed of a Relational Database.

Open a new question re the "doozey" and ping me.

score 1 · Answer 2 · answered Jun 05 '15 at 19:25

Your issue is not with key values at all. What you are doing is "drilling down" to find the one unique combination of values across the three columns. That process does not use keys.

However, I would make one suggestion. Your primary grouping field is form. It is a non-arbitrary, fixed list of known values. For that reason, you could have a Forms table with one entry for each type of form: UN, UNC, UNF, etc. The the form field would be defined as a foreign key to that table.

This would provide two benefits.

Data integrity: No value could exist in that form field that has not been previously defined in the Forms table.
Performance: The Forms table would provide an easy source of the first drop-down box. Instead of executing a select distinct form from ThreadSeries you can execute select form from Forms. Each form has one and only one entry, no need to use the distinct -- and any forms that happened to not be listed in the ThreadSeries field (such as when you create a new form if for no other reason) would still show up.

As for primary key, you have two choices. One is a surrogate key. This has the benefit of being able to uniquely identify a row using only one value. The downside is that this value will have no meaning to your users. When they are looking for a 3/4-10UNC-2A HEX Head, they know how to specify that. They probably don't know (and would be very opposed to having to learn) that the value for that particular screw is 9383934747 or even 117. Those values are (by definition) meaningless. You would prevent any user confusion and/or antipathy to the PK by hiding it from them. This is common practice.

The other is to use the three fields as a composite PK. This has the advantage of the user being able to construct the PK value based on what they are looking for or drill down when they are browsing. The disadvantage, of course, is working with three fields instead of one. But if the contents of the three fields are values you will normally have available to you anyway, this is less of a disadvantage. But, and this is important, these three fields define a natural key whether or not you explicitly define them as the Primary Key. So even if you decide to go with a surrogate key, these fields must be defined as a key: each individual field defined NOT NULL and a unique index built using all three -- unique( form, size, tpi ).

This is not a comprehensive analysis. There are additional pros and cons on both sides, but they are minor, usually contingent on varying factors so may or may not apply to your circumstances and, besides, I'm not writing a book here.

good points, I'm actually already using Java Enums for thread "forms", didn't want to be to wordy — jeff, Jun 08 '15 at 14:39
Enums are a great idea. It prevents code from ever handling a value that isn't already defined. However, you cannot guarantee, with 100% confidence, that your code ever and always will be the only agent by which data in the database changes. It may even be true today, but you don't know about tomorrow. If for no other reason, do not neglect to add such common sense constraints to the database whenever and wherever possible. It is perfectly fine, I would even say it is an essential aspect of good design, to have redundant data integrity constraints in both the app(s) and in the database. — TommCatt, Jun 09 '15 at 15:37

Which column(s) for Primary Key(s)

2 Answers2