13

What unique functionality do Primary Keys provide?

While i titled the question with tongue firmly planted in cheek, my question is serious. Before any flames start, I'm not saying build a database without constraints or referential integrity. As far I can tell, however, SQL Server could do away with the primary key key word.

  • Unique indexes cover, well, uniqueness
  • Column based Non-nullability covers the non-nullability requirement for PKs
  • PK's don't have to be clustered, so that's not it
  • Foreign keys can, and often are, implemented with unique indexes, rather than PKs
  • Even MSDN states that a unique index is created to enforce the PK's uniqueness

I do agree that logically a Primary Key coveys a bit of intention about a data model, but is that it? [sarcasm]Oh, and we do get that little Key icon SSMS shows when designing a table! [/sarcasm]


EDIT

From the comments, it seems clear I didn't ask this question as clearly as I thought. I agree that primary keys are important from a logical perspective.

I'm not asking:

  • should i choose an int or a varchar for my PK
  • do PK's have to be clustered, or how do i identify what should be clustered
  • how do i uniquely identify rows

My intention was to ask "what features do PK's provide that cannot reasonably be implemented using other features?" I'm not suggesting going crazy here -- like using a trigger to enforce uniqueness instead of unique constraints/indexes. Reasonable is a key word here -- and using a unique index/constraint seems very similar to defining a PK.

nvogel
  • 24,981
  • 1
  • 44
  • 82
EBarr
  • 11,826
  • 7
  • 63
  • 85
  • 1
    Maybe this is more suited for http://programmers.stackexchange.com/? – praseodym Feb 29 '12 at 18:50
  • 3
    This is better for dba.stackexchange.com, not programmers, I think. – Aaron Bertrand Feb 29 '12 at 18:51
  • 1
    Unique indexes(constraints) are often used to enforce uniqueness at a column level, whereas a primary key is meant to define uniqueness at the row level throughout the table - it uniquely identifies that collection ('tuple') of data. Ideally, there is one primary key, but you can have serveral unique indexes. Unique indexes can contain null, primary key's can't. See http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:4941517587762 for example, although the implementation in SQL Server isn't perfect: http://weblogs.sqlteam.com/mladenp/archive/2007/05/17/60209.aspx – dash Feb 29 '12 at 18:51
  • 3
    I think you meant unique constraints, not unique indexes. While under the covers in SQL Server they are fundamentally the same, the logical difference is significant. – Aaron Bertrand Feb 29 '12 at 18:52
  • It could fit on either of the other sites. Although, I'm asking from a developers perspective & it also seems to fit the 3rd and 4th bullet point in the FAQ for this site. – EBarr Feb 29 '12 at 18:55
  • @AaronBertrand - MSDN states that "The Database Engine automatically creates a unique index". http://msdn.microsoft.com/en-us/library/ms181043.aspx I do agree with the *logical* difference. I'm just wondering if there is any *functionality* I'm missing. – EBarr Feb 29 '12 at 18:57
  • @AaronBertrand - I actually asked this question after reading one of your posts (https://sqlblog.org/2012/02/27/bad-habits-to-kick-believing-everything-you-hear-or-read) and thinking ...I've had that gripe so many times before. – EBarr Feb 29 '12 at 18:59
  • Well in my opinion primary keys are still extremely useful. At least as much for self-documentation of the schema as to enforce uniqueness. Certainly not for the little key icon in SSMS, but even that can be useful for people who need to use the table designer to understand a table. You can have multiple unique constraints or unique indexes, but only one *should* be the way that you typically expect to identify a row (whether it be for straight lookups or to join to other tables). – Aaron Bertrand Feb 29 '12 at 19:06
  • 2
    Primary keys are **less** flexible in SQL Server than simply creating a unique index on non nullable key columns in that you can't add included columns to the index that enforces it (though of course in the case that the PK is also the CI this makes no odds) – Martin Smith Feb 29 '12 at 19:29
  • Just because you can duplicate one feature through a combination of other features does not mean you should. – cadrell0 Feb 29 '12 at 19:33
  • @MartinSmith - i was hoping you were going to weigh in :-) I hadn't thought about PK flexibility that way...good point. Very similar to being allowed nullability in unique indexes, but not in a PK. – EBarr Feb 29 '12 at 19:33

6 Answers6

10

A completely different perspective :

SQL is a language that is defined by an ISO standard. That standard has "mandatory" features and "optional conformance" features.

If you build a DBMS with some data manipulation language, then you are entitled to call your language "SQL" only if :

(a) you have implemented ALL of the syntax prescribed by the standard ("mandatory" features) , and (b) all of the language features that you have implemented (all the mandatory ones as a minimum, but also the "optional" ones you "opted in" for), expose exactly the behaviour as defined/Described in the standard.

The "PRIMARY KEY" syntax is a very old feature, and it's not unlikely that it is one of those "mandatory" ones. Ditching the word from your language means you can no longer legitimately call your language SQL. Big commercial vendors are not likely going to make such a move any time soon.

Erwin Smout
  • 18,113
  • 4
  • 33
  • 52
9

The idea of designating one key per table as a "primary" one is essentially superfluous, outdated and in many ways very unhelpful.

It is superfluous because logically speaking all keys can and do serve the same function. Leaving aside the limitations of any particular DBMS, logically speaking a "primary" key enjoys exactly the same features and functionality as any other key of the same table. The designation of one key as "primary" is therefore only as important as the database designer or user wants it to be. The distinction is arbitrary (that's the word used by E.F.Codd) and purely psychological (C.J.Date).

The concept is outdated because in modern practice it is commonplace for tables to have more than one key and for different users and consumers of data to have different "preferred" or "most significant" identifiers for the same piece of data. E.g.: an end user may recognise and use one key of a table (often the one called a "business" or "natural" key); a middle-tier programmer will possibly be more interested in a different key in the same table (e.g. a "surrogate" key); the DBA on the other hand may view the "clustered" key as the most important or maybe he is equally concerned with all keys that have indexes. So the preferred or most important key depends on the point-of-view and the intended usage - it is not a rigid structural feature at all.

The "primary key" concept is unhelpful for at least two reasons. Firstly, software vendors of database development tools, DBMSs and modelling tools have unfortunately attached all sorts of software features to the keys designated as "primary key". This actually works against the original concept. No longer do we just need to select one key per table that has some logical significance for the designer or user. We are encouraged or even compelled to choose "primary" keys to support this or that feature in X,Y or Z piece of software, regardless of other considerations. This is very regrettable because it represents a limitation and a lack of flexibility in software. We ought to be free to choose an appropriate key for each purpose and not be restricted to just one key per table for every purpose.

The final reason that primary keys are unhelpful is that they are a needless distraction from more important issues of database design. The primary key concept is given often vastly exaggerated significance in education, in textbooks on database design and in everyday data management practice. This is frequently to the detriment or actual exclusion of the more fundamental issue, i.e. that all of the keys and all of the other integrity constraints can be just as important to successful database design and implementation.

I have often argued that the term "primary key" ought to be deprecated and dropped from data management vocabulary as well as from data management software.

nvogel
  • 24,981
  • 1
  • 44
  • 82
  • Good answer from a theoretical perspective. I agree with most of it. Re: SQL Server where specifically are we "encouraged or even compelled to choose" PK's? – EBarr Mar 01 '12 at 02:55
  • 1
    @Ebarr, SQL Server transactional replication mandates the use of a PRIMARY KEY constraint. The lousy tool that Management Studio calls a "database diagram" only shows keys defined as PRIMARY KEY (unfortunately it isn't the only ERD-type tool that suffers that limitation). In the Oracle world, the index-organized table feature is tied to the PRIMARY KEY constraint syntax. – nvogel Mar 01 '12 at 19:20
7

Primary key is a logical concept. It is the key that defines the entity identity: in a table of Widgets, each individual Widget is distinguished by its primary key value. PK is not the clustered index (that is a physical storage property), nor an unique constraint (that is a different logical property). While often the primary key and the clustered key overlap, that is just a coincidence (PK is a convenient clustered key) or even just a negligence (PK is used as a clustered key even though better candidates exist for the given workload).

Changing the clustered key is a change that can be done at any moment, on the field, by ops, to better fit this storage requirement or that performance workload requirement. The app should not notice the change (in an ideal world...). Changing the PK is a design change that require modifying the data model of the app as the identifier of the object changes, and it usually percolates through the data mode/app code.

BTW, this topic has been asked and answered ad-nauseam already here:

To elaborate a bit on the difference between PK and UNIQUE constraints: even if there are several properties that have unique constraints and therefore could serve as a PK, only one will be the right choice, they are not equal. Which one is entirely up to the data model, to the business meaning the entity and what each property represents. The PK is not important for the DBMS, the DBMS really cares about clustered key and uniqueness a whole lot more than about PK. The PK is for you, the developer, and for your toolset. You don't want each developer pointing to the database with it's ORM tool to pick a different unique key as the entity identity, and then each one write code that stores a different property as the identity. You want all to pick the same one, the primary key, because that has other attributes in addition to being unique. A prime example is Stability, the PK value is stable for the entire lifetime of the entity (if is not, then the PK was not chosen correctly).

what features do PK's provide that cannot reasonably be implemented using other features?

The little SSMS icon. Seriously, ultimately this is what it boils down to: the PK conveys the extra information which of the possible keys is actually the one that identifies the entities in the table. Path Dependence does play a significant role in today's PK position, agreed, but if not for that some other construct would rise up to fulfill exactly this role of conveying intent about the logical model.

Community
  • 1
  • 1
Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
  • +1 - I think SQL Server's designers made the choice that you are often going to be retrieving an object by it's "uniqueness", hence the default choice of clustered index. – dash Feb 29 '12 at 18:57
  • I guess I didn't ask my question as clearly as I though. I wasn't trying to ask : (a) what column(s) should get my clustered index, (b) the effect of a surrogate key, (c) should I always have a PK, or (d)how a varchar will affect index compactness. My question is what features do PK's provide that are not available by some other mechanism in SQL Server. I spent some time searching and have not found a question on SO that addresses this aspect. – EBarr Feb 29 '12 at 19:15
  • Despite the titles of the questions in the links, the answers linked actually address your question. – Remus Rusanu Feb 29 '12 at 19:18
  • 1
    Sorry to see you say that "primary key" is "a different logical property" than "a uniqueness constraint". No it isn't. Yes they are the same property. Both declare and make the DBMS enforce the uniqueness -throughout the whole table- of combinations of values in certain attributes. Both also do nothing more than just that. The notion of "primary" is not logical, but psycho logical. A primary key is not somehow "more unique" than a "secondary" one. And "secondary" keys are an equally valid means of identification as a "primary" one, precisely because they are "equally" unique. – Erwin Smout Feb 29 '12 at 19:23
  • @Erwin: There are many points on which I disagree, and they are better explained in some of the linked posts. But at the very least you should consider that there are differences between unique constraints and PKs, eg. NULL being allowed in UNIQUE constraint vs. not allowed in PK (consider *why* this difference exists, is not an accident). – Remus Rusanu Feb 29 '12 at 19:30
  • 3
    @Remus Unique constraints that allow nulls, while are ISO/ANSI compliant, are an abomination. – Conrad Frix Feb 29 '12 at 20:20
  • Absolutely. Two rows have the same value for column A, say '1'. Those same two rows have no value (are null) for column B. There should be some value, but in either case we don't know which one it ought to be. How can anyone or anything conclusively decide that those two rows are in violation of a unique constraint on those two columns (the truth of the matter is you just don't know), OR conclusively decide that those two rows are NOT in violation of such a uniqueness constraint (the truth of the matter is still that you just don't know). – Erwin Smout Feb 29 '12 at 20:29
  • @Remus - the linked questions primarily discuss implementation choices for things like clustered and non-clustered index. Not quite my question, but good reading anyway. Thanks. – EBarr Feb 29 '12 at 21:47
  • 3
    @Remus, I don't understand how you can justify the idea that only one key per table must be the "right choice". It's very common for things to have multiple identifiers and which one is "right" for one purpose may be "wrong" for another use of the same data. Also, stability is not and has never been an absolute requirement for a primary key. Stability is usually desirable but is anyway a relative and subjective quality and in practice is not always verifiable or possible. Key values can and do change. – nvogel Mar 01 '12 at 02:21
  • I would like to hear the defenders of primary key mutability explain how should application developers handle optimistic concurrency in the presence primary key changes. – Remus Rusanu Mar 02 '12 at 07:16
  • 1
    @Remus, There are many different ways an application can handle concurrency. This space isn't sufficient to cover the details. One possible response to a key value changed in another session under optimistic concurrency is to disallow the later change or to prompt the user to confirm the change (because either the attributes for the row they were editing have changed or the row has disappeared altogether). In any case that is not the topic of this thread. Key change happens. Calling a key "primary" doesn't make the slightest bit of difference to that fact. – nvogel Mar 02 '12 at 13:55
4

Theoretically, all keys are equivalent, but we choose one of them as "primary" for psychological and practical reasons. Some considerations:

  • The PK fields are automatically NOT NULL. UNIQUE fields are NOT NULL only if you specify them as such (BTW, NULLs in the UNIQUE constraint are often treated differently by different DBMSes).
  • FOREIGN KEY syntax defaults to parent's PK. If you want to use parent's UNIQUE constraint, you need to specify it explicitly.
  • Clustering is typically based on PK.
  • PK is often displayed in a visually distinct way by ER tools. This can document that (psychologically) we consider one key more "important" than others.

A lot of this is tradition - we could just as easily have the conventions and tools that make all the keys equivalent not just in theory but in practice as well, but history can be a powerful force, even in a relatively young industry such as ours.

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • Well said--tradition. I knew what PKs are _useful_ for, but honestly couldn't think of anything PK's were required for. @sqlvogel pointed out one requirement - replication. and [@]martin_smith pointed out one restriction - can't include columns. – EBarr Mar 21 '12 at 12:19
  • @EBarr Yes, all these uses are _incidental_ to what PK really is at the logical level. We could just as easily imagine replication or clustering using one of the alternate keys, but it seems convenient to "bundle" all these uses on top of the poor old PK. – Branko Dimitrijevic Mar 21 '12 at 12:43
  • @EBarr BTW, columns _can_ be added to PK, provided the table is **empty** (which makes sense - the new PK column must be NOT NULL, but which value to write to the existing rows??). The same is true for adding a NOT NULL column to UNIQUE constraint. If I'm not mistaken, this behavior should be universal for all DBMSes, not just MS SQL Server. – Branko Dimitrijevic Mar 21 '12 at 12:43
  • I totally agree on the logical importance and recognize that implementations could easily differ. This questions seems to have generated the discussion i was hoping for. While there is tons of discussions here on *how* and *why* to chose keys & the implications there of. But I couldn't find anything on *when must I* have a PK. Knowing the hard rules lets us know when we can deviate (even if normal circumstances would counsel otherwise). – EBarr Mar 21 '12 at 13:15
3

You point out that

logically a Primary Key coveys a bit of intention

and Aaron Betrand points out in the comments

You can have multiple unique constraints or unique indexes, but only one should be the way that you typically expect to identify a row

I'm guessing that Aaron used words like should and typically because he knows that even foreign key contraints only require a unique contraint

From MSDN docs on SQL FOREIGN KEY Constraints

A FOREIGN KEY constraint does not have to be linked only to a PRIMARY KEY constraint in another table; it can also be defined to reference the columns of a UNIQUE constraint in another table

Furthermore C.J. Date also notes in An introduction to database systems

If the set of candidate keys actually does include more than one member, then the choice of which is to be primary is essentially arbitrary *

This leads me to conclude that Primary Keys don't indeed provide much except for convention. But it is one that is so heavily integrated into the tools we use and the mental models of most people that it can't be ignored.

*C.J. Date does explain here that choosing a Primary Key isn't completely arbitrary. For example a volatile primary key would be a bad idea.

Conrad Frix
  • 51,984
  • 12
  • 96
  • 155
  • 1
    That link under "knows that" links to a page treating FOREIGN keys ... How does that relate to the question ? – Erwin Smout Feb 29 '12 at 19:27
  • +1 - Thanks for addressing my question -- it seems like you see it much like me --- we do it because it's ingrained, because it coveys logical meaning, much like we talk about design patterns b/c they summarize a concept. – EBarr Feb 29 '12 at 20:06
0

Primary keys are automatically indexed in all DB systems. (At least in these that I know so far.)

sjas
  • 18,644
  • 14
  • 87
  • 92