18

Should I always have a primary key in my database tables?

Let's take the SO tagging. You can see the tag in any revision, its likely to be in a tag_rev table with the postID and revision number. Would I need a PK for that?

Also since it is in a rev table and not currently use the tags should be a blob of tagIDs instead of multiple entries of multiple post_id tagid pair?

kch
  • 77,385
  • 46
  • 136
  • 148
  • 16
    I hope you realize your last sentence has been deemed incomprehensible by more than one person. – kch May 06 '09 at 22:54
  • Possible duplicate of [Should each and every table have a primary key?](https://stackoverflow.com/questions/840162/should-each-and-every-table-have-a-primary-key) – Vicente Olivert Riera Jul 10 '17 at 14:31

11 Answers11

11

A table should have a primary key so that you could identify each row uniquely with it.

Technically, you can have tables without a primary key, but you'll be breaking good database design rules.

Mehrdad Afshari
  • 414,610
  • 91
  • 852
  • 789
  • In a legacy application I maintain there are a few tables that have only one row each, by design. And some tables that will have no more than ten or so rows (lookups of one kind or another). There is utterly no sense of having a primary key in such a situation. At least in my opinion, anyway. – Cyberherbalist May 06 '09 at 21:33
  • 3
    @Cyberherbalist: Even in that case, if you are using that table programmatically, you might need a way to identify each row separately. Also note that having a primary key does not mean having a *separate* primary key *column*. You might have a combination of columns as primary key. BTW, it's perfectly valid to skip patterns and rules when you think they don't make sense in your specific situation but for vast majority of cases, not having a primary key hurts. – Mehrdad Afshari May 06 '09 at 21:39
  • Exactly. I thought WTF when I read our (UK) Secondary School's "Databases" booklet, for MS Access. Quote - "If Access asks you to define a primary key, click No." – Lucas Jones May 06 '09 at 21:43
  • @Mehrdad: I agree, generally. In the case I mention, for the small multi-row tables, there is always some column that serves as a de-facto key, even if it isn't defined as a PK. Programmatically accessing these lookups does not seem to impact the operation of the application. For the few single-row tables there is no point of a key of any kind. @person-b: I would have thought WTF as well! – Cyberherbalist May 06 '09 at 21:56
  • Cyberherbalist: What about cases where the small table is referenced from another table, wouldn't you want a primary key then? (to use as a foreign key). – LegendLength May 06 '09 at 22:59
  • @LegendLength, If the records in a column of the parent table are unique we do not compulsarily need a primary key – A_J Jan 04 '16 at 16:07
9

You should strive to have a primary key in any non-trivial table where you're likely to want to access (or update or delete) individual records by that key. Primary keys can consist of multiple columns, and formally speaking, will be the shortest available superkey; that is, the shortest available group of columns which, together, uniquely identify any row.

I don't know what the Stack Overflow database schema looks like (and from some of the things I've read on Jeff's blog, I don't want to), but in the situation you describe, it's entirely possible there is a primary key across the post identifier, revision number and tag value; certainly, that would be the shortest (and only) superkey available.

With regards to your second point, while it may be reasonable to argue in favour of aggregating values in archive tables, it does go against the principle that each row/column intersection in a table ought to contain one single value. While it may slightly simplify development, there is no reason you can't keep to a normalised table with versioned metadata, even for something as trivial as tags.

Rob
  • 47,999
  • 5
  • 74
  • 91
4

See this related question about whether an integer primary key is required. One of the answers uses tagging as an example:

Are there any good reasons to have a database table without an integer primary key

For more discussion of tagging and keys, see this question:

Id for tags in tag systems

Community
  • 1
  • 1
Pete TerMaat
  • 3,185
  • 25
  • 15
  • However, this is not the question. @acidzombie is not asking about a non-integer PK, but the general case of whether or not to have a primary key of any type (i.e. string). – Cyberherbalist May 06 '09 at 21:49
4

I tend to agree that most tables should have a primary key. I can only think of two times where it doesn't make sense to do it.

  1. If you have a table that relates keys to other keys. For example, to relate a user_id to an answer_id, that table wouldn't need a primary key.
  2. A logging table, whose only real purpose is to create an audit trail.

Basically, if you are writing a table that may ever need to be referenced in a foreign key relationship then a primary key is important, and if you can't be positive it won't be, then just add the PK. :)

James Black
  • 41,583
  • 10
  • 86
  • 166
  • 2
    On point 1, you should have a composite primary key (user_id,answer_id). On point 2 I would expect a timestamp column, but if it was possible to have two identical rows (identical timestamps and message) I would definitely want to know that something had happened twice and be able to deal with those events separately. "13:15-doubled overdraft" is very different from "533-13:15-doubled overdraft,534-13:15-doubled overdraft" for example. – Draemon Dec 11 '09 at 13:56
  • 1
    @Draemon - If you have two columns in a table, having a composite key is overkill, IMO, as you really don't get any benefit. If you can have two events happen at the same time then you have bigger problems, if you don't expect two updates at the same time on the same record. – James Black Dec 12 '09 at 04:38
3

From MySQL 5.5 Reference Manual section 13.1.17:

If you do not have a PRIMARY KEY and an application asks for the PRIMARY KEY in your tables, MySQL returns the first UNIQUE index that has no NULL columns as the PRIMARY KEY.

So, technically, the answer is no. However, as others have stated, in most cases it is quite useful.

e18r
  • 7,578
  • 4
  • 45
  • 40
  • What nobody mentioned was a primary key can be multiple columns which is what I do now –  Apr 15 '14 at 23:22
  • 1
    Rob did mention it. Perhaps you should mark [his answer](http://stackoverflow.com/a/831884/2430274) as accepted? – e18r Apr 16 '14 at 02:58
  • Hey he did... maybe I wanted to wait until I tried it and for forgot to accept it. Here is an upvote –  Apr 17 '14 at 05:22
1

I firmly believe every table should have a way to uniquely identify a record. For 99% of the tables, this is a primary key. For the rest you may get away with a unique index (I'm thinking one column look up type tables here). Any time I have a had to work with a table without a way to uniquely identify records, there has been trouble.

I also believe if you are using surrogate keys as your PK, you should, where at all possible, have a separate unique index on whatever combination of fields make up the natural key. I realize there are all too many times when you don't have a true natural key (names are not unique or what makes something unique might be spread across several parentchild tables), but if you do have one, please please please make sure it has a unique index or is created as the PK.

HLGEM
  • 94,695
  • 15
  • 113
  • 186
0

If there is no PK, how will you update or delete a single row ? It would be impossible ! To be honest I have used a few times tables without PK, for instance to store activity logs, but even in this case it is advisable to have one because the timestamps could not be granular enough. Temporary tables is another example. But according to relational theory the PK is mandatory.

Lluis Martinez
  • 1,963
  • 8
  • 28
  • 42
  • 1
    Impossible? Nah. Rows in a small table can be identified uniquely without a PK. If it is a small lookup table (where I see the most use of PK-less tables), there is always one value that could be the key, but at any rate serves as a way to uniquely identify the row even without a PK (or you can edit the row by opening the table using SSMS). But it would seem to be more efficient to use a primay key and a Sql command. – Cyberherbalist May 06 '09 at 21:44
  • @Lluis, you use DELETE...LIMIT 1 or UPDATE...LIMIT 1. @Cyberherbalist, rows cannot always be identified uniquely since there is nothing stopping all column values being the same, but you can limit the number affected by a query. – Draemon Dec 11 '09 at 14:40
0

it is good to have keys and relationships . Helps a lot. however if your app is good enough to handle the relationships then you could possibly skip the keys ( although i recommend that you have them )

user55474
  • 537
  • 1
  • 8
  • 25
0

Since I use Subsonic, I always create a primary key for all of my tables. Many DB Abstraction libraries require a primary key to work.

Note: that doesn't answer the "Grand Unified Theory" tone of your question, but I'm just saying that in practice, sometimes you MUST make a primary key for every table.

Matt Dawdy
  • 19,247
  • 18
  • 66
  • 91
0

If it's a join table then I wouldn't say that you need a primary key. Suppose, for example, that you have tables PERSONS, SICKPEOPLE, and ILLNESSES. The ILLNESSES table has things like flu, cold, etc., each with a primary key. PERSONS has the usual stuff about people, each also with a primary key. The SICKPEOPLE table only has people in it who are sick, and it has two columns, PERSONID and ILLNESSID, foreign keys back to their respective tables, and no primary key. The PERSONS and ILLNESSES tables contain entities and entities get primary keys. The entries in the SICKPEOPLE table aren't entities and don't get primary keys.

lumpynose
  • 967
  • 4
  • 12
  • I'd probably put a primary key (or at least a unique index) on the SICKPEOPLE table (including both columns) to make sure that you don't accidentally insert the same row twice. – Graeme Perrow May 07 '09 at 01:12
  • Good point. As you say, you could avoid the problem of duplicate inserts by adding a unique constraint or index on the combination of the PERSONID and ILLNESSID in the SICKPEOPLE table. At the momen I tend to favor not adding primary keys to tables like this because it helps make clear that it's not an entity. – lumpynose May 08 '09 at 21:05
-1

Databases don't have keys, per se, but their constituent tables might. I assume you mean that, but just in case...

Anyway, tables with a large number of rows should absolutely have primary keys; tables with only a few rows don't need them, necessarily, though they don't hurt. It depends upon the usage and the size of the table. Purists will put primary keys in every table. This is not wrong; and neither is omitting PKs in small tables.

Edited to add a link to my blog entry on this question, in which I discuss a case in which database administration staff did not consider it necessary to include a primary key in a particular table. I think this illustrates my point adequately.

Cyberherbalist's Blog Post on Primary Keys

Cyberherbalist
  • 12,061
  • 17
  • 83
  • 121
  • A single row table doesn't need a primary key, anything else should have one defined to avoid duplicates. – jmoreno Jun 21 '12 at 00:57
  • 1
    Yes, generally, but business rules determine whether duplicates are to be permitted -- it is not inconceivable that duplicate entries in a table might not only be permissible, but expected. It depends upon what is being stored, and what use is made of it. BTW thanks for the rep hit -- this answer doesn't actually contradict the accepted answer. Your absolutism is noted. – Cyberherbalist Jun 21 '12 at 17:15
  • If you're storing exact duplicates, you're storing the wrong thing. As for the rep hit, remove the slam at people that think that every table should have a PK, and I'll remove it. – jmoreno Jun 21 '12 at 18:28
  • No "slam" intended. Were you personally offended by something I wrote? If so, I apologize, but as a matter of purism I will not modify my answer except to add a link to my blog post about this matter. Keep your downvote. I will not cave in to intimidation. – Cyberherbalist Jun 25 '12 at 22:13
  • Upvoted. And not offended by being called a "purist"... but keyless tables that just happen to have unique values by usage, are confusing at best and dangerous at worst (especially when code treats these tables as if they *did* have keys). Having a key can't hurt, *not* having a key can, or should say, *will* - one day or another, Murphy's law states. – Mathieu Guindon Mar 13 '13 at 21:02
  • Of course having a key can't hurt. In the example I provided, our very competent DBA folks didn't provide a key to the table because it would only ever be inserted to, and rows would never be updated or deleted. I'd bet real money that in this particular case not having a key will never cause a problem. Ever. :-) In six years of system operation there has never been a problem with this table, or the code that services it, and there are only 96 rows after all that time. Curious if you read my blog post? – Cyberherbalist Mar 13 '13 at 22:58