194

Here we go again, the old argument still arises...

Would we better have a business key as a primary key, or would we rather have a surrogate id (i.e. an SQL Server identity) with a unique constraint on the business key field?

Please, provide examples or proof to support your theory.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Manrico Corazzi
  • 11,299
  • 10
  • 48
  • 62
  • 2
    @Manrico: you just have to ask yourself this: if I don't use a surrogate key, will my primary key still be immutable? If the answer is no, then you should **seriously** consider using a surrogate key. Also, if the primary key is composed even partially from user inputs, you should consider using a surrogate key. Why? Because of the danger of data anomalies. – code4life May 19 '16 at 00:42
  • 1
    @TylerRick But this isn't a perfectly good question. It asks for a solution that is generally applicable to all situations, when clearly there is not one, as proven by the "religious war" that the asker is perfectly aware of (quote: "Here we go again, the old argument still arises..."). Instead of wondering if the world has changed and finally a compelling reason to choose one side all times has been provided, it is better to keep asking this question over and over again for each concrete situation, and post to SO when you aren't sure. This just elicits dogmatism. – MarioDS Aug 19 '16 at 15:33

19 Answers19

136

Just a few reasons for using surrogate keys:

  1. Stability: Changing a key because of a business or natural need will negatively affect related tables. Surrogate keys rarely, if ever, need to be changed because there is no meaning tied to the value.

  2. Convention: Allows you to have a standardized Primary Key column naming convention rather than having to think about how to join tables with various names for their PKs.

  3. Speed: Depending on the PK value and type, a surrogate key of an integer may be smaller, faster to index and search.

Jay Shepherd
  • 1,958
  • 1
  • 12
  • 6
  • 2
    Now, after reading a lot about surrogate keys and natural keys, I think using surrogate keys is better. But, on my database, natural keys (a NVARCHAR(20)) must be unique. I don't understand how I can get more speed if I need to check every data on that column to don't repeat any value (using a NOT NULL UNIQUE constraint) on each insert. – VansFannel May 18 '16 at 07:39
  • @VansFannel, as I know, index, created for ensuring uniqueness, will take care of checking for repetitions whenever you insert/update value. – Ziyaddin Sadigov Sep 11 '20 at 17:27
  • 1
    how to maintain integrity on weak entity, if all tables using surrogate key? – Yosua Lijanto Binar Aug 03 '22 at 06:31
  • "Speed" -- When you unnecessarily need to go through a surrogate, performance suffers. An example is a many-to-many mapping table without a surrogate: `PRIMARY KEY(a,b), INDEX(b,a)` – Rick James Aug 01 '23 at 18:30
114

Both. Have your cake and eat it.

Remember there is nothing special about a primary key, except that it is labelled as such. It is nothing more than a NOT NULL UNIQUE constraint, and a table can have more than one.

If you use a surrogate key, you still want a business key to ensure uniqueness according to the business rules.

Ted
  • 1,780
  • 1
  • 11
  • 5
  • 8
    If you have multiple "candidate" keys (fields or same-size collections of fields that are NOT NULL UNIQUE) then you are likely in violation of Boyce-Codd Normal Form. BCNF is beyond 3NF, so not many people worry about it. There are situations, however, where being in BCNF is very helpful. – Alan Sep 17 '08 at 17:19
  • A surrogate key is extremely useful for dealing with *single*-column relations and for applications which need to deal with related tables. A surrogate key in a common format is, again, useful for such things. But of course this doesn't mean eliminate business constraints. – yfeldblum Feb 12 '09 at 15:00
  • 2
    Agreed. The real question should be: Should I add a unique surrogate key to my tables? An entirely other question is what to use for a logical primary key. They are both essentially just non-null unique index constraints. – dkretz Feb 12 '09 at 22:11
  • 2
    "Every problem is solved with another level of indirection"... Surrogate keys are just that : *another* indirection level – Steve Schnepp May 29 '09 at 11:10
  • 6
    I find it odd that many comments seem to assert that one cannot setup a relationship without a surrogate key. In many cases, the surrogate key is superfluous. Why add something that brings no value but adds technical debt (and in some cases, causes an otherwise unique result to suddenly become non-unique). – Wil Moore III Sep 28 '10 at 06:55
  • 2
    Indeed. Not generalizing, but most people these days usually tend to dumb-down options into a black-and-white type of argument (THE non plus ultra of False Dilemmas); and almost universally, the CORRECT answer is this one: "BOTH. Have your cake and eat it too". In the search of economy for the sake of it we tend to throw away ideas and keep just one to Rule Them All. Both keys have their reason to exist, and most business models today end up using both at the same time. – alejandrob Oct 05 '15 at 14:21
  • 3
    It is more than NOT NULL UNIQUE constraint. The primary key is used as a clustered index which determines the physical order of your data. In general, Integer is easy to balance since it increments sequentially and your data will append to the EOF on disk. If you use less sequential data such as text or GUID(UUID), there will be a lot more disk IO and effort to balance the index, I think that's kind of big difference – Jin May 02 '16 at 20:42
84

It appears that no one has yet said anything in support of non-surrogate (I hesitate to say "natural") keys. So here goes...

A disadvantage of surrogate keys is that they are meaningless (cited as an advantage by some, but...). This sometimes forces you to join a lot more tables into your query than should really be necessary. Compare:

select sum(t.hours)
from timesheets t
where t.dept_code = 'HR'
and t.status = 'VALID'
and t.project_code = 'MYPROJECT'
and t.task = 'BUILD';

against:

select sum(t.hours)
from timesheets t
     join departents d on d.dept_id = t.dept_id
     join timesheet_statuses s on s.status_id = t.status_id
     join projects p on p.project_id = t.project_id
     join tasks k on k.task_id = t.task_id
where d.dept_code = 'HR'
and s.status = 'VALID'
and p.project_code = 'MYPROJECT'
and k.task_code = 'BUILD';

Unless anyone seriously thinks the following is a good idea?:

select sum(t.hours)
from timesheets t
where t.dept_id = 34394
and t.status_id = 89    
and t.project_id = 1253
and t.task_id = 77;

"But" someone will say, "what happens when the code for MYPROJECT or VALID or HR changes?" To which my answer would be: "why would you need to change it?" These aren't "natural" keys in the sense that some outside body is going to legislate that henceforth 'VALID' should be re-coded as 'GOOD'. Only a small percentage of "natural" keys really fall into that category - SSN and Zip code being the usual examples. I would definitely use a meaningless numeric key for tables like Person, Address - but not for everything, which for some reason most people here seem to advocate.

See also: my answer to another question

Community
  • 1
  • 1
Tony Andrews
  • 129,880
  • 21
  • 220
  • 259
  • 17
    -1 Natural keys as a primary key have the problem that for every child table you have to add the parent's key which can be composed by more than one field (instead of only one which is the case of a surrogate key) and also the child key. So imagine the following where starting from TABLEA the relationship is 1-0..*: TABLEA PK: ID_A TABLEB PK: ID_A ID_B TABLEC PK: ID_A ID_B ID_C TABLED PK: ID_A ID_B ID_C ID_D. See the problem? The parent key is propagated in the children tables. What would happen if the primary key of TABLEA changes? Now you would have to refactor all the child tables PK too. – Alfredo Osorio Feb 15 '12 at 23:28
  • 12
    @Alfredo: yes of course there is a trade-off. However, in my 20+ years of experience I have rarely seen the definition of a table's PK change. If it happened on a regular basis I'd probably avoid natural keys too. In reality, on the extremely rare occasions that this happens I'm prepared to take the hit of the extended impact. – Tony Andrews Feb 16 '12 at 10:18
  • This is a very reasonable answer. For example, I'm currently trying to design a state-machine schema, and I have the option of going with either a `UNIQUEIDENTIFIER` or a simple `VARCHAR`. In the end, which one is more readable? `SELECT ... FROM dbo.StateMachine WHERE id = '21556f00-9896-4455-ba26-cadea386d3cd'`, or `... WHERE id = 'registration'`? Even if you call them "natural keys", a lot of them end up being technical identification keys that just happen to be convenient. – voithos Sep 21 '12 at 16:40
  • 12
    I disagree. It is often the case where some outside body (the customer) legislates that a natural key needs to be edited, and therefore propagated throughout the system. I see this happen regularly. The only way you can be sure that the key won't ever need to change is when it is by definition meaningless. Furthermore, modern databases handle inner joins extremely efficiently, so the potentially large space gains from using surrogates typically outweighs the advantage of not having to do as many inner joins. – TTT Dec 17 '12 at 22:53
  • 13
    @TTT: Then the design was weak to begin with. Again, that's where the men separate from the boys: making the right choice of when to use the natural key, and when to use a surrogate. You decide that on a per-table basis, not as a general dogma. – DanMan Oct 17 '13 at 17:25
  • 2
    @DanMan: I would estimate that all of the men out there did in fact make the right choice when to use natural keys, say, 99.9% of the time. It's the 1/1000 times where the men made the wrong choice that the boys have to go in and clean up the mess. – TTT Oct 25 '13 at 18:18
  • 4
    "why would you need to change it?" because the PHB who doesn't know what they are talking about says that HR has to be changed to Human Resources. – Shane Jan 14 '15 at 15:11
  • 12
    I have also 20+ years of experience, and I second your opinion. I have once a created an oracle datawarehouse with surrogate keys, and data maintenance was like hell. You simply can never directly access your data. you always need to write queries for everything, and that makes surrogate keys simply awful to handle. – SQL Police Feb 19 '16 at 22:58
  • 2
    And many-to-many mapping tables run much faster with a 2-column "natural" key. – Rick James Aug 01 '23 at 18:33
35

I hate surrogate keys in general. They should only be used when there is no quality natural key available. It is rather absurd when you think about it, to think that adding meaningless data to your table could make things better.

Here are my reasons:

  1. When using natural keys, tables are clustered in the way that they are most often searched thus making queries faster.

  2. When using surrogate keys you must add unique indexes on logical key columns. You still need to prevent logical duplicate data. For example, you can’t allow two Organizations with the same name in your Organization table even though the pk is a surrogate id column.

  3. When surrogate keys are used as the primary key it is much less clear what the natural primary keys are. When developing you want to know what set of columns make the table unique.

  4. In one to many relationship chains, the logical key chains. So for example, Organizations have many Accounts and Accounts have many Invoices. So the logical-key of Organization is OrgName. The logical-key of Accounts is OrgName, AccountID. The logical-key of Invoice is OrgName, AccountID, InvoiceNumber.

    When surrogate keys are used, the key chains are truncated by only having a foreign key to the immediate parent. For example, the Invoice table does not have an OrgName column. It only has a column for the AccountID. If you want to search for invoices for a given organization, then you will need to join the Organization, Account, and Invoice tables. If you use logical keys, then you could Query the Organization table directly.

  5. Storing surrogate key values of lookup tables causes tables to be filled with meaningless integers. To view the data, complex views must be created that join to all of the lookup tables. A lookup table is meant to hold a set of acceptable values for a column. It should not be codified by storing an integer surrogate key instead. There is nothing in the normalization rules that suggest that you should store a surrogate integer instead of the value itself.

  6. I have three different database books. Not one of them shows using surrogate keys.

shA.t
  • 16,580
  • 5
  • 54
  • 111
Ken
  • 1,529
  • 12
  • 12
  • 8
    I hate surrogate keys, except when they are necessary. They are necessary when the enterprise uses a natural key that is subject to a lot of errors, and are unwilling to tolerate a database that's impacted by those errors. – Walter Mitty Sep 18 '10 at 11:06
  • 29
    -1: I have written and maintained dozens of applications. The ones with the most data-related problems were those using natural keys. – Falcon Apr 06 '11 at 13:58
  • #6 is actually a pretty compelling argument. Though I may be biased because I dislike surrogate keys : ) – James King Oct 25 '11 at 17:01
  • 2
    http://www.google.pl/search?q=surrogate+keys&tbm=bks&tbo=1 ;) – Robert Bak Feb 07 '12 at 19:14
  • 3
    Adding to 3.: in other words, you instantly know which are the mandatory fields in a user form to be able to create another row in a table. – DanMan Oct 17 '13 at 17:34
  • 6
    Some of your points assume the surrogate key has to be the PK or has to be the clustered column--not true. Your points 1 and 5 ignore the fact that integers are 4 bytes and natural keys are almost always many, many more bytes. And, each nonclustered index must repeat the bytes of those natural keys that are in the clustered index, so the tables and indexes in your natural key database are going to have far, far fewer rows per page, which translates in to *much worse* read performance, which makes queries *slower*, not faster. – ErikE Oct 26 '13 at 02:05
  • 3
    Another reason against natural keys (examples: Atomic Numbers, VINs, etc), the business logic can change that increases the type of data. For example - Before: Tracking charges of Atoms, After: Tracking charges of Atoms and Compounds. Before: Tracking Motor Vehicles for load capacity. After: Adding planes, boats, bikes and people for load carrying capacity. – forforf Dec 18 '13 at 21:15
  • 3
    I guess you don't have any tables where the primary key is composed even partially from 1) any attribute that can and will change), or 2) from user input (e.g. dynamically generated lookup lists). If you can't guarantee key immutability, then you will have to update all these entity relationships by code or by manual "fix" scripts. If you never had to do that... I guess your database is both surrogate key-less and... unusual. – code4life May 19 '16 at 00:46
  • 2
    @Falcon, can you tell more about "[applications] with the most data-related problems were those using natural keys"? For example, why were there data problems? What kind of problems? How would surrogate key use solve those problems? – Dennis Apr 19 '17 at 14:31
  • 1
    @Dennis - For me there have been many cases where I have had to change one of the NK values due to error or business need. Had I used the pattern of using the NK as the PK (no surrogate), I would have to delete and insert the row in question. That in itself doesn't sound terrible, except when that is the FK of another table, The handling of that key cascade (and getting it right) can become a nightmare, quickly. With a surrogate key, I need only issue a single update. – StingyJack Jan 11 '19 at 14:30
  • Reason #6 is a logical fallacy. Just because its not in a book doesn't make it wrong or untrue. Following that logic leads to every idea that was not written about in a book before it was conceived or observed elsewhere as being wrong or untrue. – StingyJack Jan 11 '19 at 14:34
  • 1
    Proper definition of Natural Keys is the Key :-) Indexes on surrogate keys might help you to make a JOIN faster, but will be useless for a WHERE statement.... or how often do you call your kids by numbers???? – Mihail Gershkovich May 03 '21 at 14:17
35

Surrogate key will NEVER have a reason to change. I cannot say the same about the natural keys. Last names, emails, ISBN nubmers - they all can change one day.

Rimantas
  • 1,471
  • 9
  • 6
  • Goal is not to resist change at the cost of correctness, goal is to manage change. One has to choose the natural keys with careful consideration & if it turns out it needs to be changed later on to include additional keys then that's where database migration process kicks in. Point is this is not something that is a simple choice of one over the other. Natural keys are the first choice & only in [specific cases](https://stackoverflow.com/a/12116010/5200064) does one choose surrogate keys. – lmk Aug 14 '23 at 23:04
35

Surrogate keys (typically integers) have the added-value of making your table relations faster, and more economic in storage and update speed (even better, foreign keys do not need to be updated when using surrogate keys, in contrast with business key fields, that do change now and then).

A table's primary key should be used for identifying uniquely the row, mainly for join purposes. Think a Persons table: names can change, and they're not guaranteed unique.

Think Companies: you're a happy Merkin company doing business with other companies in Merkia. You are clever enough not to use the company name as the primary key, so you use Merkia's government's unique company ID in its entirety of 10 alphanumeric characters. Then Merkia changes the company IDs because they thought it would be a good idea. It's ok, you use your db engine's cascaded updates feature, for a change that shouldn't involve you in the first place. Later on, your business expands, and now you work with a company in Freedonia. Freedonian company id are up to 16 characters. You need to enlarge the company id primary key (also the foreign key fields in Orders, Issues, MoneyTransfers etc), adding a Country field in the primary key (also in the foreign keys). Ouch! Civil war in Freedonia, it's split in three countries. The country name of your associate should be changed to the new one; cascaded updates to the rescue. BTW, what's your primary key? (Country, CompanyID) or (CompanyID, Country)? The latter helps joins, the former avoids another index (or perhaps many, should you want your Orders grouped by country too).

All these are not proof, but an indication that a surrogate key to uniquely identify a row for all uses, including join operations, is preferable to a business key.

tzot
  • 92,761
  • 29
  • 141
  • 204
  • 1
    Yup, surrogate keys are a disease. One leaks into the wild and you use it as a pkey so now you need your own surrogate key. Then your key leaks into the wild (say through a url) and the disease spreads. – Samuel Danielson Oct 05 '12 at 15:16
23

I want to share my experience with you on this endless war :D on natural vs surrogate key dilemma. I think that both surrogate keys (artificial auto-generated ones) and natural keys (composed of column(s) with domain meaning) have pros and cons. So depending on your situation, it might be more relevant to choose one method or the other.

As it seems that many people present surrogate keys as the almost perfect solution and natural keys as the plague, I will focus on the other point of view's arguments:

Disadvantages of surrogate keys

Surrogate keys are:

  1. Source of performance problems:
    • They are usually implemented using auto-incremented columns which mean:
      • A round-trip to the database each time you want to get a new Id (I know that this can be improved using caching or [seq]hilo alike algorithms but still those methods have their own drawbacks).
      • If one-day you need to move your data from one schema to another (It happens quite regularly in my company at least) then you might encounter Id collision problems. And Yes I know that you can use UUIDs but those lasts requires 32 hexadecimal digits! (If you care about database size then it can be an issue).
      • If you are using one sequence for all your surrogate keys then - for sure - you will end up with contention on your database.
  2. Error prone. A sequence has a max_value limit so - as a developer - you have to put attention to the following points:
    • You must cycle your sequence ( when the max-value is reached it goes back to 1,2,...).
    • If you are using the sequence as an ordering (over time) of your data then you must handle the case of cycling (column with Id 1 might be newer than row with Id max-value - 1).
    • Make sure that your code (and even your client interfaces which should not happen as it supposed to be an internal Id) supports 32b/64b integers that you used to store your sequence values.
  3. They don't guarantee non duplicated data. You can always have 2 rows with all the same column values but with a different generated value. For me this is THE problem of surrogate keys from a database design point of view.
  4. More in Wikipedia...

Myths on natural keys

  1. Composite keys are less inefficient than surrogate keys. No! It depends on the used database engine:
  2. Natural keys don't exist in real-life. Sorry but they do exist! In aviation industry, for example, the following tuple will be always unique regarding a given scheduled flight (airline, departureDate, flightNumber, operationalSuffix). More generally, when a set of business data is guaranteed to be unique by a given standard then this set of data is a [good] natural key candidate.
  3. Natural keys "pollute the schema" of child tables. For me this is more a feeling than a real problem. Having a 4 columns primary-key of 2 bytes each might be more efficient than a single column of 11 bytes. Besides, the 4 columns can be used to query the child table directly (by using the 4 columns in a where clause) without joining to the parent table.

Conclusion

Use natural keys when it is relevant to do so and use surrogate keys when it is better to use them.

Hope that this helped someone!

Community
  • 1
  • 1
mwnsiri
  • 788
  • 8
  • 14
  • 4
    What happens when the scheduled flight's departureDate is rescheduled? Do you have to track down all the related entities and delete the keys, or do you actually update all the keys in the related entities? Or are you dealing with a simple, singular table (possibly not even 3NF)? – code4life May 19 '16 at 00:48
  • @code4life: That's where the operationalSuffix jumps in. In order to keep the same flightNumber so we avoid client confusion, we add just a suffix (i.e 'D' for example). – mwnsiri Jul 21 '16 at 12:37
  • "You can always have 2 rows with all the same column values but with a different generated value" so just put a unique or composite unique constraint on your columns. – Vanity Slug - codidact.com Sep 13 '19 at 19:46
  • 2
    Why having a primary index on the auto-generated value AND a unique index on the natural key when you can keep only one? – mwnsiri Dec 24 '20 at 21:34
  • 1
    There is one more thing: in DBMS that offer ENUMS, you can use ENUMS for PK/FK values if the amount of lookup values is not bigger then ~65k, so that they will use 1-2 byte. Profits over profits.... – Mihail Gershkovich May 03 '21 at 13:50
16

Alway use a key that has no business meaning. It's just good practice.

EDIT: I was trying to find a link to it online, but I couldn't. However in 'Patterns of Enterprise Archtecture' [Fowler] it has a good explanation of why you shouldn't use anything other than a key with no meaning other than being a key. It boils down to the fact that it should have one job and one job only.

Iain Holder
  • 14,172
  • 10
  • 66
  • 86
10

Surrogate keys are quite handy if you plan to use an ORM tool to handle/generate your data classes. While you can use composite keys with some of the more advanced mappers (read: hibernate), it adds some complexity to your code.

(Of course, database purists will argue that even the notion of a surrogate key is an abomination.)

I'm a fan of using uids for surrogate keys when suitable. The major win with them is that you know the key in advance e.g. you can create an instance of a class with the ID already set and guaranteed to be unique whereas with, say, an integer key you'll need to default to 0 or -1 and update to an appropriate value when you save/update.

UIDs have penalties in terms of lookup and join speed though so it depends on the application in question as to whether they're desirable.

derek lawless
  • 2,544
  • 2
  • 16
  • 13
6

Using a surrogate key is better in my opinion as there is zero chance of it changing. Almost anything I can think of which you might use as a natural key could change (disclaimer: not always true, but commonly).

An example might be a DB of cars - on first glance, you might think that the licence plate could be used as the key. But these could be changed so that'd be a bad idea. You wouldnt really want to find that out after releasing the app, when someone comes to you wanting to know why they can't change their number plate to their shiny new personalised one.

Mark Embling
  • 12,605
  • 8
  • 39
  • 53
  • 2
    Unfortunately cars do have a natural key that doesn't change: the VIN (at least in America...) – jcollum Aug 04 '09 at 17:22
  • @jcollum Yes ok, that is a fair point. My opinion still stands though, my example was not necessarily as good as it could be. – Mark Embling Aug 10 '09 at 08:57
  • 3
    A list of languages would be an example for a natural key, when you base it on ISO codes. So if you then wanted to load content from a table in a certain language, you wouldn't need to join in the `languages` table since the language code (ID) is already in the `texts` table. – DanMan Oct 17 '13 at 17:40
  • 1
    @DanMan I have to agree with you there. There are always going to be some examples which work better with a natural key. Rules or common approaches are never absolute, and that is one example I would 100% go with your approach :-) – Mark Embling Oct 17 '13 at 18:27
5

Case 1: Your table is a lookup table with less than 50 records (50 types)

In this case, use manually named keys, according to the meaning of each record.

For Example:

Table: JOB with 50 records
CODE (primary key)       NAME               DESCRIPTION
PRG                      PROGRAMMER         A programmer is writing code
MNG                      MANAGER            A manager is doing whatever
CLN                      CLEANER            A cleaner cleans
...............
joined with
Table: PEOPLE with 100000 inserts

foreign key JOBCODE in table PEOPLE
looks at
primary key CODE in table JOB

Case 2: Your table is a table with thousands of records

Use surrogate/autoincrement keys.

For Example:

Table: ASSIGNMENT with 1000000 records
joined with
Table: PEOPLE with 100000 records

foreign key PEOPLEID in table ASSIGNMENT
looks at
primary key ID in table PEOPLE (autoincrement)

In the first case:

  • You can select all programmers in table PEOPLE without use of join with table JOB, but just with: SELECT * FROM PEOPLE WHERE JOBCODE = 'PRG'

In the second case:

  • Your database queries are faster because your primary key is an integer
  • You don't need to bother yourself with finding the next unique key because the database itself gives you the next autoincrement.
Stefanos Kargas
  • 10,547
  • 22
  • 76
  • 101
5

On a datawarehouse scenario I believe is better to follow the surrogate key path. Two reasons:

  • You are independent of the source system, and changes there --such as a data type change-- won't affect you.
  • Your DW will need less physical space since you will use only integer data types for your surrogate keys. Also your indexes will work better.
Santiago Cepas
  • 4,044
  • 2
  • 25
  • 31
5

Always use a single column, surrogate key if at all possible. This makes joins as well as inserts/updates/deletes much cleaner because you're only responsible for tracking a single piece of information to maintain the record.

Then, as needed, stack your business keys as unique contraints or indexes. This will keep you data integrity intact.

Business logic/natural keys can change, but the phisical key of a table should NEVER change.

user7658
  • 964
  • 8
  • 7
4

Surrogate keys can be useful when business information can change or be identical. Business names don't have to be unique across the country, after all. Suppose you deal with two businesses named Smith Electronics, one in Kansas and one in Michigan. You can distinguish them by address, but that'll change. Even the state can change; what if Smith Electronics of Kansas City, Kansas moves across the river to Kansas City, Missouri? There's no obvious way of keeping these businesses distinct with natural key information, so a surrogate key is very useful.

Think of the surrogate key like an ISBN number. Usually, you identify a book by title and author. However, I've got two books titled "Pearl Harbor" by H. P. Willmott, and they're definitely different books, not just different editions. In a case like that, I could refer to the looks of the books, or the earlier versus the later, but it's just as well I have the ISBN to fall back on.

David Thornley
  • 56,304
  • 9
  • 91
  • 158
  • 1
    I think I have to disagree with your example here. An ISBN number is an attribute of a book. A surrogate key is independent of the rest of the row data, therefore this position would advocate using a separate surrogate key for a book table, even though the ISBN already uniquely identifies every book. – Christopher Cashell Mar 31 '10 at 21:28
  • 2
    Alternately, think of the ISBN as a surrogate key itself. It's an identifier with no meaning, just a code that is applied to a specific book. If you're making a books table, the ISBN may as well be the primary key (assuming you have and always will have one book per row). – David Thornley Mar 31 '10 at 22:08
  • @Christopher Cashell - Came across this post from a year ago but I thought add something. ISBNs are not guaranteed to be unique and can have duplicates. I have a friend that worked at a library for a number of years and they often ran across books with duplicate ISBNs.The problem is that the uniqueness of the ISBN is incumbent on the publisher rather than one body that ensures that all numbers for all publications are unique and those publishers did not always have their act together. – Thomas Apr 22 '11 at 04:49
  • 2
    Came across this post from a year ago and wanted to mention that ISBN are in fact natural keys. There is meaning baked into the key value itself unlike a surrogate key. For example, part of the key identifies the publisher. In addition, as I mentioned above, they are not guaranteed to be unique. They are *supposed* to be unique but that uniqueness comes from the publishers and they were not always perfect. – Thomas Apr 22 '11 at 04:52
  • Technically, corporations cannot move between states; what happens is that a new corporation is created in the new state and the assets are transferred. That works for database information too. – Warren Dew May 04 '14 at 01:28
2

As a reminder it is not good practice to place clustered indices on random surrogate keys i.e. GUIDs that read XY8D7-DFD8S, as they SQL Server has no ability to physically sort these data. You should instead place unique indices on these data, though it may be also beneficial to simply run SQL profiler for the main table operations and then place those data into the Database Engine Tuning Advisor.

See thread @ http://social.msdn.microsoft.com/Forums/en-us/sqlgetstarted/thread/27bd9c77-ec31-44f1-ab7f-bd2cb13129be

Bryan
  • 3,271
  • 2
  • 15
  • 30
2

This is one of those cases where a surrogate key pretty much always makes sense. There are cases where you either choose what's best for the database or what's best for your object model, but in both cases, using a meaningless key or GUID is a better idea. It makes indexing easier and faster, and it is an identity for your object that doesn't change.

Michael Green
  • 1,397
  • 1
  • 17
  • 25
Charles Graham
  • 24,293
  • 14
  • 43
  • 56
1

Maybe not completely relevant to this topic, but a headache I have dealing with surrogate keys. Oracle pre-delivered analytics creates auto-generated SKs on all of its dimension tables in the warehouse, and it also stores those on the facts. So, anytime they (dimensions) need to be reloaded as new columns are added or need to be populated for all items in the dimension, the SKs assigned during the update makes the SKs out of sync with the original values stored to the fact, forcing a complete reload of all fact tables that join to it. I would prefer that even if the SK was a meaningless number, there would be some way that it could not change for original/old records. As many know, out-of-the box rarely serves an organization's needs, and we have to customize constantly. We now have 3yrs worth of data in our warehouse, and complete reloads from the Oracle Financial systems are very large. So in my case, they are not generated from data entry, but added in a warehouse to help reporting performance. I get it, but ours do change, and it's a nightmare.

lrb
  • 11
  • 1
1

In the case of point in time database it is best to have combination of surrogate and natural keys. e.g. you need to track a member information for a club. Some attributes of a member never change. e.g Date of Birth but name can change. So create a Member table with a member_id surrogate key and have a column for DOB. Create another table called person name and have columns for member_id, member_fname, member_lname, date_updated. In this table the natural key would be member_id + date_updated.

1

Horse for courses. To state my bias; I'm a developer first, so I'm mainly concerned with giving the users a working application.

I've worked on systems with natural keys, and had to spend a lot of time making sure that value changes would ripple through.

I've worked on systems with only surrogate keys, and the only drawback has been a lack of denormalised data for partitioning.

Most traditional PL/SQL developers I have worked with didn't like surrogate keys because of the number of tables per join, but our test and production databases never raised a sweat; the extra joins didn't affect the application performance. With database dialects that don't support clauses like "X inner join Y on X.a = Y.b", or developers who don't use that syntax, the extra joins for surrogate keys do make the queries harder to read, and longer to type and check: see @Tony Andrews post. But if you use an ORM or any other SQL-generation framework you won't notice it. Touch-typing also mitigates.

WillC
  • 957
  • 16
  • 19
  • Also; if you want to really drive home that the surrogate keys are just that, start them at a random large number and increment the sequences by 3+ rather than by 1. Or use the same sequence to generate values for more than one key. – WillC Feb 09 '12 at 23:05
  • CASCADE ON UPDATE and your DBMS takes care of all the changes.... – Mihail Gershkovich May 03 '21 at 13:51