87

In a SQL Server db, what is the difference between a Primary Key and an Identity column? A column can be a primary key without being an indentity. A column cannot, however, be an identity without being a primary key.

In addition to the differences, what does a PK and Identity column offer that just a PK column doesn't?

user4157124
  • 2,809
  • 13
  • 27
  • 42
DenaliHardtail
  • 27,362
  • 56
  • 154
  • 233

7 Answers7

107

A column can definitely be an identity without being a PK.

An identity is simply an auto-increasing column.

A primary key is the unique column or columns that define the row.

These two are often used together, but there's no requirement that this be so.

Joe
  • 41,484
  • 20
  • 104
  • 125
  • 1
    Actually, I believe SQL Server does require them to be so, but other databases may not. – Erik Funkenbusch Nov 27 '10 at 19:25
  • 12
    I'm 99% sure it doesn't, but don't have access to an instance to confirm. – Joe Nov 27 '10 at 19:30
  • 1
    Then pasted in the wrong create table. This one works with the non-identity column being the PK... create table pktest (nonpk int identity(1,1), pk int primary key not null) – KenJ Nov 27 '10 at 19:32
  • 3
    +1 good answer. PK doesn't need to be an auto-incrementing number but it **does** need to be a unique value for each entry. To use a PK that isn't an identity it may be necessary to check any new identities being entered vs those already existing in the table first. – Evan Plaice Nov 27 '10 at 20:13
  • 2
    SQL Server will reject a duplicate entry for a primay key, quicker than any check we could write for duplicates. We handle the rejection in code so in effect we have checked for duplicate entries. – RC_Cleland Nov 28 '10 at 14:32
16

This answer is more of WHY identity and primary key than WHAT they are since Joe has answered WHAT correctly above.

An identity is a value your SQL controls. Identity is a row function. It is sequential, either increasing or decreasing in value, at least in SQL Server. It should never be modified and gaps in the value should be ignored. Identity values are very useful in linking table B to table A since the value is never duplicated. The identity is not the best choice for a clustered index in every case. If a table contains audit data the clustered index may be better being created on the date occurred as it will present the answer to the question " what happened between today and four days ago" with less work because the records for the dates are sequential in the data pages.

A primary key makes the column or columns in a row unique. Primary key is a column function. Only one primary key may be defined on any table, but multiple unique indexes may be created which simulates the primary key. Clustering the primary key is not always the correct choice. Consider a phone book. If the phone book is clustered by the primary key(phone number) the query to return the phone numbers on "First Street" will be very costly.

The general rules I follow for identity and primary key are:

  1. Always use an identity column
  2. Create the clustered index on the column or columns which are used in range lookups
  3. Keep the clustered index narrow since the clustered index is added to the end of every other index
  4. Create primary key and unique indexes to reject duplicate values
  5. Narrow keys are better
  6. Create an index for every column or columns used in joins

These are my GENERAL rules.

SoMZeY
  • 7
  • 5
RC_Cleland
  • 2,274
  • 14
  • 16
  • 2
    You mention that identity values are never duplicated. They can be duplicated if manually inserted or reseeded (https://subhrosaha.wordpress.com/2012/11/14/sql-server-duplicate-values-in-identity-columns/). If you want to guarante uniqueness constraint e.g. unique index must be added to column/s. – broadband Aug 03 '15 at 11:29
10

A primary key (also known as a candidate key) is any set of attributes that have the properties of uniqueness and minimality. That means the key column or columns are constrained to be unique. In other words the DBMS won't permit any two rows to have the same set of values for those attributes.

The IDENTITY property effectively creates an auto-incrementing default value for a column. That column does not have to be unique though, so an IDENTITY column isn't necessarily a key.

However, an IDENTITY column is typically intended to be used as a key and therefore it usually has a uniqueness constraint on it to ensure that duplicates are not permitted.

nvogel
  • 24,981
  • 1
  • 44
  • 82
  • 7
    So it seems that the word "Identity" in this case is a bit of a misnomer since it is not always unique, as you pointed out, [as have others](https://www.simple-talk.com/sql/t-sql-programming/identity-columns/). Your answer helps those of who are more experienced with MySql to better understand Sql Server's Identity property by emphasizing that it simply creates an auto-incrementing default value for a column, and therefore seems analogous to [MySql's AUTO_INCREMENT property](http://www.w3schools.com/sql/sql_autoincrement.asp). Thanks. – Jason Frank Jul 20 '13 at 13:46
2

Major Difference between Primary and Identity Column

Primary Column:

  • Primary Key cannot have duplicate values.
  • It creates a clustered index for the Table.
  • It can be set for any column type.
  • We need to provide the primary column value while inserting in the table.

Identity Column:

  • Identity Column can have duplicate value.
  • It can only be set for Integer related columns like int, bigint, smallint, tinyint or decimal
  • No need to insert values in the identity column. It is inserted automatically based on the seed.
Vikram
  • 1,617
  • 5
  • 17
  • 37
1

EDITS MADE BASED ON FEEDBACK

A key is unique to a row. It's a way of identifying a row. Rows may have none, one, or several keys. These keys may consist of one or more columns.

Keys are indexes with a unique constraint. This differentiates them from non-key indexes.

Any index with multi-columns is called a "composite index".

Traditionally, a primary key is viewed as the main key that uniquely identifies a row. There may only be one of these.

Depending on the table's design, one may have no primary key.

A primary key is just that - a "prime key". It's the main one that specifies the unique identity of a row. Depending on a table's design, this can be a misnomer and multiple keys express the uniqueness.

In SQL Server, a primary key may be clustered. This means the remaining columns are attached to this key at the leaf level of the index. In other words, once SQL Server has found the key, it has also found the row (to be clear, this is because of the clustered aspect).

An identity column is simply a method of generating a unique ID for a row.

These two are often used together, but this is not a requirement.

IamIC
  • 17,747
  • 20
  • 91
  • 154
  • 1
    SQL Server does not require the primary key to be a clustered index. – RC_Cleland Nov 28 '10 at 14:25
  • 1
    @IanC : "The reason it's called Primary is this is the key that uniquely identifies the row". This is not really an explanation because ANY key or any superkey uniquely identifies a row, not just the "primary" key. "Primary" really just means a "preferred" or "special" key - although it's only as special as you want it to be. – nvogel Nov 29 '10 at 20:49
  • @dportas Well, that is true. However, as MSDN says, "A table typically has a column or combination of columns that contain values that uniquely identify each row in the table. This column, or columns, is called the primary key (PK) of the table and enforces the entity integrity of the table." http://msdn.microsoft.com/en-us/library/ms191236.aspx – IamIC Nov 29 '10 at 20:54
  • 1
    @IanC : The danger is that by describing it that way an uninformed person might get the impression that there can or should be only one such key in a table, which is obviously untrue. – nvogel Nov 29 '10 at 20:59
  • @dpartas Well, you can only have 1 PK. Out of interest, why would you want to use a non-PK to do a PK's job (uniquely ID a row)? – IamIC Nov 29 '10 at 21:08
  • 1
    @IanC : Because it's quite common for things to have more than one identifier and different keys are sometimes needed by different users of the database (developers often work with different keys to the end users for example). In principle all candidate keys can serve the same function. So it isn't particularly meaningful to say that we can or must only have one primary key or that the primary key is special in any fundamental way. – nvogel Nov 29 '10 at 22:32
  • @dportas One can't have more than one PK. You keep saying one can. This isn't true. There is something you're missing here. What you're talking about is simply keys that are "primary" to different people. You can't just change nomenclature. What I've said is not only in MSDN, but holds true for every RDBMS I've ever worked with. – IamIC Nov 30 '10 at 05:46
  • 1
    I said a table can have multiple candidate keys. Key means Candidate Key and "primary" key (if it means anything special at all) just means "one of the candidate keys". SQL Server, in common with all DBMSs that I know of, allows multiple keys per table. SQL provides two uniqueness constraints for implementing keys: PRIMARY KEY and UNIQUE (on non-nullable columns of course). Both those kinds of key constraint achieve exactly the same purpose. – nvogel Nov 30 '10 at 06:20
  • 1
    The point I originally corrected you on was that since all keys are equivalent in principle and in function it isn't good enough to say that a "primary key" is THE key that uniquely identifies a row. All keys do that, not just "primary" ones. – nvogel Nov 30 '10 at 06:20
  • You say you "corrected me", yet I quoted MSDN. Perhaps you need to correct Microsoft too. You're arguing against convention and are missing the point entirely. Please go and re-read the MSDN article I referenced and work out for yourself *why* we use the term PK. All keys are NOT equivalent in function unless they are unique, and even that is a half-truth. If I have a table "RID, CustomerID (unique)" then sure, customer is unique, but we wouldn't say it ID's the *row*. The PK has that function. Peace. – IamIC Nov 30 '10 at 06:46
  • In principle, I agree with you that their functions *can be* equivalent. This, however, doesn't negate the fact of what a PK is. – IamIC Nov 30 '10 at 06:54
  • 1
    All keys are unique. That is after all implicit in the definition of a key: i.e. a set of attributes that are irreducibly unique. I'm not in any way disputing what a primary key is. I just wanted to make sure that no-one inferred from what you said that a "primary" key meant there was just ONE way to uniquely identify a row. For sure, Microsoft is not beyond correction either. But no-one who is serious about learning database principles or database design is likely to use Microsoft as their textbook - at least I seriously hope not! – nvogel Nov 30 '10 at 09:06
  • I agree with you that all keys are unique and that a non PK can be used to make a unique row. Anything else would be a contradiction of the def. of "key". Of course. My only point is you're talking about perspective (the developer, users, etc.). In my opinion, from a developer's viewpoint, it is handy to think in terms of PKs as this normalizes the uniqueness of a row in one's mind. However, we're into semantics now. Re. Microsoft, I think SQL Server is a very respectable product. I wouldn't trash them for it. :) – IamIC Nov 30 '10 at 09:31
  • 1
    SQL Server is a fine product. Microsoft's documentation of it is much less than fine :) The topic here is an important one because there is such a lot of nonsense written, especially online, about databases in general and keys in particular. – nvogel Nov 30 '10 at 09:58
  • 1
    The "Keys are indexes" comment is a classic error but if it is understood to mean "In SQL Server keys are always implemented with an index" then it's just about forgivable I suppose. Not sure why you now want to say that rows, rather than tables have keys. In any case I don't think it's worth refining it further. Anyone who wants this level of detail would be well advised to look it up in a good book. – nvogel Dec 01 '10 at 11:33
  • Yes, I am meaning in SQL Server parlance, since the OP asked in this context :). Out of interest, what DB has keys that are not part of an index? – IamIC Dec 01 '10 at 12:33
  • I am saying that keys are assigned to rows. This is true. – IamIC Dec 01 '10 at 12:34
1

You can use IDENTITY not only with integers, but also with any numeric data type that has a scale of 0

primary key could have scale but its not required.

IDENTITY, combined with a PRIMARY KEY or UNIQUE constraint, lets you provide a simple unique row identifier

Nikki
  • 409
  • 1
  • 5
  • 15
0

Primary key emphasizing on uniqueness and avoid duplication value for all records on the same column, while identity provides increasing numbers in a column without inserting data. Both features could be on a single column or on difference one.

Shaahin
  • 1,195
  • 3
  • 14
  • 22