11

I have a SQLServer table that stores employee details, the column ID is of GUID type while the column EmployeeNumber of INT type. Most of the time I will be dealing with EmployeeNumber while doing joins and select criteria's.

My question is, whether is it sensible to assign PrimaryKey to ID column while ClusteredIndex to EmployeeNumber?

AbrahamJP
  • 3,425
  • 7
  • 30
  • 39
  • 1
    @Lamak: I'm quite sure that is not corrent. There can only be one clustered index on a table, but it does not have to be on the primary key. – Daniel Pratt Jan 04 '11 at 19:35
  • 1
    @Lamak: Incorrect. Primary keys and clustered index keys are unrelated. – Remus Rusanu Jan 04 '11 at 19:36
  • Yeah, that's right, my bad. A primary key creates a clustered index automatically if there is no other clustered index already on that table. – Lamak Jan 04 '11 at 19:41
  • @Lamak your statement is not true. The primary key in a SQL Server database does not have to be clustered. When you are creating a column constraints you can specify PRIMARY KEY NONCLUSTERED and then apply a CLUSTERED index to another column. (http://msdn.microsoft.com/en-us/library/aa258255(v=SQL.80).aspx) – Tony Jan 04 '11 at 19:43
  • The data from branch offices will be synced with head office, in that case the only dependable PK type was found to be GUID. What I could understand from all you answers is that, never use clustered index on GUIDs, which definitely makes the EmployeeNumber the most suitable column for clustered index, while PK for ID. – AbrahamJP Jan 04 '11 at 19:45
  • Just to further shed some light on this, in SQL Server I believe the primary key is used *by default* as the clustered index, but as others have mentioned, the clustered index can be specified on columns other than the primary key. – Dr. Wily's Apprentice Jan 04 '11 at 19:47

6 Answers6

12

Yes, it is possible to have a non-clustered primary key, and it is possible to have a clustered key that is completely unrelated to the primary key. By default a primary keys gets to be the clustered index key too, but this is not a requirement.

The primary key is a logical concept: is the key used in your data model to reference entities.
The clustered index key is a physical concept: is the order in which you want the rows to be stored on disk.

Choosing a different clustered key is driven by a variety of factors, like key width when you desire a narrower clustered key than the primary key (because the clustered key gets replicated in every non-clustered index. Or support for frequent range scans (common in time series) when the data is frequently accessed with queries like date between '20100101' and '20100201' (a clustered index key on date would be appropriate).

This subject has been discussed here ad nauseam before, see also What column should the clustered index be put on?.

Community
  • 1
  • 1
Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
11

The ideal clustered index key is:

  1. Sequential
  2. Selective (no dupes, unique for each record)
  3. Narrow
  4. Used in Queries

In general it is a very bad idea to use a GUID as a clustered index key, since it leads to mucho fragmentation as rows are added.

EDIT FOR CLARITY:

PK and Clustered key are indeed separate concepts. Your PK does not need to be your clustered index key.

In practical applications in my own experience, the same field that is your PK should/would be your clustered key since it meets the same criteria listed above.

JNK
  • 63,321
  • 15
  • 122
  • 138
  • 3
    I'm quite sure that the statement "...your PK will be your clustered key in SQL server..." is not exactly correct. A clustered index can be based on a unique key, for example. Otherwise, I like your answer. – Daniel Pratt Jan 04 '11 at 19:37
  • 2
    You can, however use a sequential GUID (NEWSEQUENTIALID) if you need the benefits of global uniqueness. – Cade Roux Jan 04 '11 at 19:43
  • The PRIMARY KEY does not have to be CLUSTERED. See the comments on the question. – Tony Jan 04 '11 at 19:49
2

First, I have to say that I have misgivings about the choice of a GUID as the primary key for this table. I am of the opinion that EmployeeNumber would probably be a better choice, and something naturally unique about the employee would be better than that, such as an SSN (or ATIN), which employers must legally obtain anyway (at least in the US).

Putting that aside, you should never base a clustered index on a GUID column. The clustered index specifies the physical order of rows in the table. Since GUID values are (in theory) completely random, every new row will fall at a random location. This is very bad for performance. There is something called 'sequential' GUIDs, but I would consider this a bit of a hack.

Daniel Pratt
  • 12,007
  • 2
  • 44
  • 61
1

Using a clustured index on something else than the primary key will improve performance on SELECT query which will take advantage of this index.

But you will loose performance on UPDATE query, because in most scenario, they rely on the primary key to found the specific row you want to update.

CREATE query could also loose performance because when you add a new row in the middle of the index a lot of row have to be moved (physically). This won't happen on a primary key with an increment as new record will always be added in the end and won't make move any other row.

If you don't know what kind of operation need the most performance, I recommend to leave the clustered Index on the primary key and use nonclustered index on common search criteria.

Marco Guignard
  • 613
  • 3
  • 9
0

Clustered indexes cause the data to be physically stored in that order. For this reason when testing for ranges of consecutive rows, clustered indexes help a lot.

GUID's are really bad clustered indexes since their order is not in a sensible pattern to order on. Int Identity columns aren't much better unless order of entry helps (e.g. most recent hires)

Since you're probably not looking for ranges of employees it probably doesn't matter much which is the Clustered index, unless you can segment blocks of employees that you often aren't interested in (e.g. Termination Dates)

Conrad Frix
  • 51,984
  • 12
  • 96
  • 155
  • GUIDS can be used successfully as clustered indexes as long as you use the NEWSEQUENTIALID() function to generate them; this has it's own problems as you can then only use a single machine to guarantee they are sequential. But I agree with your other points that it's better to find a natural key if possible. – Tony Jan 04 '11 at 19:47
0

Since EmployeeNumber is unique, I would make it the PK. In SQL Server, a PK is often a clustered index.

Joins on GUIDs is just horrible. @JNK answers this well.

IamIC
  • 17,747
  • 20
  • 91
  • 154
  • 1
    Hmm. As evidenced by several comments and post to this question, it seems that there is a common misconception that primary keys are always clustered or the only choice for clustered indexes. As I (and Remus) pointed out elsewhere, this is not the case. – Daniel Pratt Jan 04 '11 at 19:39