21

Our data resides in a SQL Server 2008 database, and there will be a lot of queries and joins between tables. We have an ongoing argument inside the team: some argue that an integer identity is better for performance, others argue for a GUID (uniqueidentifier).

Does the performance really suffer that badly using a GUID as a primary key?

GEOCHET
TOMMY WANG
  • The biggest performance and fragmentation issues with a `UNIQUEIDENTIFIER` will come if you make your PK a clustered index – Lamak Mar 15 '12 at 19:53
  • So it does matter. Is it true to say always use int rather than guid as PK, then? Why does everyone use guid then? – TOMMY WANG Mar 15 '12 at 19:55
  • Take a look at this link to see the effects on fragmentation of using `UNIQUEIDENTIFIER`: http://www.sqlskills.com/blogs/paul/post/Can-GUID-cluster-keys-cause-non-clustered-index-fragmentation.aspx . On the other hand, rarely does anyone use `UNIQUEIDENTIFIER` on a clustered index – Lamak Mar 15 '12 at 20:00
  • So is it OK to use the uniqueidentifier as primary key, but just not as clustered index? (In other words, make it a primary key with no clustered index.) – TOMMY WANG Mar 15 '12 at 20:05
  • I use them on clustered indexes regularly. The fragmentation issue is due to the way new values are computed, not due to the `uniqueidentifier` data type itself. If you use random numbers for an integer ID you'd have the same problem. Use `NEWSEQUENTIALID()` or a COMB-like method and it shouldn't be a real issue. – richardtallent Mar 15 '12 at 20:17
  • richardtallent, do you use the uniqueidentifier to join with other tables? – TOMMY WANG Mar 15 '12 at 20:20
  • **think cache memory!!** 4 byte INT vs. 16 byte UNIQUEIDENTIFIER, you'll need to drag those 16 bytes of primary key onto every index. I'd rather use those 12 extra bytes on include columns and even get better performance over the UNIQUEIDENTIFIER. – KM. Mar 15 '12 at 21:16
  • The main reason for using GUIDs, IMHO, is to prevent data sniffing. UserIds that are ints and then used in query strings can easily be modified and used in database sniffing attempts. However, UserIds that are GUIDs, when used in query strings, prevent this type of database sniffing. I use INTs as my index and I use GUIDs in any type of external communication between server and customer. Hope this helps. – Michael Riley - AKA Gunny Mar 17 '12 at 13:33
  • Clustering on a random GUID can actually help performance, contrary to the popular belief that using a sequential GUID is better. The randomness of a GUID can actually reduce contention on the last data page, and increase insert performance on high I/O systems significantly. See: http://blog.kejser.org/2011/10/05/boosting-insert-speed-by-generating-scalable-keys/ – Triynko Nov 14 '13 at 21:44

5 Answers

38

A 128-bit GUID (uniqueidentifier) key is of course 4x larger than a 32-bit int key. However, there are a few key advantages:

  • No "IDENTITY INSERT" issue when merging content
  • If you use a COMB value instead of NEWSEQUENTIALID(), you get a "free" INSERT timestamp. You can even SELECT from the primary key based on a date/time range if you want with a few fancy CAST() calls.
  • They are globally unique, which turns out to be pretty handy now and then.
  • Since there's no need to track high-water marks, your business logic layer can assign the value rather than SQL Server, thus eliminating the step of SELECT scope_identity() to get the primary key after an insert.
  • If it's even remotely possible that you could have more than 2 billion records, you'll need to use bigint (64 bits) instead of int. Once you do that, uniqueidentifier is only twice as big as a bigint.
  • Using GUIDs makes it safer to expose keys in URLs, etc. without exposing yourself to "guess-the-ID" attacks.
  • Between how SQL Server loads pages from disk and how processors are now mostly 64-bit, just because a number is 128 bits instead of 32 doesn't mean it takes 4x longer to compare. The last test I saw showed that GUIDs are nearly as fast.
  • Index size depends on how many columns are included. Even though the GUIDs themselves are larger, the extra 8 or 12 bytes may be insignificant compared to the other columns in the index.

In the end, squeezing out some small performance advantage by using integers may not be worth losing the advantages of a GUID. Test it empirically and decide for yourself.

Personally, I still use both, depending on the situation, but the deciding factor has never really come down to performance in my case.
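
For readers unfamiliar with COMB values, here is a minimal Python sketch of the idea: keep 10 random bytes and overwrite the last 6 bytes with a timestamp, since (as far as I know) SQL Server gives those bytes the most weight when ordering `uniqueidentifier` values. The names `make_comb` and `comb_timestamp_ms` are made up for illustration, and the Unix-millisecond encoding is an assumption of this sketch; the original COMB proposal derives those bytes from a SQL Server `datetime` instead.

```python
import time
import uuid


def make_comb(timestamp_ms=None):
    """Return a COMB-style UUID: 10 random bytes followed by a 6-byte timestamp.

    Assumption: the timestamp is Unix time in milliseconds, stored big-endian.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    random_part = uuid.uuid4().bytes[:10]
    # Keep only the low 48 bits so the value always fits in 6 bytes.
    time_part = (timestamp_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


def comb_timestamp_ms(comb):
    """Recover the millisecond timestamp embedded in the last 6 bytes."""
    return int.from_bytes(comb.bytes[10:], "big")


earlier = make_comb(1000)
later = make_comb(2000)
print(earlier, later)
```

Because the value is built in the application, it can be assigned before the INSERT, and the creation time can be read back from the key itself.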

Jesuraja
richardtallent
  • +1 for mentioning COMB, as I've read that this also drastically reduces index fragmentation. – Martin May 15 '12 at 15:04
  • Combs (i.e. sequential GUIDs) may reduce fragmentation, but on high I/O systems it seems that RANDOM non-sequential GUIDs can actually increase performance, particularly for inserts. The reason is that the page splits are cheaper than the contention caused by trying to insert everything on the last data page, as with sequential IDs. See: http://blog.kejser.org/2011/10/05/boosting-insert-speed-by-generating-scalable-keys/ It really depends on the underlying system. – Triynko Nov 14 '13 at 21:48
  • A GUID as PK will perform horribly at inserts if it is clustered, and a PK is by default a clustered index, meaning the engine will keep the table (physically) ordered, causing page splits and reordering. There's no beneficial way to expose IDs in URLs, no matter whether they are strings, ints, GUIDs or whatever. GUIDs don't obfuscate it. – jean Dec 18 '13 at 10:10
  • @jean the insert performance will not be 'horrible' if you use a sequential guid. It would be exactly the same as a bigint, only 8 bytes larger, which in 99.999999% of cases is irrelevant – AaronHS May 26 '14 at 06:45
  • @AaronH if you use a **sequential** guid the performance hit will not be as horrible as with an ordinary non-sequential one. But yes, there's a minor concern about the "size" of your PK, since it will affect the number of rows per page, leading the engine to work a bit more on page management – jean May 26 '14 at 13:30
25

I personally use INT IDENTITY for most of my primary and clustering keys.

You need to keep the primary key and the clustering key apart. The primary key is a logical construct: it uniquely identifies your rows, so it has to be unique, stable, and NOT NULL. A GUID works well for a primary key, too, since it's guaranteed to be unique. A GUID as your primary key is a good choice if you use SQL Server replication, since in that case you need a uniquely identifying GUID column anyway.

The clustering key in SQL Server is a physical construct that determines the physical ordering of the data, and it is a lot more difficult to get right. Kimberly Tripp, the Queen of Indexing on SQL Server, requires a good clustering key to be unique, stable, as narrow as possible, and ideally ever-increasing (all of which an INT IDENTITY is).

See her articles on indexing, and also see Jimmy Nilsson's The Cost of GUIDs as Primary Key.

A GUID is a horribly bad choice for a clustering key, since it's wide, totally random, and thus leads to bad index fragmentation and poor performance. The clustering key column(s) are also stored in each and every entry of each and every non-clustered (additional) index, so you really want to keep it small: a GUID is 16 bytes vs. 4 bytes for an INT, and with several non-clustered indices and several million rows, this makes a HUGE difference.
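
To put a rough number on that, here is a back-of-the-envelope Python sketch. It counts only the bytes spent on the clustering key itself (one copy in the table plus one copy per non-clustered index entry) and ignores row overhead, upper B-tree levels, and fill factor. The row and index counts are hypothetical, and `key_storage_bytes` is a name invented for this example.

```python
def key_storage_bytes(rows, nonclustered_indexes, key_width):
    """Bytes spent storing the clustering key across the table and its indexes."""
    # One copy in the clustered index plus one per non-clustered index entry.
    copies_per_row = 1 + nonclustered_indexes
    return rows * copies_per_row * key_width


rows = 10_000_000        # hypothetical table size
nonclustered = 5         # hypothetical number of non-clustered indexes

int_bytes = key_storage_bytes(rows, nonclustered, 4)    # INT clustering key
guid_bytes = key_storage_bytes(rows, nonclustered, 16)  # GUID clustering key
extra_mib = (guid_bytes - int_bytes) / 2**20

print(f"INT:  {int_bytes:,} bytes")
print(f"GUID: {guid_bytes:,} bytes")
print(f"Extra for GUID: {extra_mib:.0f} MiB")
```

Whether the extra space matters depends on how wide the other columns are and how much of the data has to stay in memory, so treat the result as an order-of-magnitude estimate only.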

In SQL Server, your primary key is by default your clustering key - but it doesn't have to be. You can easily use a GUID as your NON-Clustered primary key, and an INT IDENTITY as your clustering key - it just takes a bit of being aware of it.

marc_s
  • "A GUID is a horribly bad choice for a clustering key" vs "The last test I saw showed that GUIDs are nearly as fast".... – TOMMY WANG Mar 15 '12 at 21:47
  • @TOMMYWANG: a regular GUID is **NOWHERE NEAR** as fast as INT - see [Disk space is cheap .... that's NOT the point!](http://www.sqlskills.com/BLOGS/KIMBERLY/post/Disk-space-is-cheap.aspx) by Kim Tripp, with some tests on INT vs. GUID – marc_s Mar 16 '12 at 05:53
  • Generalisation: "A GUID is a horribly bad choice for a clustering key, since it's wide, totally random, and thus leads to bad index fragmentation and poor performance". This is a sweeping statement that is OFTEN true. But do you assume that, in the cases where it is not true, a DBA will know to ignore this advice? Unfortunately the environment in which the advice is given isn't clear. I understand you can't cover all scenarios, but let's go a little easy on the hyperbole. I have seen a scenario, albeit on another DB, that used clustered partitioned GUIDs as BEST practice. – phil soady Mar 01 '13 at 05:19
4

The big problem with GUIDs as primary keys is that they cause massive table fragmentation, which can be a big performance issue (the larger the table, the larger the issue). Even as a key for a nonclustered index, they will cause index fragmentation.
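
To make the fragmentation effect concrete, here is a toy Python simulation. It is only a sketch under simplifying assumptions: a single leaf level, fixed-size pages, a 50/50 split when a full page receives a key in its middle, and a fresh page when a key is appended past the end. It does not model SQL Server's actual storage engine, and `simulate_inserts` is a name invented for this example.

```python
import bisect
import random


def simulate_inserts(keys, page_capacity=100):
    """Insert keys into a toy leaf level; return (page_splits, average_fill)."""
    pages = [[]]  # each page is a sorted list of keys
    splits = 0
    for key in keys:
        # Find the page whose key range should hold this key.
        first_keys = [p[0] for p in pages[1:]]
        idx = bisect.bisect_right(first_keys, key)
        page = pages[idx]
        if len(page) < page_capacity:
            bisect.insort(page, key)
            continue
        if idx == len(pages) - 1 and key > page[-1]:
            # Appending past the end: start a fresh page, no split needed.
            pages.append([key])
            continue
        # The page is full and the key belongs inside it: split it in half.
        mid = page_capacity // 2
        left, right = page[:mid], page[mid:]
        bisect.insort(left if key < right[0] else right, key)
        pages[idx:idx + 1] = [left, right]
        splits += 1
    fill = sum(len(p) for p in pages) / (len(pages) * page_capacity)
    return splits, fill


n = 10_000
rng = random.Random(42)  # fixed seed so the run is repeatable
seq_splits, seq_fill = simulate_inserts(range(n))
rnd_splits, rnd_fill = simulate_inserts([rng.getrandbits(64) for _ in range(n)])

print(f"sequential keys: {seq_splits} splits, {seq_fill:.0%} average page fill")
print(f"random keys:     {rnd_splits} splits, {rnd_fill:.0%} average page fill")
```

In this model, ever-increasing keys never split a page and leave every page full, while random keys split pages repeatedly and leave them partly empty.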

You can partly mitigate the problem by setting an appropriate fill factor -- but it will still be an issue.

The size difference doesn't bother me that much, except on tables with otherwise narrow rows where table scans are also required. In those cases, being able to fit more rows per DB page is a performance advantage.
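
A quick Python sketch of the rows-per-page arithmetic, assuming roughly 8,096 bytes of usable space on an 8 KB page (8,192 bytes minus the 96-byte page header) and ignoring per-row overhead and the slot array. The column widths are hypothetical, and `rows_per_page` is a name invented for this example.

```python
PAGE_DATA_BYTES = 8096  # assumed usable bytes per 8 KB page (8192 - 96 header)


def rows_per_page(other_columns_bytes, key_bytes):
    """Whole rows that fit on one page, ignoring per-row overhead."""
    return PAGE_DATA_BYTES // (other_columns_bytes + key_bytes)


# Narrow row: 12 bytes of other columns (hypothetical).
narrow_int = rows_per_page(12, 4)
narrow_guid = rows_per_page(12, 16)

# Wide row: 400 bytes of other columns (hypothetical).
wide_int = rows_per_page(400, 4)
wide_guid = rows_per_page(400, 16)

print(f"narrow rows: {narrow_int} per page with INT, {narrow_guid} with GUID")
print(f"wide rows:   {wide_int} per page with INT, {wide_guid} with GUID")
```

With narrow rows the key dominates the row size, so the GUID cuts rows per page substantially; with wide rows the difference nearly disappears.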

There can be good reasons to use GUIDs, but there is also a cost. I generally prefer INT IDENTITY for primary keys, but I don't avoid GUIDs when they are a better solution.

RickNZ
0

The major advantage of using GUIDs is that they are unique across all space and time.

The main disadvantage to using GUIDs as key values is that they are BIG. At 16 bytes a pop, they are one of the largest datatypes in SQL Server. Indexes built on GUIDs are going to be larger and slower than indexes built on IDENTITY columns, which are usually ints (4 bytes).

So they are a good solution for cases where you need to merge data from several sources.

Source : http://www.sqlteam.com/article/uniqueidentifier-vs-identity

Julien
-2

If the table can grow to millions of records, I think it is not a good idea to use a GUID as the primary key.

  • I don't understand the reasoning behind your answer; GUIDs are used quite frequently in many languages to represent unique values. ASP.NET uses it heavily in its security implementation. – Paul Apr 09 '15 at 09:40