5

Up until now i've been using the C# "Guid = Guid.NewGuid();" method to generate a unique ID that can be stored as the ID field in some of my SQL Server database tables using Linq to SQL. I've been informed that for indexing reasons, using a GUID is a bad idea and that I should use an auto-incrementing Long instead. Will using a long speed up my database transactions? If so, how do I go about generating unique ID's that are of type Long?

Regards,

Goober
  • 13,146
  • 50
  • 126
  • 195
  • "I've been informed that for indexing reasons, using a GUID is a bad idea" - If you have a clustered index on your identity field, a GUID will hurt the performance of inserts. SQL Server stores the table in an order based on the clustered index. As new GUIDs are non-sequential, inserting a new row will have to insert into the middle of the table instead of at the end, causing a lot of IO. A long as the clustered index column type will avoid this problem. – adrianbanks Jul 23 '09 at 11:50
  • 1
    just to add - if the GUID the primary key, then by default the GUID will be clustered - this can be non-clustered, but needs manual intervention – Kev Riley Jul 23 '09 at 11:53
  • See http://sqlblogcasts.com/blogs/martinbell/archive/2009/05/25/GUID-Fragmentation-in-SQL-Server.aspx - and http://www.dotnetrocks.com/default.aspx?showNum=455 for a podcast with lots of content on the topic. – Robert Venables Jul 23 '09 at 11:55

7 Answers7

11

Both have pros and cons, it depends entirely on how you use them that matters.

Right off the bat, if you need identifiers that can work across several databases, you need GUIDs. There are some tricks with Long (manually assigning each database a different seed/increment), but these don't scale well.

As far as indexing goes, Long will give much better insert performance if the index is clustered (by default primary keys are clustered, but this can be modified for your table), since the table does not need to be reorganized after every insert.

As far as concurrent inserts are concerned however, Long (identity) columns will be slower then GUID - identity column generation requires a series of exclusive locks to ensure that only one row gets the next sequential number. In an environment with many users inserting many rows all the time, this can be a performance hit. GUID generation in this situation is faster.

Storage wise, a GUID takes up twice the space of a Long (8 bytes vs 16). However it depends on the overall size of your row if 8 bytes is going to make a noticable difference in how many records fit in one leaf, and thus the number of leaves pulled from disk during an average request.

David
  • 24,700
  • 8
  • 63
  • 83
  • 3
    plus if you use GUID as your clustering key, that value (twice as big as a BIGINT) is added to each and every entry in each and every non-clustered index, too - thus multiplying the waste of space, not just on disk, but also in SQL Server RAM. – marc_s Jul 23 '09 at 13:53
4

The "Queen of Indexing" - Kim Tripp - basically says it all in her indexing blog posts:

Basically, her best practices are: an optimal clustering key should be:

  • unique
  • small
  • stable (never changing)
  • ever-increasing

GUID's violate the "small" and "ever-increasing" and are thus not optimal.

PLUS: all your clustering keys will be added to each and every single entry in each and every single non-clustered index (as the lookup to actually find the record in the database), thus you want to make them as small as possible (INT = 4 byte vs. GUID = 16 byte). If you have hundreds of millions of rows and several non-clustered indices, choosing an INT or BIGINT over a GUID can make a major difference - even just space-wise.

Marc

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • 2
    If you are worried about scale do NOT choose a GUID. Probably doesn't make a big difference from a speed or scale perspective in smaller databases but if you get into 100 millon+ row tables, with more than a few indexes, GUIDs are a performance nightmare... Read Kim Tripp's articles above, it's a huge decision, and one that cost us 3 months of technical debt – Tom DeMille May 23 '11 at 20:43
  • HI @marc_s. Its old question. But what you decide to choose in your past projects? Are you use GUID? or random bigint generated key? – deveton Jun 10 '21 at 19:48
  • 1
    @deveton: i try to *avoid* GUID whenever possible - they tend to have negative impacts on so many things in terms of performance. I did and still do very much prefer using `INT IDENTITY` or `BIGINT IDENTITY` for my surrogate primary keys. – marc_s Jun 10 '21 at 19:51
  • About random bigint? How to make sure it will never duplicated... GUID will never duplicated in this earth. but maybe random int/bigint will duplicated? – deveton Jun 10 '21 at 20:03
  • 1
    @deveton: no, not random - `IDENTITY` provides ever-increasing values, which are guaranteed to be unique by design - as long as you just let them be and don't fiddle around with the identity column.... – marc_s Jun 10 '21 at 20:05
3

A long (big int in sql server) is 8 bytes and a Guid is 16 bytes, so you are halving the number of the bytes sql server has to compare when doing a look up.

For generating a long, use IDENTITY(1,1) when you create the field in the database.

so either using create table or alter table:

Field_NAME BIGINT NOT NULL PRIMARY KEY IDENTITY(1,1)

See comments for posting Linq to sql

kemiller2002
  • 113,795
  • 27
  • 197
  • 251
  • 1
    When you submit your new record to the database using L2S leave the PK unpopulated. It will be populated by L2S as part of the submit so you'll be able to retrieve it from your record object after submission, should you so need to do so. – Lazarus Jul 23 '09 at 11:48
  • Is there a way of doing this using the SQL Server design view? My SQL scripting code isnt too great........ – Goober Jul 23 '09 at 12:04
  • There is, but I don't know how to do it. Sorry :( – kemiller2002 Jul 23 '09 at 13:58
3

have a look at this

Is it better to use an uniqueidentifier(GUID) or a bigint for an identity column?

Community
  • 1
  • 1
Adriaan Stander
  • 162,879
  • 31
  • 289
  • 284
1

You can debate GUID or identity all day. I prefer the database to generate the unique value with an identity. If you merge data from multiple databases, add another column (to identify the source database, possibly a tinyint or smallint) and form a composite primary key.

If you do go with an identity, be sure to pick the right datatype, based on number of expected keys you will generate:

bigint - 8 Bytes - max positive value: 9,223,372,036,854,775,807  
int    - 4 Bytes - max positive value:             2,147,483,647

Note "number of expected keys " is different than the number of rows. If you mainly add and keep rows, you may find that an INT is enough with over 2 billion unique keys. I'll bet your table won't get that big. However, if you have a high volume table where you keep adding and removing rows, you row count may be low, but you'll go through keys fast. You should do some calculations to see how log it would take to go through the INTs 2 billion keys. If it won't use them up any time soon go with INT, otherwise double the key size and go with BIGINT.

Community
  • 1
  • 1
KM.
  • 101,727
  • 34
  • 178
  • 212
1

Use guids when you need to consider import/export to multiple databases. Guids are often easier to use than columns specifying the IDENTITY attribute when working with a dataset of multiple child relationships. this is because you can randomly generate guids in the code in a disconnected state from the database, and then submit all changes at once. When guids are generated properly, they are insainely hard to duplicate by chance. With identity columns, you often have to do an intial insert of a parent row and query for it's new identity before adding child data. You then have to update all child records with the new parent identity before committing them to the database. The same goes for grandchildren and so on down the heirarchy. It builds up to a lot of work that seems unnecessary and mundane. You can do something similar to Guids by comming up with random integers without the IDENTITY specification, but the chance of collision is greatly increased as you insert more records over time. (Guid.NewGuid() is similar to a random Int128 - which doesn't exist yet).

I use Byte (TinyInt), Int16 (SmallInt), Int32/UInt16 (Int), Int64/UInt32 (BigInt) for small lookup lists that do not change or data that does not replicate between multiple databases. (Permissions, Application Configuration, Color Names, etc.)

I imagine the indexing takes just as long to query against regardless if you are using a guid or a long. There are usually other fields in tables that are indexed that are larger than 128 bits anyway (user names in a user table for example). The difference between Guids and Integers is the size of the index in memory, as well as time populating and rebuilding indexes. The majority of database transactions is often reading. Writing is minimal. Concentrate on optimizing reading from the database first, as they are usually made of joined tables that were not optimized properly, improper paging, or missing indexes.

As with anything, the best thing to do is to prove your point. create a test database with two tables. One with a primary key of integers/longs, and the other with a guid. Populate each with N-Million rows. Moniter the performance of each during the CRUD operations (create, read, update, delete). You may find out that it does have a performance hit, but insignificant.

Servers often run on boxes without debugging environments and other applications taking up CPU, Memory, and I/O of hard drive (especially with RAID). A development environment only gives you an idea of performance.

Lewie
  • 537
  • 3
  • 6
1

Consider creating sequential GUID from .NET application:

http://dotnet-snippets.de/dns/sequential-guid-SID998.aspx

What are the performance improvement of Sequential Guid over standard Guid?

Community
  • 1
  • 1
Boris Modylevsky
  • 3,029
  • 1
  • 26
  • 42