8

How much I can rely on GUID in .Net ? My SA told me that

we will use GUID as primary keys in all tables.

I wonder the reliability of GUID as a primary key.

Can there be any chances that there will be duplicate ?

Should we really use this way ?

How about the performance ?

Any advise would be helpful for me.

Community
  • 1
  • 1
kevin
  • 13,559
  • 30
  • 79
  • 104
  • http://en.wikipedia.org/wiki/Globally_unique_identifier – Jon Sep 13 '11 at 06:54
  • Here's a guid: `777....777`; and here's another: `777...777` - so yes there can be duplicates, but it depends how you ***generate*** them; assuming you are using `NEWID()` or `Guid.NewGuid()` you should be fine. – Marc Gravell Sep 13 '11 at 07:04
  • I would like to see an answer that talks about replication (at least as an aside) -- does using a GUID as a PK make this easier/harder/indifferent? –  Sep 13 '11 at 07:07
  • If I remember correctly there are some problems with indexing under SQL-Server (and probably other DB) (it slow down the whole thing. No crash problems!). If you google around you should find it. Found. Linked in my reply. – xanatos Sep 13 '11 at 07:26
  • Even **if** you should ever happen to get a duplicate - since it's the **primary key** of your table, that table won't allow insertion of a duplicate and will just throw that data out .... – marc_s Sep 13 '11 at 07:27
  • You didn't mention what database system you're using - not all databases have a GUID type, so that might be an issue. Also: in Microsoft SQL Server, using a GUID as the primary key (which automatically becomes the **clustering key** by default) is highly controversial (to put it mildly) due to performance and fragmentation issues. See [GUID as Primary and/or Clustering Key](http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx) – marc_s Sep 13 '11 at 07:42
  • Primary key is not the same thing as a clustering key. Different considerations apply to those two different things and they should not be conflated or confused with each other. – nvogel Sep 13 '11 at 09:20
  • dont forget to mark asnwer as accepted if you got the info you want – Pranay Rana Sep 14 '11 at 08:31
  • @Rana >> All the answers are great for me !!! So pondering which one is the best !!! You guys are awesome !!! ;-) – kevin Sep 14 '11 at 08:57

6 Answers6

6

This are some points for GUID which give you answer

Advantage:

  1. Unique across the server.

Disadvantage:

  1. String values are not as optimal as integer values for performance when used in joins, indexes and conditions.
  2. More storage space is required than INT.

You can read full post aobut this at : SQL SERVER – GUID vs INT – Your Opinion

Pranay Rana
  • 175,020
  • 35
  • 237
  • 263
  • "Plain GUIDs" are little better for indexes due to fragmentation. (Hence why SQL Server supports *sequential* GUID generation... which also decreases the GUID domain of the given server instance, I believe. Not that it's apt to be exhausted by any means ;-) –  Sep 13 '11 at 06:55
  • @pst - thanks for the info i dont have idea bout it let me check – Pranay Rana Sep 13 '11 at 06:56
  • 1
    [NEWSEQUENTIALID](http://msdn.microsoft.com/en-us/library/ms189786.aspx) vs. [NEWID](http://msdn.microsoft.com/en-us/library/ms190348.aspx) ... Hmm. It's only increasing since *windows was restarted*. Didn't know that :-/ –  Sep 13 '11 at 06:58
  • 1
    Guids are not strings. Indexing on GUIDs is an entirely different topic. – H H Sep 13 '11 at 06:59
  • @pst - hmm..one more thing is "It is possible to guess the value of the next generated GUID" – Pranay Rana Sep 13 '11 at 07:03
  • @Pranay - You have a 1 to 16^32 shot, so no. Can't be guessed. – Øyvind Bråthen Sep 13 '11 at 07:13
  • @Øyvind Knobloch-Bråthen read this document http://msdn.microsoft.com/en-us/library/ms189786.aspx note is written over there – Pranay Rana Sep 13 '11 at 07:15
  • @Pranay - Question was regarding GUID in .NET not sequential GUID in SQL. – Øyvind Bråthen Sep 13 '11 at 07:37
5

You may want to take a look at these articles:

http://www.codinghorror.com/blog/2007/03/primary-keys-ids-versus-guids.html
http://databases.aspfaq.com/database/what-should-i-choose-for-my-primary-key.html

Personally I use integers if I don't need to have primary keys to be unique across several tables and databases. I find it simpler to debug with 87 than 2A734AE4-E0EF-4D77-9F84-51A8365AC5A0.

Pranay Rana
  • 175,020
  • 35
  • 237
  • 263
Otiel
  • 18,404
  • 16
  • 78
  • 126
4

Yes, there can be a duplicate but it wont. The GUID is a 32 char long and each char can be 0-F (hexadecimal). That means 16^32 possibilities.

So if you generate 1 000 000 GUIDs every second for 10 years, the chance that you create a duplicate is around 1 / 1079028307080601418897053.

In my opinion a GUID is a very good primary key candidate as you can generate if from anywhere without first checking if it already exist in the database.

Øyvind Bråthen
  • 59,338
  • 27
  • 124
  • 151
  • 1
    small point: probability calculation is wrong here. Have you ever heard about birthday paradox? :) – default locale Sep 13 '11 at 07:10
  • 1
    And please, don't measure GUID in chars! They are 128 bit long, not 32 0-F char long. I know it's the same, but it ISN'T the same. It's like if I measured your height using my feet.... Wooops :-) – xanatos Sep 13 '11 at 07:13
  • @xanatos - Agree with you there. Maybe not the best way to explain it since it actually makes it sound like the GUID is in fact a string. But it makes the math a bit easier to follow ;) – Øyvind Bråthen Sep 13 '11 at 07:15
  • @MAKKAM - I don't see how they are related. For birthday paradox there are 365 possibilities, and while this would apply to other random distribution cases, the GUID has so many combinations that you will never even exhaust a single percent of the possible combinations. – Øyvind Bråthen Sep 13 '11 at 07:19
  • 1
    Your probability calculation underestimates the likelihood of a collision for two reasons: 1. You are assuming the distribution of GUID values is truly random, which it isn't. 2. Google for "Birthday Paradox". – nvogel Sep 13 '11 at 07:21
  • 1
    @Øyvind I agree with conclusions you've made. My comment was just about the math. You can't figure out probability just by dividing the number of experiments to the number of possibilities. Simple example: if you generate random bit twice the chance that you create a duplicate is not 100%. – default locale Sep 13 '11 at 07:31
3

Thanks to the Birthday Paradox (Problem) you have around 50% of finding a duplicate if you generate 2^64 GUID... Are you happy? (this is because a fully random GUID is long 128 bits, so there are 2^128 different GUID. The birthday paradox tells us that if you have aproximatevely sqrt(2^128) GUID you have a 50% chance of a collision I say fully random GUID because there are some standard type of GUID where some digits are fixed. But .NET doesn't use these standards (read here http://en.wikipedia.org/wiki/Globally_unique_identifier) )

I'll add that if you problem is a problem of "speed" of the db, you should read this:

Improving performance of cluster index GUID primary key

Community
  • 1
  • 1
xanatos
  • 109,618
  • 12
  • 197
  • 280
  • 50%? Would really like to know how you came to that conclusion. How did you calculate? – Øyvind Bråthen Sep 13 '11 at 07:20
  • @Øyvind It's explained in the wiki. It's very funny. Take a classroom of 23 persons and there is 50% that two will share the same birthday. While 23 isn't sqrt(365), for bigger numbers the approximation is quite good (and sqrt(2^128) == 2^64, because (2^64)^2 == 2^(64*2) == 2^128) – xanatos Sep 13 '11 at 07:22
  • It depends. GUID generation algorithms use MAC address to generate the key. It may influence the probability of GUID duplication on the hardware unit. – default locale Sep 13 '11 at 07:23
  • @MAKKAM it's even more complex as I've written. TRUE Guid have some fixed digits. I think the .NET GUID are fully random. – xanatos Sep 13 '11 at 07:24
  • @xanatos - I see you upadted your answer to 50% **if** you generate 2^64 GUIDs. I would like to bet that you will never generate 2^64 GUIDs, and therefore be a long way from 50% ;) – Øyvind Bråthen Sep 13 '11 at 07:28
  • @Øyvind I haven't update the **if** part. I have updated another part. You can check the history. But yes, clearly the 2^64 GUID is a very big target :-) There is a reason they are called G(lobal) U(nique) ID :-) But I wouldn't use them to number the grain of sand of the beaches of the world (http://wiki.answers.com/Q/How_many_grains_of_sand_are_there_in_the_world) (around 2^62.7) :-) – xanatos Sep 13 '11 at 07:33
2

For the most part you can assume they will never duplicate, If your ID in a table is set to be the Primary Key, then inserting a duplicate will error anyway.

An advantage is using these ID's in a web application is that users cant just test URLS with other IDs so in theory would bemore secure (although you should have server validation for permissions anyway)

Mark Redman
  • 24,079
  • 20
  • 92
  • 147
0

Guids are statistically very highly likely to be unique and therefore are good candidates for primary keys if various systems are generating IDs and all of these need to be combined.

Eg. Working in an offline mode style and pushing back the change to the central db.

csmith
  • 99
  • 7