5

I understand in SQL Server GUIDs are MOSTLY unique, and also that the likelihood of a collision is remote, yet at the same time someone must win the lottery so I feel like it makes some sense to prepare for the possibility.

Which is faster/better practice

Using a technique where I assign a new GUID directly by just inserting a row and checking for an error (@@ERROR <> 0) and repeating until I don't get an error [which I suppose in theory would only at worst be once...]

or using an approach like this

DECLARE @MyGUID uniqueidentifier
SELECT @MyGUID = NewID()
if exists(select * from tablename where UserID=@MyGUID)

and looping over that till I find one not in use.

I like the 2nd approach because I can then have the GUID for use later on in the Stored Procedure so I'm currently leaning towards that one.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Jordan
  • 2,708
  • 4
  • 22
  • 35
  • 6
    If you win the lottery three weekends in a row, start worrying about potential GUID conflicts in your app ..... this chance is **soooooo** remote - you'd have to create millions of GUIDs every second for **centuries** before you run into a duplicate. Don't you have any more pressing things to worry about!?!?!? – marc_s May 10 '11 at 20:51

5 Answers5

7

If you have at least one network adapter in your computer, then your GUIDs will be unique. IF you don't have, then the possibility of colliding with a guid generated on another machine exists in theory, but is never ever going to happen to you. Writing code to guard against duplicate guids is a total waste of time.

That being said, to enforce uniqueness of anything in a relational database is done by only one means: create a unique constraint on the data:

ALTER TABLE tablename ADD CONSTRAINT uniqueUSerID UNIQUE UserID;
Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
  • 3
    To the best of my knowledge, the Windows API no longer uses the MAC address to create a GUID (it's basically just random now), and SQL Server uses a sequential algorithm to deal with index clustering issues. – MusiGenesis May 10 '11 at 21:02
  • yeah, Win2000+ have guid completely (pseudo)random. What Remus say about MAC address is NT4 only, but everything else he said is true :) – Jan 'splite' K. Jun 21 '13 at 13:11
5

The chances of a collision happening on your computer are much less likely than you winning the lottery.

Use NEWID() and don't worry about a hugely unlikely event.

If you declared the column as unique, it will not get persisted anyways.

Community
  • 1
  • 1
Oded
  • 489,969
  • 99
  • 883
  • 1,009
  • To put it more strongly, the likelihood of a GUID collision is to the likelihood of winning the lottery as the likelihood of winning the lottery is to the likelihood of eating a sandwich this year. – MusiGenesis May 10 '11 at 21:09
  • 1
    @MusiGenesis - I would suggest that the probability of "eating a sandwich this year" is highly dependent on geography and culture and in some combinations be lower than "the likelihood of a GUID collision" ;) – Oded May 10 '11 at 21:11
  • 1
    Does a Big Mac count as a "sandwich"? Those guys are pretty damn ubiquitous by now. :) – MusiGenesis May 10 '11 at 21:13
  • @MusiGenesis - Dunno. However: http://en.wikipedia.org/wiki/List_of_countries_with_McDonalds_franchises#Former_locations – Oded May 10 '11 at 21:14
  • 3
    @Musigenesis: this almost reminds me of the *Improbability drive* in the Hitchhiker's Guide to the Galaxy :-) – marc_s May 10 '11 at 21:15
  • @marc_s: how can something *almost* remind you of something else? :) – MusiGenesis May 10 '11 at 21:18
  • 1
    @Musigenesis - Effects of the improbability drive on @marc_s – Oded May 10 '11 at 21:21
4

To actually answer the question and not debate the merit of the question/perceived problem.

The first implementation will be the one you want to use for two reasons:

  • Running a check exists before doing the insert for every single record your dealing with will in the end result in more resources being dedicated to something that is extremely unlikely to happen. (also the database will ensure the constraint on the column as well so if it does collide the data wont be committed)
  • If you have an error on the first one, you can take a little extra time and handle the ERROR that is returned.

You can combine the two and declare the new uniqueidentifier insert into the table (if it works continue using it, else retry with a new one and then continue using it).

Generally you want to program for the most likely situation first and then handle the exceptions last.

Kyro
  • 748
  • 1
  • 12
  • 28
3

The probability of a collision is so low - far less than that of winning the lottery - that it's more likely that cosmic rays will strike the RAM in the machine and cause your program to fail in some other, arbitrary manner. It's more likely that your error handling for this case will (now or eventually) contain an error that will lead to failure. Researchers who deliberately try to locate collisions using extensive computing power haven't succeeded so far in finding even one. I don't think it's worth your effort or time to handle this error case, at least until you've handled the wide range of far more likely hardware and software errors that may occur. I realize this is counterintuitive, but trust me.

Chiara Coetzee
  • 4,201
  • 1
  • 24
  • 20
  • It's not counterintuitive I really realize the chances of a collision are extraordinarily low. I was also using this as a thought to explore which is faster catching a rarely occuring error when it happens and dealing with it as such or doing a query before each insertion in a given situation to see if it exists. I have found at times that catching rarely occuring errors is faster than checking for them all the time. – Jordan May 10 '11 at 20:57
1

Generally, I only use a GUID primary key if I'm dealing with distributed databases where clients need to add new records while offline. If this isn't your scenario, you're better off using an autoincremented int for your primary key.

MusiGenesis
  • 74,184
  • 40
  • 190
  • 334
  • I know it's not the solution per se but security through obscurity. I am passing some userid info through ajax and therefore the user could see their ID, I'd prefer not to have people be able to guess other User IDs. I have measures to prevent them from being able to use them anyway but I'd like to use a GUID as an added measure. – Jordan May 10 '11 at 21:01
  • 1
    @Jordan: your system should require authentication by userid *and* password at all times. Being able to guess some other user's id should not enable them to access anything. Also, you'd be better off generating the GUIDs client-side rather than in SQL Server, since SQL Server's GUIDs are intentionally sequential (I don't know if the sequence is really guessable, though). – MusiGenesis May 10 '11 at 21:04
  • 1
    SQL Server GUID's generated by `newid()` are **not** sequential - those generated by `newsequentialid()` are - to a degree... – marc_s May 10 '11 at 21:09
  • 1
    @marc_s: you're expecting what I say to be accurate? Sheesh. – MusiGenesis May 10 '11 at 21:19