155

I'm working on a database in SQL Server 2000 that uses a GUID for each user that uses the app it's tied to. Somehow, two users ended up with the same GUID. I know that microsoft uses an algorithm to generate a random GUID that has an extremely low chance of causing collisons, but is a collision still possible?

nietonfir
  • 4,797
  • 6
  • 31
  • 43
Jason Baker
  • 192,085
  • 135
  • 376
  • 510
  • 14
    Everybody saying no is wrong.I already have collided 1 UniqueIdentifier with a dataset of less than half a million records, MSSQL 2008 R2 – Behrooz Jul 30 '12 at 12:47
  • 4
    @Behrooz Yikes. It isn't impossible thanks to our friend the birthday paradox, but it's still insanely unlucky with fully random v4 GUIDs. Maybe you were using a weaker GUID generation strategy? – Craig Ringer Oct 30 '12 at 07:57
  • @CraigRinger I was using the NEWID() function. – Behrooz Nov 03 '12 at 18:25
  • @Behrooz The documentation doesn't say what UUID generation scheme that uses - v1, v2, v3 or (preferably) v4. http://en.wikipedia.org/wiki/Universally_unique_identifier . If it's v1 IDs and you're generating lots of them in different database sessions on the same machine it'd be much easier to get a collision than with v4 UUIDs. Do the UUIDs follow the form `xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx` with a `4` in the 13th digit? – Craig Ringer Nov 04 '12 at 02:26
  • 1
    @CraigRinger Version 4, half a dozen different machines, normal windows application use.It was almost a year ago. But I'm certain it happened and made me review code for days. – Behrooz Nov 06 '12 at 06:44
  • 7
    @Behrooz Wow. That's shocking luck. – Craig Ringer Nov 06 '12 at 07:42
  • 9
    @Behrooz this is probably a defective pseudo random number used in MSSQL (I wouldn't be surprised if they have 32-bit seed in their generator or the like given the quality of their software). The math doesn't lie. This possibility is so small so that you can be 99.9999999999(and a lot of 9 after)% that either MSSQL guid generator is defective (or may be pseudo random generator which is used to generate GUIDs) or you made a mistake. – Alex Dec 12 '14 at 20:05
  • @Behrooz, I basically just assume that everyone who says what you just said is wrong. While there's a chance that I'm wrong, I know that my opinion will be justified 99.9999...% of the time. This entry in the daily WTF is highly relevant: http://thedailywtf.com/articles/A-More-Unique-Identifier – riwalk Jan 13 '16 at 16:58
  • @Stargazer712 well, that code and application is long dead. but if you really want to know. It either used the mssql version or the .net Guid.NewGuid, don't remember which and the duplicate rows were at least 1 year appart.so, it's very unlikely that it's something with the RNG. the only thing that i can guess is memory corruption which is again unlikely in a .net application that used no unsafe code. – Behrooz Jan 15 '16 at 20:40
  • @Behrooz, ah I see. In between the theoretical beauty of GUID's and their use in the wild is the actual implementation, which can always have a flaw :) – riwalk Jan 19 '16 at 00:53
  • I think the question is tricky, I would rather ask "What is the probability for a GUID collision to happen", or something similar. This would have prevented the philosophical discussion of what "possible" actually means, which to me is were the disagreement lies in the different answers posted here. – jarmanso7 Sep 04 '19 at 10:53
  • 5
    Love how at this exact moment, both the question and the selected answer have 128 score. Coincidence? – Caio Cunha Jun 10 '20 at 21:59
  • @CaioCunha Yes. – Phillip Copley Sep 17 '20 at 19:39
  • @CaioCunha Yikes! now they are both 135!! – Terence Mar 04 '21 at 04:08

19 Answers19

155

Basically, no. I think someone went mucking with your database. Depending on the version GUID you're using the value is either unique (for things like version 1 GUIDs), or both unique and unpredictable (for things like version 4 GUIDs). SQL Server's implementation for their NEWID() function appears to use a 128-bit random number, so you're not going to get a collision.

For a 1% chance of collision, you'd need to generate about 2,600,000,000,000,000,000 GUIDs.

Robert Harvey
  • 178,213
  • 47
  • 333
  • 501
Tom Ritter
  • 99,986
  • 30
  • 138
  • 174
  • 3
    That's what I figured, but I just wanted to make sure I couldn't rule that out. You never know what kinds of weird bugs might pop up in 8 year old software. :) – Jason Baker Oct 08 '08 at 21:02
  • Maybe in SQL Server 2000 they are not possible, but generally they are, as there are for ways of creating GUIDs according to standard and one way uses a random number generator (except for a few bytes that are hard coded). Votes--; – Mecki Oct 08 '08 at 22:37
  • 7
    Actually that isn't true any more. It was true for v1 GUIDs, but not for the current v4 ones. See http://en.wikipedia.org/wiki/Globally_Unique_Identifier#Algorithm for more info. – Greg Beech Oct 08 '08 at 23:10
  • Greg is right, unless you use UuidCreateSequential that has not been true since ~2000 or 2001... – KristoferA Oct 15 '09 at 00:55
  • -1 what about cases where you have a farm of 100s of servers generating millions of GUIDs but not enough variance in the MS clock to gain entropy so your probability range tightens? – FlavorScape Oct 03 '12 at 18:47
  • 138
    Down vote because, in principle (in it's rawest form), you are wrong saying "no" to the question "Are GUID collisions possible?". It's very possible. The likelihood of which is tiny, but it's possible. I hate to sound pedantic - but SO is all about being concise and accurate. –  Oct 09 '12 at 16:08
  • 'spose it would depend on if your v4 UUIDs are issued bits with cryptographic quality random numbers. otherwise pseudo random based on clock time may encounter collisions during simultaneous production (accross many server origins at once). This guy might explain: http://html5devconf.com/sessions.html#e_elliott – FlavorScape Oct 23 '12 at 01:12
  • @FlavorScape said "what about cases where you have a farm of 100s of servers". OK, then you increase the chance of it happening. There's no indication that's what the poster is doing though. – Stealth Rabbi Mar 06 '14 at 17:08
  • 17
    enter "solve[1-exp[-(n^2/(2*2^128))] > 0.01, n]" into wolfram alpha to get the result for 1%... Beaware that while this number seems large in context of ONE application, it certainly isn't large for the entire world. If every computer on the earth would genereate true GUIDs, they would cause a collision with 1% probability within about one second, assuming they can generate a GUID each nanosecond (which is probably quite realistic these days). So if you use GUIDs for your database IDs, then they are unique. GUIDs for every computation done on earth, will collide immediately. – thesaint Nov 14 '14 at 10:03
  • so from my perspective they have a VERY bad name: They are LOCALLY Unique identifiers... Globally they are anything but unique. – thesaint Nov 14 '14 at 10:08
  • I won the lottery today. Generated an exact same GUID as an existing one! The chance is there, but sure it's small. – Bauss Jan 14 '16 at 10:16
  • 16
    Saying 'No' its not possible, and then saying that there is a 1% chance of getting a collision when a certain amount is generated, are direct conflicts. The correct response should be Theoretically - yes, a collision could happen randomly. However, the chances of a collision are statistically smaller than an asteroid hitting the Earth, bouncing off the Earth and rebounding off the Moon to hit Earth a second time, in the next hour. – Baaleos Jan 20 '16 at 13:25
  • 5
    @JᴀʏMᴇᴇ (and other similar comments), this answer doesn't say "No"; it says "**Basically**, no". IMHO, the word "basically" clearly means: "Yes, it's theoretically possible, but practically it isn't. If you have a duplicate GUID, then look elsewhere for a cause as it didn't happen randomly." – TTT Feb 23 '18 at 20:10
  • It uses some bytes from network card so a little less than that but still extremely rare chance of collision. – paparazzo Apr 20 '18 at 14:25
  • 1
    @JᴀʏMᴇᴇ I also took basically no as in practice no but not absolute no. – paparazzo Apr 20 '18 at 14:27
  • @paparazzo - it's still not accurate, though. The fact there's even a discussion of how it's being interpreted differently says enough. –  Apr 20 '18 at 15:48
  • 1
    @JᴀʏMᴇᴇ You can find a discussion on about anything. I was clear to me. – paparazzo Apr 20 '18 at 15:50
  • 1
    @paparazzo There's no discussion to be had with regard to the clarity of "GUIDs can collide". It's a fact. "No" is not the correct answer, end of. –  Apr 23 '18 at 09:36
  • 2
    @JᴀʏMᴇᴇ Take to english.stackexchange. Was clear to me. – paparazzo Apr 23 '18 at 10:53
  • @paparazzo - refer to the votes on my original comment, if you must. Interesting that you found 'no' to "clearly" mean 'probably no, but actually yes'. It's all over the place, easier to be factually clear and concise. That's what this site is about. You should read the guidelines if you feel you don't understand the value of clarity here. –  Apr 23 '18 at 13:28
  • 1
    @JᴀʏMᴇᴇ You keep ignoring the "basically", I was very clear with you. I also took "basically no" as in practice no but not absolute no. Tom goes on to give statistics of duplicate. The answer is crystal clear it can happen. – paparazzo Apr 23 '18 at 13:33
  • 1
    @paparazzo I, and many other people, find "basically no" a less than adequate way of expressing, clearly, "yes". This site depends on clarity, I shouldn't have to keep making this point. "Basically" can be subjective, congratulations - you managed to interpret it in the way the OP intended, some people may not, since it lacks utmost clarity. "Basically yes", in that case - since 'basically' provides a bit of lee-way, would've been better. And what english.stackexchange.com would say is irrelevant since SO is not for expressing things ambiguously, or rather - inaccurately. –  Apr 24 '18 at 07:59
  • @JᴀʏMᴇᴇ I am not telling you how to interpret it It is and was clear to ME. – paparazzo Apr 24 '18 at 08:09
  • @paparazzo - I'm happy for you. It may not have been clear to everyone else, if we agree on that I'm not sure why you're continuing the debate. –  Apr 24 '18 at 08:55
  • @paparazzo "basically no" != "yes" –  Apr 24 '18 at 09:08
  • 1
    @JᴀʏMᴇᴇ I never debated it with you. I only said how I interpreted it. Read my comments. – paparazzo Apr 24 '18 at 09:32
  • @paparazzo "basically no" != "yes". Conversation over. Can't be bothered continuing. –  Apr 24 '18 at 10:16
  • 1
    I know I'm quite late to this highly amusing debate, but I [looked up](https://www.google.com/search?q=basically&rlz=1C1GCEU_enGB820GB821&oq=basically&aqs=chrome..69i57j0l5.6478j1j7&sourceid=chrome&ie=UTF-8) "basically" on google and it says: "**used to indicate that a statement summarizes the most important aspects, or gives a roughly accurate account, of a more complex situation**". So yes, "basically no" is not the same as "yes", because it means "basically no". I trust that clears everything up. :-) Basically, the answer is right. ;-) – bornfromanegg Jan 16 '19 at 16:39
  • two words quantum computers. – Abdi Getachew Dec 19 '19 at 17:47
  • So the chances of two truly random guids (or random 128 bit numbers) colliding is "Basically No". But what about the source of the "randomness"? I mean if two devices creating the Guids started off with the same seed to the random number generator, then the first Guid created by each would end up the same. So surely the answer should include the chances of the seeds colliding. – komodosp Apr 01 '21 at 08:12
  • This isn't right. I have had one GUID collision in my IT career so far and another guy I know has had one. Mine was on a central government collection system which accepted data from multiple government agencies tracking every persons transactions with the agencies within a country over all time (trillions of records). 2 separate systems from 2 different orgs generated the same GUID. Another person I know had one while working for a global finance company in their transactions IDs (distributed generation of the IDs too). It's very possible and does happen, but it's very rare. – blobbles May 18 '21 at 20:52
  • 1
    In addition, it's much more likely to happen in the future as we get more IOT devices generating random GUIDs. We will end up with centralised collection systems that consolidate the data (imagine for instance, from micro controllers in all smart houses in the world, generating telemetry/heartbeat GUIDs every second) and consolidating that info. If you had 100 IOT controllers per house, over 300m houses, you are already at 2.5 quadrillion per day. Per year: 9.4608e+17, you hit 1% chance after 3 years. – blobbles May 18 '21 at 21:07
  • @blobbles Everyone here has said it is a possible, non zero chance of it happening. If it is critically important that you have a unique value, don't leave it to chance. This is particularly true for very large datasets. If you already have a trillion guids, every new one you generate has a chance to match the existing trillion, so why in the world would a large govt agency collecting large datasets from multiple systems not have flag that as a distinct possibility? – Daryl Young Nov 08 '22 at 15:10
  • @blobbles The answer doesn't say this explicitly, but is otherwise fine (someone noted the definition of "basically"). He's saying, at 1 million records within a single system, the chance of a collision are incredibly low, so you should consider the possibility that someone or something has altered the record. – Daryl Young Nov 08 '22 at 15:19
  • @DarylYoung - we ended up creating our own prefixes for the source systems to fix the issue, basically impossible that people had altered the records to match when we investigated (opposite ends of the country in two unrelated systems). The OP never said how many records are in their system, so making a definition of what "basically" means is rather pointless IMO. – blobbles Nov 08 '22 at 20:40
  • I am looking at a GUID collision right now. So yes, they do happen. Guess we were incredibly "lucky" – rxantos Jan 18 '23 at 03:44
121

Basically they are not possible !, the chances are astronomically low.

But... I'm the only person I the world that I know of, that had a GUID colision once (yep!).

And I'm sure of it, and that it wasn't a mistake.

How did it happen, in a small application that was running on Pocket PC, at the end of an operation a command that has an generated GUID must be issued. The command after it was executed on the server it was stored in a command table on the server along with the execution date. One day when I was debugging I issued the module command (with the newly generated GUID attached) and nothing happened. I did it again (with the same guid, because the guid was generated only once at the beginning of the operation), and again, and nothing, finally trying to find out why the command isn't executing, I checked the command table, and the same GUID as the current one was inserted 3 weeks ago. Not believing this, I restored a database from 2 weeks backup, and the guid was there. Checked the code, the new guid was freshly generated no doubt about it. Pow guid collision, happened only once, but I really wish I would have won at lotto instead,the chance is greater :).

Edit: there are some factors that could have greatly increased the chance of this happening, the application was running on the PocketPC emulator, and the emulator has a save state feature, which means that every time the state is restored the local time is restored also and the guid is based on on the internal timer....also the guid generating algorithm for compact framework might be less complete than for example the COM one...

Pop Catalin
  • 61,751
  • 23
  • 87
  • 115
  • 1
    Why the downvote, this is a true story :(, I even told it to the local developers group. – Pop Catalin Oct 08 '08 at 21:31
  • IMO, you made a mistake somewhere. The odds of a collision are so low as to be insignificant. – Mitch Wheat Nov 07 '08 at 14:18
  • @Mich that's a possibility too, but, I've checked coudn't find the mistake anywhere. – Pop Catalin Nov 08 '08 at 22:25
  • 43
    Upvoted. Save state & replay really would generated duplicate guids. – Joshua Dec 24 '08 at 20:42
  • 42
    Likely what happened was this was a "bad" GUID implementation. The _theoretical_ odds were very low, but on Pocket PC?? Who is to say that they didn't take a shortcut that bumped those odds up into the "unlikely, but possible" category. – Dave Dopson Mar 09 '12 at 01:47
  • 11
    Just because something has a very low probability to happen doesn't mean it won't happen. – Geeky Guy Sep 17 '13 at 19:45
  • 1
    that happened to me too man, and also to my friend and to his friend, what are the odds – shiggity Sep 25 '14 at 21:04
  • 3
    As I said above the chances of that are so increasingly small that it is safe to assume that either you made a mistake or MSSQL uses a defective PRNG (http://en.wikipedia.org/wiki/Pseudorandom_number_generator). E.g. it is likely that this PRNG is initalized with a seed of small size. Defective PRNGs are not rare (see https://www.schneier.com/paper-prngs.html) - for example one defect was recently discovered in Android SDK - http://android-developers.blogspot.com/2013/08/some-securerandom-thoughts.html + https://www.usenix.org/conference/woot14/workshop-program/presentation/kaplan – Alex Dec 12 '14 at 20:10
  • 3
    @Alex, the mistake was "Save State and Restore" from the Emulator, which restore the entire emulator image including the emulator clock. So after thousands of Restore operations over one year, one guid collision was generated. You are right there was a mistake! – Pop Catalin Dec 15 '14 at 08:55
  • 1
    Good of you to add this anecdote but your actual answer seems to confuse *possible* with *probable*. Collisions are perfectly possible, after all you can calculate the odds of one happening. They are just so improbable that it approaches impossibility in any real-world scenario. It's precisely this confusion that leads to this question being asked so often so consider updating your answer to satisfy us pedants. :) – Lilienthal Jun 15 '15 at 16:03
  • @Lilienthal, I've said "(basically) they are not possible" which means "to simplify the argument as much as possible" ... yes, the correct term is "improbable" ... but "basically impossible" means chances are too low to be considered for any practical purpose. – Pop Catalin Sep 12 '17 at 10:20
47

Are you a mathematician? Then yes.

Are you an engineer? Then no.

Eneroth3
  • 919
  • 8
  • 13
40

They are theoretically possible, but with 3.4E38 possible numbers, if you create tens of trillions of GUIDs in a year the chance of having one duplicate is 0.00000000006 (Source).

If two users ended up with the same GUID, I would wager that there is a bug in the program which is causing the data to be copied or shared.

Ben Hoffstein
  • 102,129
  • 8
  • 104
  • 120
  • "but with 3.4E38 possible numbers" - no. Two GUIDs generated almost simultaneously on the same machine would end up with extremely similar GUIDs. – Kirk Strauser Oct 08 '08 at 21:14
  • 4
    That would depend on how the GUID is generated, and some implementations based on the CPU time or milliseconds will (hopefully) exagerate whatever calculation its based off of so two GUID's generated from milliseconds apart will have a vast difference. –  Oct 31 '08 at 15:19
  • 4
    With more than 1 processor on a machine, if a guid is based on time and mac address then each core could issue the same guid at the same moment in time. – AndyM Feb 19 '10 at 16:53
  • 13
    I'm pretty sure any decent GUID implementation will not – Guillaume86 Jun 04 '12 at 15:18
  • I wonder how much the birthday paradox reduces the chances of any 2 guids colliding? – Matthew Lock Sep 10 '14 at 04:05
  • 2
    @MatthewLock The birthday paradox is covered in the source. Check the link. – Zero3 Jan 11 '16 at 09:02
  • @Guillaume86 one problem is that you are assuming a single implementation – symbiont Dec 28 '20 at 10:41
25

First lets look at the chance of collision of two GUIDs. It is not, as other answers have stated, 1 in 2^128 (10^38) because of the birthday paradox, which means that for a 50% chance of two GUIDs colliding the probability is actually 1 in 2^64 (10^19) which is a lot smaller. However, this is still a very large number, and as such the probability of collision assuming you are using a reasonable number of GUIDs is low.

Note also that GUIDs do not contain a timestamp or the MAC address as many people also seem to believe. This was true for v1 GUIDs but now v4 GUIDs are used, which are simply a pseudo-random number which means that possibility of collision is arguably higher because they are no longer unique to a time and a machine.

So essentially the answer is yes, collisions are possible. But they are highly unlikely.

Edit: fixed to say 2^64

Seph
  • 8,472
  • 10
  • 63
  • 94
Greg Beech
  • 133,383
  • 43
  • 204
  • 250
  • 2
    While I agree with all of your facts, be careful with your math. To say that you have a 1 in 10^19 chance of any two GUIDs colliding depends on how many GUIDs are in the set. For that chance you need ~2^32 GUIDs, so in nearly all real-world scenarios the odds are much lower. – DocMax Oct 09 '08 at 07:16
  • 3
    You have a typo of `1 in 10^64 (10^19)`, which I think should be `1 in 2^64 (10^19)`. I'm also very confused how you think the birthday paradox applies to just 2 numbers. I assume you looked at http://en.wikipedia.org/wiki/Birthday_paradox. The table shows how many guids you need for a given probability of a duplicate. From that table a probability of 1 in 10^18 requires 2.6 * 10^10 guids, not anything close to just two GUIDs. – Tony Lee Dec 29 '11 at 01:22
  • One point -- v1 guids are still in wide use, and rely upon MAC address, particularly in databases as they have desirable characteristics. See UuidCreateSequential and it's SQL Server wrapper NewSequentialID (http://msdn.microsoft.com/en-us/library/windows/desktop/aa379322(v=vs.85).aspx). – EBarr Jul 22 '14 at 22:01
18

The chances of two random GUIDs colliding (~1 in 10^38) is lower than the chance of not detecting a corrupt TCP/IP packet (~1 in 10^10). http://wwwse.inf.tu-dresden.de/data/courses/SE1/SE1-2004-lec12.pdf, page 11. This is also true of disk drives, cd drives, etc...

GUIDs are statistically unique and the data you read from the db is only statistically correct.

Tony Lee
  • 5,622
  • 1
  • 28
  • 45
  • Are you sure I couldn't possibly armor my network so less than 1 in 10^28 packets are corrupt? – Joshua Jun 10 '13 at 21:46
13

I would consider Occam's razor as a good guide in this case. It is incredibly unlikely that you have a GUID collision. It is much more likely you have a bug, or someone messing with your data.

Jason Jackson
  • 17,016
  • 8
  • 49
  • 74
  • 1
    Actually in this situation Occam's razor is not a good guide at all! Occam's Razor says that the case with the least assumptions is most likely to be correct. In this situation the case of GUID collision is actually much simpler, but Occam's Razor does not apply to a situation like this where we already know that one of the cases is incredibly unlikely. – lockstock Aug 14 '19 at 01:49
11

See Wikipedia's Globally Unique Identifier article. There are several ways to generate GUIDs. Apparently the old (?) way used Mac address, a timestamp down to a very short unit and a unique counter (to manage fast generations on the same computer), so making them duplicate is nearly impossible. But these GUIDs were dropped because they could be used to track down users...

I am not sure of the new algorithm used by Microsoft (the article says a sequence of GUIDs can be predicted, looks like they no longer use timestamp? The Microsoft article linked above says something else...).

Now, GUIDs are carefully designed to be, by name, globally unique, so I will risk it is impossible, or of very very very low probability. I would look elsewhere.

PhiLho
  • 40,535
  • 6
  • 96
  • 134
  • 6
    [Eric post 1](http://blogs.msdn.com/b/ericlippert/archive/2012/04/24/guid-guide-part-one.aspx) – Nat Jun 22 '12 at 02:32
  • 5
    [Eric post 2](http://blogs.msdn.com/b/ericlippert/archive/2012/04/30/guid-guide-part-two.aspx) – Nat Jun 22 '12 at 02:32
  • 5
    [Eric post 3](http://blogs.msdn.com/b/ericlippert/archive/2012/05/07/guid-guide-part-three.aspx) – Nat Jun 22 '12 at 02:33
9

Two Win95 machines that have ethernet cards with duplicate MAC addresses will issue duplicate GUIDS under tightly controlled conditions, especially if, for example, the power goes off in the building and they both boot at exactly the same time.

Joshua
  • 40,822
  • 8
  • 72
  • 132
  • Is it common for two different machines to have the same ethernet MAC address? – Dave Lucre Jun 27 '18 at 22:19
  • @DaveLucre: No, but incidents have been recorded. – Joshua Jun 27 '18 at 22:29
  • I'm really curious how this comes about. Is it more likely with VMs that randomly generate a MAC for each NIC? I've never heard of physical NICs being manufactured with duplicate MACs! Kind of throws a massive spanner in the works if that's possible! – Dave Lucre Jun 27 '18 at 23:20
  • Wow! Thanks for the link @Joshua! What a colossal screw-up! – Dave Lucre Jun 28 '18 at 02:28
  • @DaveLucre I've used some very cheap USB NICs where ALL of them are manufactured with the same MAC. But of course, that has nothing to do with the mathematics of randomness, and everything to do with the laziness of the manufacturer. – rudolfbyker Dec 03 '18 at 11:07
  • Wow, using the same MAC for all devices is more than just laziness @rudolfbyker, that's negligence. – Dave Lucre Dec 04 '18 at 20:37
6

I know people like the feel-good answer that GUIDs are magical and guaranteed to be unique, but in reality, most GUIDs are just 121-bit random numbers (seven of the bits are wasted on formatting). If you wouldn't feel comfortable using a big random number, then you shouldn't feel comfortable using a GUID.

Rick Yorgason
  • 1,616
  • 14
  • 22
  • 13
    Also recommend you don't use networks. Or computers. Parity bits can only do so much! – Rushyo Feb 03 '11 at 15:44
  • You misunderstood. There's two things I was trying to say in this post:1) If you need a big random number, use a big random number. Using a GUID as a big random number is needlessly misleading. (2) – Rick Yorgason Mar 18 '11 at 21:11
  • (2) GUIDs aren't magically unique, and you shouldn't use them unless you've thought about whether or not they're actually appropriate. Often a string, an incremented number, or a pair of IDs, may be more appropriate and more readable. People claim that GUIDs are perfect for distributed databases, but a smallint column identifying to server that created the row and a incremented ID work just as well, are easier to read, and use less space. – Rick Yorgason Mar 18 '11 at 21:22
  • 7
    Which I'm fully aware of. You stated "if you wouldn't feel comfortable using a big random number." but GUIDs are so unique that you'd find that pretty much everything else in a computer is more random, even operations you take for granted. There's more chance that a freak memory glitch will break your identity column than a (true) GUID collision will occur. You should not feel 'uncomfortable' about them. If they're not ideal for the scenario then fine - but they don't need special caution. – Rushyo Mar 22 '11 at 11:51
  • Are your GUIDs really "so unique"? More unique than a long random number? Check the 13th digit (first digit after the second dash): is it a 4? There's a good chance that it is, because this is the most common type of GUID nowadays < http://nitoprograms.blogspot.com/2010/11/few-words-on-guids.html >. If this is the case, then congratulations: you're using a big random number. All I'm saying is that using a GUID as a random-number-storage-device is silly, awkward, and just a little misleading. – Rick Yorgason May 23 '11 at 06:43
  • 5
    I guess this is going nowhere but what people are trying to explain to you is that error detections mecanisms in common hardware like network cards or hard drives use algorithms that have bigger chances of not detecting an error than you of getting a GUID collision, so if you rely on these, you could as well rely on GUIDs – Guillaume86 Jun 04 '12 at 15:28
  • Yes, the chance of a collision is very low. Just don't fool yourself into thinking that it's lower than a big random number. – Rick Yorgason Jul 07 '12 at 13:07
  • 1
    @Rick, depends how big your number is. Definitely not with a 4 byte int or 8 byte bigint. GUID=16 bytes, so you'd need a custom 16 byte big number implementation to achieve the same 2^128 possible combinations. So generally speaking, if using 'normal' int or bigint random numbers, the chance of collisions with a GUID *is* lower (leaving out random algo considerations for each). – Wim Oct 04 '12 at 10:43
  • @Wim: That is true, but most databases can already handle arbitrarily large numbers. SQL Server 2000 seems to be quite behind in this capacity, but its decimal type should still be big enough. – Rick Yorgason Jan 20 '13 at 06:01
5

I'll preface this with "I'm not a networking person, so I may make completely incoherent sentences following.".

When I worked at Illinois State University, we had two Dell desktops, ordered at different times. We put the first one on the network, but when we tried to put the second one on the network we started receiving crazy errors. After much troubleshooting, it was determined that both machines were producing the same GUID (I'm not sure exactly what for, but it made them both unusable on the network). Dell actually replaced both machines as defective.

John Kraft
  • 6,811
  • 4
  • 37
  • 53
  • 3
    It was specifically the GUID. It had something to do with the GUID generated by the machines when they joined the network. It took several weeks for Dell to replace the machines because they said it was impossible for the GUIDs to be the same. We were able to reproduce the problem, Dell took the machines back, and were able to produce the same results on their networks. They ended up replacing both machines. As I said, I'm not a networking person, but I specifically remember it was a problem with GUIDs. – John Kraft Oct 15 '09 at 01:46
5

Generalized formula

There's a formula that estimates how many values of size S to generate to get a collision between two of them with probability P.

Variables:

  • bits - how many bits in your data type.
  • probability - target probability for the collision.

To get a collision, you have to generate around:

2^{\frac{bits + 1}{2}} * \sqrt{-log_2(1 - probability)}

Or in Python:

from math import sqrt, log

def how_many(bits, probability):
    return 2 ** ((bits + 1) / 2) * sqrt(-log(1 - probability))

GUIDs

For GUIDs (128 bits), to get a collision with probability 1% (0.01), you'll need:

In [2]: how_many(bits=128, probability=0.01)
Out[2]: 2.6153210405530885e+18

...around 2.6 * 10^18 GUIDs (that's 42 exabytes of GUIDs).

Note that this probability grows rapidly. No matter the number of bits, for 99.99% probability you'll need only 30x more GUIDs than for 1%!

In [3]: how_many(bits=128, probability=0.9999)
Out[3]: 7.91721721556706e+19

Int64

Same numbers, but for the int64 datatype:

In [4]: how_many(bits=64, probability=0.01)
Out[4]: 608926881

In [5]: how_many(bits=64, probability=0.9999)
Out[5]: 18433707802

For 1% collision probability you'll need 5 gigabytes of int64-s. Still a lot but compared to the GUIDs that is a much more comprehensible number.


It's the so called birthday problem - and in this Wikipedia article you can find more precise estimation formulas than this one.

gukoff
  • 2,112
  • 3
  • 18
  • 30
3

Could the code used to generate a GUID have a bug in it? Yes, of course it could. But the answer is the same as it would be for a compiler bug - your own code is orders of magnitude more likely to be buggy, so look there first.

Mark Ransom
  • 299,747
  • 42
  • 398
  • 622
2

Of course its possible....Probable? Not likely, but it is possible.

Remember, the same machine is generating every GUID (the server), so a lot of the "randomness" that is based on machine specific information is lost.

FlySwat
  • 172,459
  • 74
  • 246
  • 311
1

Just for grins, try the following script... (works on SQL 2005, not sure about 2000)

declare @table table
(
    column1 uniqueidentifier default (newid()),
    column2 int,
    column3 datetime default (getdate())
)

declare @counter int

set @counter = 1

while @counter <= 10000
begin
    insert into @table (column2) values (@counter)
    set @counter = @counter + 1
end

select * from @table

select * from @table t1 join @table t2 on t1.column1 = t2.column1 and t1.column2 != t2.column2

Running this repeatedly (takes less than a second) produces a fairly wide range from the first select, even with an EXTREMELY short time gap. So far the second select hasn't produced anything.

GalacticCowboy
  • 11,663
  • 2
  • 41
  • 66
  • 1
    You need another 15 zeros at the end of the counter to have a 50% chance of a duplicate. But, for Pete's sake don't do it! – Jim Birchall Oct 31 '08 at 16:22
0

Impossible if the users have different machines with network cards, and even if not it is still an extremely marginal almost theoretical risk.

Personally I'd look elsewhere as it is more likely a bug rather than a GUID clash...

Providing of course that you don't chop bits off the GUID to make it shorter.

Richard Harrison
  • 19,247
  • 4
  • 40
  • 67
0

It's highly unlikely that you'll run into GUID collisions if you're generating them through something like the NEWID() function in SQL Server (though of course possible, as other answers have emphasized). One thing they haven't pointed out is that it's actually quite likely that you'll run into collisions if you're generating GUIDs in JavaScript on browsers in the wild. Not only are there sometimes problems in the RNG in different browsers, but I've also run into problems where the Google spiders seem to cache the results of functions like that, and ended up repeatedly passing the same GUID up to our systems.

See the various answers here for more details:

Collisions when generating UUIDs in JavaScript?

Community
  • 1
  • 1
Ken Smith
  • 20,305
  • 15
  • 100
  • 147
0

Don’t worry about what it is. Make it impossible. Mix the improbability of GUID with the impossibility of sequential. Just add a database sequential I’d to the GUID and call it done. You may need to change the data type from GUID to String-ish but they are not that different storage wise.

WiredLessInTX
  • 113
  • 1
  • 6
-1

Sure it's possible, and maybe even likely. It's not like each GUID is in a random portion of the possible number space. In the event that two threads attempted to generate one simultaneously, barring some kind of centralized GUID function with a semaphore around it, they could end up with the same value.

Kirk Strauser
  • 30,189
  • 5
  • 49
  • 65