-4

I'm building an app. Assume it is a messaging app and becomes as popular as whatsapp.

A GUID will be given for every message sent in the world.

There will be a problem if any 2 GUIDs of in the world are equal.

As of today 30billion!(official) whatsapp messages are sent in the world in a day.

I'm using C# (Xamarin)'s System.Guid.NewGuid method to generate GUIDs.

What is the chance for a 'problem' occur because the random numbers are not truly random?

(This question is different from others because it describes a situation where millions of people get billions of new GUIDs combined each day.)

Aron
  • 15,464
  • 3
  • 31
  • 64
SadeepDarshana
  • 1,057
  • 18
  • 34
  • 1
    Possible duplicate of [What are the chances to get a Guid.NewGuid () duplicate?](http://stackoverflow.com/questions/8642858/what-are-the-chances-to-get-a-guid-newguid-duplicate) – MichaelMao Jul 23 '16 at 18:10
  • Please do not put in random tags like crypto. Also, please try to Google your subject matter before posting. Pretty much every definition of GUID on the internet would give you your answer. – Aron Jul 23 '16 at 18:31
  • 1
    PS 30billion is not a big number. GUIDs are sufficient to randomly assign to each atom in the universe uniquely. Please come back when you find a way to store a message in each atom in the universe... – Aron Jul 23 '16 at 18:35
  • @Aron I know 2^128 is a big number comparing to my requirement but my problem was whether it will work fine when we consider the fact that C# does't generate true random numbers. By the way 2^128 < 10^78. – SadeepDarshana Jul 23 '16 at 19:05
  • @SadeepDarshana no true GUID implementation uses the full 128 bit of entropy. But all (ignoring bugs, which do get patched) will cause you problems. Deep dive into .net's implementation, you get over 100bits of entropy in the latest version. The previous version used your Mac address as part of it. But even if you had two computers using different version, they wouldn't collide. – Aron Jul 23 '16 at 19:39
  • 1
    An implementation of GUID which can collide is an embarrassing bug that is of the highest importance. – Aron Jul 23 '16 at 19:40

2 Answers2

0

I like this passage from Wikipedia:

They may or may not be generated from random (or pseudo-random) numbers. GUIDs generated from random numbers normally contain 6 fixed bits (these indicate that the GUID is random) and 122 random bits; the total number of unique such GUIDs is 2122 (approximately 5.3×1036). This number is so large that the probability of the same number being generated randomly twice is negligible;[citation needed] however other GUID versions have different uniqueness properties and probabilities, ranging from guaranteed uniqueness to likely duplicates. Assuming uniform probability for simplicity, the probability of one duplicate would be about 50% if every person on earth as of 2014 owned 600 million GUIDs.

https://en.wikipedia.org/wiki/Globally_unique_identifier

If you are truly concerned you always have the option and ability to create a collision detection method. Such as if you detect that a GUID is already in use then just assign a new random GUID and iterate this until no duplicate is detected. Reminds me of hashtables so to speak. There are performance penalties for this, but just know there is a solution for your obstacle!

Update

I can understand your concern about randomness, but if you take into account it is a structured algorithm it will almost assign in a sequential chronological matter (kind of). There is a minimal concern pertaining to this, I would just be more concerned with the performance degradation surrounding using a 128-bit value for a primary key resolvement.

Also from Wikipedia:

GUIDs are commonly used as the primary key of database tables, and with that, often the table has a clustered index on that attribute. This presents a performance issue when inserting records because a fully random GUID means the record may need to be inserted anywhere within the table rather than merely appended near the end of it.

BolletuH
  • 124
  • 4
  • Your post seems to completely miss the design of a GUID. GUIDs are used when collision detection is not possible. There are no GUID algorithms that are monotonically increasing, because that isn't a GUID that is Timestamp. Timestamps can't be used in a system which supports AP in the CAP model, since they require C in the CAP model. Also, to put into perspective the problem at hand, YouTube is a distributed storage platform of video messages. Each video is given a random 64bit id. – Aron Jul 23 '16 at 18:29
0

If there are 30B messages every day, 30B GUIDs are generated. enter image description here
This is how many times you can generate a unique GUID. Even if your application stands strong for septillions of years, the chance of generating duplicate UUIDs is about 0.03225806451%. So please do not worry, even if it WILL happen, it will happen once, and you'll resolve it easily.

jam berry
  • 25
  • 5
  • 1
    *"the chance of generating duplicate UUIDs is about 0.00004408104%"* -- Could you confirm what is represented by this number precisely? Is it the probability of generating two equal `Guid`s with the `Guid.NewGuid` method, by an application that generates 30,000,000,000 `Guid`s per day and runs continuously for 10²⁴ Earth-years? – Theodor Zoulias Jun 08 '22 at 14:57
  • (2^128)/(30*(10^9)) is GUIDs per day, so you divide it by 365*(10^24) [full calc: ((2^128) / (30 * (10^9))) / (365*(10^24))] and actually get 31, which means that the chance is actually 1/31 (0.03225806451%). Thanks for noticing that my calculation was off, I apparently forgot to put brackets for (365*(10^24)) and my calculator messed that up. I'll edit my question now, but that number is still ominously low :) Unless my maths is wrong and you could correct it – jam berry Jun 08 '22 at 16:19
  • *"(2^128)/(30*(10^9)) is GUIDs per day"* -- I don't get it. Where does this 2^128 comes from? Could you include these calculations, and the reasoning behind them, in details inside the answer? – Theodor Zoulias Jun 08 '22 at 16:27
  • Jeez. There can be 2^128 different GUIDS, I hope you're capable of figuring that. 30*(10^9) is 30 billion - the amount of GUIDs per day (by OP). 365*(10^24) is 10^24 years but in days. So when we divide 2^128 by 30*(10^9), we know how many times we can generate 30*10^9 GUIDs that without running out of GUIDs - this number is also the number of days that can be spent generating those GUIDs. Divide this amount of days by 365*(10^24) days and get the amount of times that you can do the division, hence the probability that after repeating it once, 2 duplicates will be found. – jam berry Jun 08 '22 at 16:45
  • There are not in fact 2^128 valid guids. Some bits are taken up by specifying the algorithm, for example, so some of those 2^128 bits are technically representing invalid GUIDs per the specs, and Guid.NewGuid only uses one particular algorithm on any one version, so it actually has a lot less possible values that it will actually generate. Not enough to make collisions realistic in realistic situations, but it's still several orders of magnitude off from your calculations. – Servy Jun 08 '22 at 16:49
  • I got my information from a public source (about 2^128 GUIDs), and I was not aware that Guid.NewGuid for some reason decides to stripe some of them out, because, VERY intuitively, if it says New GUID, then it must create a GUID accordingly. Either way, I used septillions just to show how unlikely the event is to happen, because the application suggested by the OP will last at most 15 years, where the results are even more comically small. You are nitpicking on purposeful overexaggeration, because I, due to me using human logic, didn't think that a GUID generator generates a GUID differently. – jam berry Jun 08 '22 at 16:55
  • I would prefer if you included these calculations/explanations inside the question, instead of packing them in the comments. It's hard to read so dense mathematical formulas with a small font and lack of formatting. My assumption is that your calculations are suffering from the [birthday paradox](https://en.wikipedia.org/wiki/Birthday_problem), so the resulting percentage is probably many orders of magnitude off. Btw, unless the answer is updated, I won't comment any further here. – Theodor Zoulias Jun 08 '22 at 18:20
  • I hope you won't. – jam berry Jun 11 '22 at 16:50