82

Is it possible to generate (highly probable) unique Integer from GUIDs?

int i = Guid.NewGuid().GetHashCode();

int j = BitConverter.ToInt32(Guid.NewGuid().ToByteArray(), 0);

Which one is better?

anonim
  • 2,494
  • 6
  • 30
  • 40
  • 1
    Why not just use the GUIDs for whatever purpose you'd be using the 32-bit integers? – JAB May 27 '10 at 14:24
  • 11
    Unique over what domain? I just wrote and executed a program that generates *all* the 32 bit integers, so you're not going to be able to generate one that I haven't already! – Eric Lippert May 29 '10 at 04:31
  • 11
    if you can forget `Guid`, then the best way of getting "unique" (100%) is just to have an int variable somewhere and do int++. You are sure to get 2^32 unique values and that's pretty large space too.. – nawfal Mar 31 '13 at 11:52
  • To answer, one would need some context. Importantly, and as emphasised in the [Microsoft docs](https://learn.microsoft.com/en-us/dotnet/api/system.object.gethashcode?view=net-7.0), we should never store the result of GetHashCode or expect them to be unique, especially cross domain, process and platform. It is non-deterministic in this sense. – DvS Mar 02 '23 at 16:33

10 Answers10

66

Eric Lippert did a very interesting (as always) post about the probability of hash collisions.

You should read it all but he concluded with this very illustrative graphic:

Probability of hash collisions

Related to your specific question, I would also go with GetHashCode since collisions will be unavoidable either way.

Michael Ryan
  • 479
  • 3
  • 14
João Angelo
  • 56,552
  • 12
  • 145
  • 147
24

The GetHashCode function is specifically designed to create a well distributed range of integers with a low probability of collision, so for this use case is likely to be the best you can do.

But, as I'm sure you're aware, hashing 128 bits of information into 32 bits of information throws away a lot of data, so there will almost certainly be collisions if you have a sufficiently large number of GUIDs.

Greg Beech
  • 133,383
  • 43
  • 204
  • 250
  • 6
    You can remove the "almost" if you take "sufficiently large" to be greater than 2^32. In that case, collision is guaranteed. – phoog May 29 '15 at 22:07
21

A GUID is a 128 bit integer (its just in hex rather than base 10). With .NET 4 use http://msdn.microsoft.com/en-us/library/dd268285%28v=VS.100%29.aspx like so:

// Turn a GUID into a string and strip out the '-' characters.
BigInteger huge = BigInteger.Parse(modifiedGuidString, NumberStyles.AllowHexSpecifier)

If you don't have .NET 4 you can look at IntX or Solver Foundation.

Justin
  • 4,097
  • 1
  • 22
  • 31
17

Here is the simplest way:

Guid guid = Guid.NewGuid();
Random random = new Random();
int i = random.Next();

You'll notice that guid is not actually used here, mainly because there would be no point in using it. Microsoft's GUID algorithm does not use the computer's MAC address any more - GUID's are actually generated using a pseudo-random generator (based on time values), so if you want a random integer it makes more sense to use the Random class for this.

Update: actually, using a GUID to generate an int would probably be worse than just using Random ("worse" in the sense that this would be more likely to generate collisions). This is because not all 128 bits in a GUID are random. Ideally, you would want to exclude the non-varying bits from a hashing function, although it would be a lot easier to just generate a random number, as I think I mentioned before. :)

MusiGenesis
  • 74,184
  • 40
  • 190
  • 334
  • 12
    Perhaps there's a good reason why the OP wants to derive the integer from a GUID. – LukeH May 27 '10 at 11:58
  • 4
    The OP may *think* there's a good reason to derive an integer from a GUID (namely, in order to ensure uniqueness of the int), but there really isn't. – MusiGenesis May 27 '10 at 12:07
  • 1
    The chance of collision is high with a 32 bit integer as shown in the link of João Angelo's answer. 9300 random 32 bit integers have collision chance of 1% and 77000 have a collision chance of 50%. Hence relying on 32 bit random numbers for uniqueness is an oxymoron. – Justin May 27 '10 at 12:19
  • 1
    Justin is right on the money. I've encountered this "anti-pattern" in the wild many times - a developer thinks that an int generated from a GUID will be *as unique* as the original GUID. However, simple math shows that for each int there are actually 79228162514264337593543950336 GUIDs (2 ^ 96) that will "map" to that particular int, which cuts down on the uniqueness a bit. – MusiGenesis May 27 '10 at 12:26
  • 4
    Just for the sake of readability: 79,228,162,514,264,337,593,543,950,336. – phoog May 29 '15 at 22:09
  • 4
    Just to add my 2 cents....all our keys are GUID's and a webservice we are calling requires the key be integer. And thus a reason for going from GUID to integer. – SASS_Shooter Nov 22 '16 at 20:44
  • So, you create a Guid then a random integer and then you feel compelled to point out the obvious (the GUID is not used). I'm dying to know what you see as an "answer" or even a useful example to the question. – Rick O'Shea Oct 02 '17 at 16:12
  • This is not repeatable. You might want to retrieve the same int multiple times for a given GUID – StayOnTarget Oct 15 '20 at 19:12
12

If you are looking to break through the 2^32 barrier then try this method:

/// <summary>
/// Generate a BigInteger given a Guid. Returns a number from 0 to 2^128
/// 0 to 340,282,366,920,938,463,463,374,607,431,768,211,456
/// </summary>
    public BigInteger GuidToBigInteger(Guid guid)
    {
        BigInteger l_retval = 0;
        byte[] ba = guid.ToByteArray();
        int i = ba.Count();
        foreach (byte b in ba)
        {
            l_retval += b * BigInteger.Pow(256, --i);
        }
        return l_retval;
    }

The universe will decay to a cold and dark expanse before you experience a collision.

s2dbaker
  • 121
  • 1
  • 3
  • Yea, but sometimes the universe collapses faster than one might expect. For example, when you mistake an int32 for a place where you can put an int64 - the universe is infinite (potentially) - an int32 is not. And the bytes in Guid are not ordered by significance from left to right, which is why this is still wrong, even if the universe doesn't collapse. – Stefan Steiger Mar 15 '18 at 08:00
10

I had a requirement where multiple instances of a console application needed to get an unique integer ID. It is used to identify the instance and assigned at startup. Because the .exe is started by hands, I settled on a solution using the ticks of the start time.

My reasoning was that it would be nearly impossible for the user to start two .exe in the same millisecond. This behavior is deterministic: if you have a collision, you know that the problem was that two instances were started at the same time. Methods depending on hashcode, GUID or random numbers might fail in unpredictable ways.

I set the date to 0001-01-01, add the current time and divide the ticks by 10000 (because I don't set the microseconds) to get a number that is small enough to fit into an integer.

 var now = DateTime.Now;
 var zeroDate = DateTime.MinValue.AddHours(now.Hour).AddMinutes(now.Minute).AddSeconds(now.Second).AddMilliseconds(now.Millisecond);
 int uniqueId = (int)(zeroDate.Ticks / 10000);

EDIT: There are some caveats. To make collisions unlikely, make sure that:

  • The instances are started manually (more than one millisecond apart)
  • The ID is generated once per instance, at startup
  • The ID must only be unique in regard to other instances that are currently running
  • Only a small number of IDs will ever be needed
Georg Patscheider
  • 9,357
  • 1
  • 26
  • 36
  • This works great for Int value ID's - the one change I made was to use DateTime.UtcNow instead of DateTime.Now, which uses the machines local time. – neoRiley Feb 05 '21 at 17:52
6

Because the GUID space is larger than the number of 32-bit integers, you're guaranteed to have collisions if you have enough GUIDs. Given that you understand that and are prepared to deal with collisions, however rare, GetHashCode() is designed for exactly this purpose and should be preferred.

tvanfosson
  • 524,688
  • 99
  • 697
  • 795
2

Maybe not integers but small unique keys, anyway shorter then guids:

http://www.codeproject.com/Articles/14403/Generating-Unique-Keys-in-Net

Tratak
  • 138
  • 1
  • 9
1

In a static class, keep a static const integer, then add 1 to it before every single access (using a public get property). This will ensure you cycle the whole int range before you get a non-unique value.

    /// <summary>
    /// The command id to use. This is a thread-safe id, that is unique over the lifetime of the process. It changes
    /// at each access.
    /// </summary>
    internal static int NextCommandId
    {
        get
        {
            return _nextCommandId++;
        }
    }       
    private static int _nextCommandId = 0;

This will produce a unique integer value within a running process. Since you do not explicitly define how unique your integer should be, this will probably fit.

Marcel
  • 15,039
  • 20
  • 92
  • 150
  • 4
    ...until the next time you open app. Or your webserver restarts. Or you're running a distributed application (multiple servers). /obvious (as I'm sure you knew) – drzaus Mar 31 '16 at 14:30
  • @drzaus Yes, it's only unique within a running process, as stated. – Marcel Oct 21 '16 at 09:35
1

Here is the simplest solution, just call GetHashCode() on the Guid. Note, that a guid is a 128 bit int while a int is 32. So its not guaranteed to be unique. But its probably statistically good enough for most implementations.

    public override bool Equals(object obj)
    {
        if (obj is IBase)
            return ((IBase)obj).Id == this.Id;

        return base.Equals(obj);
    }
    public override int GetHashCode()
    {
        if (this.Id == Guid.Empty)
            return base.GetHashCode();

        return this.Id.GetHashCode();
    }
Juls
  • 658
  • 6
  • 15