Is it possible to generate a (highly probable) unique integer from a GUID?
int i = Guid.NewGuid().GetHashCode();
int j = BitConverter.ToInt32(Guid.NewGuid().ToByteArray(), 0);
Which one is better?
Eric Lippert did a very interesting (as always) post about the probability of hash collisions.
You should read it all, but he concluded with a very illustrative graphic (not reproduced here).
Related to your specific question, I would also go with GetHashCode since collisions will be unavoidable either way.
The GetHashCode function is specifically designed to create a well-distributed range of integers with a low probability of collision, so for this use case it is likely to be the best you can do.
But, as I'm sure you're aware, hashing 128 bits of information into 32 bits of information throws away a lot of data, so there will almost certainly be collisions if you have a sufficiently large number of GUIDs.
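To put a rough number on "sufficiently large", the birthday approximation gives the chance of at least one collision among n values. This is only a sketch: it assumes the 32-bit results are spread uniformly over N = 2^32 possible values, which GetHashCode does not promise.

```latex
p(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2N}\right), \qquad N = 2^{32}
\quad\Longrightarrow\quad
p(n) = \tfrac{1}{2} \ \text{at}\ n \approx \sqrt{2N\ln 2} \approx 77{,}000
```

Under that assumption, even odds of a collision arrive at roughly 77,000 GUIDs, far below the 2^32 possible values.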
A GUID is a 128-bit integer (it's just written in hex rather than base 10). With .NET 4 you can use BigInteger (http://msdn.microsoft.com/en-us/library/dd268285%28v=VS.100%29.aspx) like so:
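// Turn a GUID into a string and strip out the '-' characters ("N" format).
// The leading "0" keeps Parse from reading a high first hex digit as a negative sign.
string modifiedGuidString = "0" + Guid.NewGuid().ToString("N");
BigInteger huge = BigInteger.Parse(modifiedGuidString, NumberStyles.AllowHexSpecifier);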
If you don't have .NET 4 you can look at IntX or Solver Foundation.
Here is the simplest way:
Guid guid = Guid.NewGuid();
Random random = new Random();
int i = random.Next();
You'll notice that guid is not actually used here, mainly because there would be no point in using it. Microsoft's GUID algorithm does not use the computer's MAC address any more; GUIDs are actually generated using a pseudo-random generator, so if you want a random integer it makes more sense to use the Random class for this.
Update: actually, using a GUID to generate an int would probably be worse than just using Random ("worse" in the sense that this would be more likely to generate collisions). This is because not all 128 bits in a GUID are random. Ideally, you would want to exclude the non-varying bits from a hashing function, although it would be a lot easier to just generate a random number, as I think I mentioned before. :)
If you are looking to break through the 2^32 barrier then try this method:
/// <summary>
/// Generate a BigInteger given a Guid. Returns a number from 0 to 2^128 - 1
/// 0 to 340,282,366,920,938,463,463,374,607,431,768,211,455
/// </summary>
public BigInteger GuidToBigInteger(Guid guid)
{
    BigInteger l_retval = 0;
    byte[] ba = guid.ToByteArray();
    int i = ba.Length; // always 16 bytes; Length avoids needing System.Linq
    foreach (byte b in ba)
    {
        // Treat the byte array as a big-endian base-256 number.
        l_retval += b * BigInteger.Pow(256, --i);
    }
    return l_retval;
}
The universe will decay to a cold and dark expanse before you experience a collision.
I had a requirement where multiple instances of a console application needed to get a unique integer ID. It is used to identify the instance and is assigned at startup. Because the .exe is started by hand, I settled on a solution using the ticks of the start time.
My reasoning was that it would be nearly impossible for the user to start two .exe in the same millisecond. This behavior is deterministic: if you have a collision, you know that the problem was that two instances were started at the same time. Methods depending on hashcode, GUID or random numbers might fail in unpredictable ways.
I set the date to 0001-01-01, add the current time of day, and divide the ticks by 10000 (a tick is 100 ns, so this gives milliseconds, the finest unit I set) to get a number that is small enough to fit into an int.
var now = DateTime.Now;
var zeroDate = DateTime.MinValue.AddHours(now.Hour).AddMinutes(now.Minute).AddSeconds(now.Second).AddMilliseconds(now.Millisecond);
int uniqueId = (int)(zeroDate.Ticks / 10000);
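As a quick check of the range, assuming (as in the code above) that only the time of day survives in zeroDate and that one tick is 100 ns:

```latex
\frac{\text{ticks per day}}{10\,000} = 24 \cdot 60 \cdot 60 \cdot 1000 = 86\,400\,000 < 2^{31} - 1 = 2\,147\,483\,647
```

So the value always fits into an int, but the same IDs come around again every 24 hours.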
EDIT: There are some caveats. To make collisions unlikely, make sure that:
Because the GUID space is larger than the number of 32-bit integers, you're guaranteed to have collisions if you have enough GUIDs. Given that you understand that and are prepared to deal with collisions, however rare, GetHashCode() is designed for exactly this purpose and should be preferred.
Maybe not integers, but small unique keys that are at least shorter than GUIDs:
http://www.codeproject.com/Articles/14403/Generating-Unique-Keys-in-Net
In a static class, keep a static integer field (not const, since it has to change), then add 1 to it before every single access (using a public get property). This will ensure you cycle through the whole int range before you get a non-unique value.
/// <summary>
/// The command id to use. This is a thread-safe id, that is unique over the lifetime of the process. It changes
/// at each access.
/// </summary>
internal static int NextCommandId
{
    get
    {
        // A plain _nextCommandId++ is not atomic; Interlocked makes the increment thread-safe.
        return System.Threading.Interlocked.Increment(ref _nextCommandId);
    }
}
private static int _nextCommandId = 0;
This will produce a unique integer value within a running process. Since you do not explicitly define how unique your integer should be, this will probably fit.
Here is the simplest solution: just call GetHashCode() on the Guid. Note that a GUID is a 128-bit value while an int is 32 bits, so the result is not guaranteed to be unique. But it's probably statistically good enough for most implementations.
public override bool Equals(object obj)
{
    if (obj is IBase)
        return ((IBase)obj).Id == this.Id;
    return base.Equals(obj);
}
public override int GetHashCode()
{
    if (this.Id == Guid.Empty)
        return base.GetHashCode();
    return this.Id.GetHashCode();
}