I am using a hash set in which I store array of integers(32 bits). This means I need an algorithm to hash an array of integers. I am looking for a 32 bits integer(C# int) hash.
I have tried and edited two existing algorithms, which you can see the four versions of at the bottom, including their benchmark.
My questions are as follows:
1. Do you think the bottom algorithm is good for this purpose?
2. Is there a better algorithm available for this purpose?
Program information
- Typically an array has
16 entries
, and the integers aresmaller than 10
, although both must support larger values. I could say the largest values that have any chance on occurring are 200 entries and integers of value 20. - I am using the HashSet in a breath-first search algorithm, to compare if two nodes are the same. http://en.wikipedia.org/wiki/Breadth-first_search.
- For this specific program I am unable to use unsafe code.
Benchmarks and code
below are my benchmarks and code, from worst to best performance in my program.
- Coordinates2D is a struct containing an int x and an int y.
- The total entries in the HashSet at run end is
356525
- I could not retrieve the number of collisions exactly. The number given is the number of times an object was actually compared and not equal(same hash, different object). This happens multiple times between the same objects though. This value varies a bit per execution, as the program is multithreaded.
- MurMurHash3 seed is
const uint seed = 144
MurMurHash3 using bytes retrieved from the coordinates directly
Code is equal to https://gist.github.com/automatonic/3725443 The array of bytes is retrieved using the following code:
int size = Marshal.SizeOf(typeof(Coordinates2D));
int length = carCoords.Length;
Byte[] bytes = new Byte[size * length];
for (int i = 0; i < length; ++i)
{
GCHandle pinStructure = GCHandle.Alloc(carCoords[i], GCHandleType.Pinned);
Marshal.Copy(pinStructure.AddrOfPinnedObject(), bytes, i*size, size);
pinStructure.Free();
}
// Hash the byte array
return MurMurHash3.Hash(new System.IO.MemoryStream(bytes));
This is incredibly inefficient, because of the copying.
- Performance: 40880ms
- Collisions: < 84
MurMurHash3 using bytes retrieved from the integers in the objects
public static int Hash2(RushHourPathLengthNode.Coordinates2D[] coords)
{
const uint c1 = 0xcc9e2d51;
const uint c2 = 0x1b873593;
uint h1 = seed;
uint k1 = 0;
uint streamLength = (uint)coords.Length * 2;
for (int i = 0, l = coords.Length; i < l; ++i)
{
// Do it for X
byte[] chunk = BitConverter.GetBytes(coords[i].x);
/* Get four bytes from the input into an uint */
k1 = (uint)
(chunk[0]
| chunk[1] << 8
| chunk[2] << 16
| chunk[3] << 24);
/* bitmagic hash */
k1 *= c1;
k1 = rotl32(k1, 15);
k1 *= c2;
h1 ^= k1;
h1 = rotl32(h1, 13);
h1 = h1 * 5 + 0xe6546b64;
// Do it for y
chunk = BitConverter.GetBytes(coords[i].y);
/* Get four bytes from the input into an uint */
k1 = (uint)
(chunk[0]
| chunk[1] << 8
| chunk[2] << 16
| chunk[3] << 24);
/* bitmagic hash */
k1 *= c1;
k1 = rotl32(k1, 15);
k1 *= c2;
h1 ^= k1;
h1 = rotl32(h1, 13);
h1 = h1 * 5 + 0xe6546b64;
}
// finalization, magic chants to wrap it all up
h1 ^= streamLength;
h1 = fmix(h1);
unchecked //ignore overflow
{
return (int)h1;
}
}
This is alot more efficient now the copying is gone.
- Performance: 16640ms
- Collisions: < 92
MurMurHash3 using integers
public static int Hash(RushHourPathLengthNode.Coordinates2D[] coords)
{
const uint c1 = 0xcc9e2d51;
const uint c2 = 0x1b873593;
uint h1 = seed;
uint k1 = 0;
uint streamLength = (uint)coords.Length * 2;
for (int i = 0, l = coords.Length; i < l; ++i)
{
k1 = (uint)coords[i].x;
//bitmagic hash
k1 *= c1;
k1 = rotl32(k1, 15);
k1 *= c2;
h1 ^= k1;
h1 = rotl32(h1, 13);
h1 = h1 * 5 + 0xe6546b64;
k1 = (uint)coords[i].y;
//bitmagic hash
k1 *= c1;
k1 = rotl32(k1, 15);
k1 *= c2;
h1 ^= k1;
h1 = rotl32(h1, 13);
h1 = h1 * 5 + 0xe6546b64;
}
// finalization, magic chants to wrap it all up
h1 ^= streamLength;
h1 = fmix(h1);
unchecked //ignore overflow
{
return (int)h1;
}
}
- Performance: 13027ms
- Collisions: < 95
Hash using integer addition multiplication
int hash = 17;
for (int i = 0, l = carCoords.Length; i < l; ++i)
{
hash = hash * 31 + carCoords[i].x;
hash = hash * 31 + carCoords[i].y;
}
return hash;
- Performance: 4564ms
- Collisions: < 44
As you see, this one is far more efficient. It works well with any prime numbers. As I understand, there is no scientific proof of this to work, which I am not too fond of.
According to Michal B. a faster version would be using bitshifting. However, testing shows that this is not a successful hash. The problem takes significantly longer to run(It did not finish within 5 minutes). The bitshifting might be good, but it seems like the 31(prime number) is crucial.
int hash = 17;
for (int i = 0, l = carCoords.Length; i < l; ++i)
{
hash = hash << 5 - carCoords[i].x;
hash = hash << 5 - carCoords[i].y;
}
return hash;