1

I have an array of int, I want to create a hash function for it, so that two integer arrays with different elements results in the same hash values for low possibility, what is the best way to do that?

The length of array could be up to 500, the integer number could be from 0 to 50.

Note that there is not exact duplicate of the question, as the nature of integer array (length and range of number) is different.

I use this before

public int GetHashCode(int[] data)

{
    if (data == null)
    return 0;

int result = 17;



foreach (var value in data)
{
    result += result * 23 + value;
}


return result;
}

but I discover it has many collision.

What I want to solve is to construct a dictionary<int[], string> so that when integer of the same values should results in different Hashcode.

william007
  • 17,375
  • 25
  • 118
  • 194
  • what do you mean by "BEST"? no collision at all? small ram usage? LOW cpu usage? for low cpu/ram usage sum all array elements in a long long and hope for the best. remember a hash function will ALWAYS have some collision, so if it say array are equals, you will have to check by yourself – Lesto Feb 26 '14 at 11:16
  • @lesto I know there will be collision, what i mean is reduce to the lowest – william007 Feb 26 '14 at 11:19
  • How do different length and number range make it different? – Euphoric Feb 26 '14 at 11:19
  • @william007 `so that two integer arrays with different elements do not result in the same hash values` - impossible, sum[n=0..50](500^n) is way over Int32.MaxValue – decPL Feb 26 '14 at 11:20
  • @Euphoric as one can make a hashcode that could optimize for that – william007 Feb 26 '14 at 11:21
  • @decPL question edited – william007 Feb 26 '14 at 11:21
  • @william007 Did you profile something to know you need to optimize? – Euphoric Feb 26 '14 at 11:22
  • @william007 increasing the hash variable size (proposed solution are int based, i've said to use long long), then test some hash function to find the best for your needing. see http://www.partow.net/programming/hashfunctions/index.html – Lesto Feb 26 '14 at 11:23
  • @lesto but GetHashCode in C# returning int http://msdn.microsoft.com/en-us/library/system.object.gethashcode(v=vs.110).aspx – william007 Feb 26 '14 at 11:25
  • @william007 Could you share some details on your data? What's the volume, how many collisions using your approach, can you estimate/describe the distribution? – decPL Feb 26 '14 at 11:25
  • 1
    @william007: Why exactly is the linked question not a duplicate? Do you actually care about the chance of collisions? How many of these arrays are you going to hash? Have you thought about using a keyed container instead? What's the problem you are trying to solve? There are too many unanswered questions. You ask for a hash but do not demonstrate adequate knowledge on the subject, so I 'm not at all sure a hash is the correct solution in the first place. – Jon Feb 26 '14 at 11:25
  • @Jon I have added my purpose at the last sentence – william007 Feb 26 '14 at 11:28
  • This is **not** a duplicate because this question asks about a perfect hash for the array. Voting to re-open. – Sergey Kalinichenko Feb 27 '14 at 01:51
  • I do not see where the questioner is asking for a perfect hash, or even how a perfect hash is possible given the number of degrees of freedom of the question. – Dour High Arch Feb 27 '14 at 02:16

2 Answers2

3

two integer arrays with different elements do not result in the same hash values

This is not possible for arrays with more than one element. An array with N elements has 32*N bits of information, you cannot map it to the 32 bits of the hash code without losing some information, unless N=1.

For N>1 there will be a very large number of array pairs for which the hash code is the same, while the arrays are different. There are techniques that make it less likely that a pair of arrays chosen at random would have the same hash code, but it is not possible to eliminate collisions completely for the general case.

The length of array could be up to 500, the integer number could be from 0 to 50

You need approximately 2500 bits to represent an array like that; your hash value has only 32 bits, so you will have lots of hash collisions as well. You can do a perfect hash for arrays of zero to five elements with values 0..50 by packing the numbers in an int (use value 51 to represent "a missing value" so that you could pack arrays of different length). Once you need to add the sixths number to the mix, your hash would not longer be perfect.

Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
  • Thanks, I knew this information, just that I am not sure how to construct a better hashcode. – william007 Feb 26 '14 at 11:29
  • @william007 Although you cannot make a perfect hash, it does not mean you cannot make a pretty good one. For example, the answer to the alleged duplicate (which isn't a duplicate, really) gives a reasonably good and fast way of hashing an array. – Sergey Kalinichenko Feb 26 '14 at 11:37
0

500 values form 0 to 50 means you can store the sum of all values multiplied by 50 and by position (starting from 0) also this can be reversed to extrapolate values

just check for size lenght and this has, and you should never find a collision

Lesto
  • 2,260
  • 2
  • 19
  • 26
  • Values multiplied by position is far from collision-free: `[0, 2, 1]` and `[0, 0, 2]` give the same hash. Values multiplied by 50 to the Nth power is collision-free, but 50^500 is... larger than a `long`. – Jon Feb 26 '14 at 11:33
  • If you use a plain sum, all arrays with the same numbers in different order will produce the same hash code. – Sergey Kalinichenko Feb 26 '14 at 11:33
  • You are right, i've missed a piece of algorithm, but is still possible. Ill update my answer – Lesto Feb 26 '14 at 11:38
  • fixed my response, but now i'm unsure this can be stored in a long or long long, have to do the math – Lesto Feb 26 '14 at 12:25