22

I have a class that internally is just an array of integers. Once constructed the array never changes. I'd like to pre-compute a good hashcode so that this class can be very efficiently used as a key in a Dictionary. The length of the array is less than about 30 items, and the integers are between -1000 and 1000 in general.

Konrad Garus
  • 53,145
  • 43
  • 157
  • 230
freddy smith
  • 3,347
  • 5
  • 24
  • 28
  • 1
    Dictionary key is unique and if your object store array of values, and key is computed based on them then there is no guarantee that you can get a unique hash key for the dictionary – Fadrian Sudaman Aug 04 '10 at 10:59
  • 1
    @Fadrian: The OP does not want to compute a key but a HashValue. Look up what that means. Hashvalues are pseudo-unique. – H H Aug 04 '10 at 11:57
  • Thanks Henk. I know how hash value are suppose to work and I may have misread the intend of the question when I posted the comment and its great that you pointed that out. – Fadrian Sudaman Aug 04 '10 at 12:04
  • Fadrian, Henk was right. My intent was not to get a unique code but to get something pretty close to that that is quickly computable so that I dont need to do a full Equals very often. I realise that if you know the data you expect fairly well that it is possible to make a more appropriate choice which is what my question is seeking. A lot of the answers below are quite mathematical and I will need time to understand them. – freddy smith Aug 04 '10 at 13:01

9 Answers9

32

Not very clever, but sufficient for most practical purposes:

EDIT: changed due to comment of Henk Holterman, thanks for that.

  int hc = array.Length;
  foreach (int val in array)
  {
      hc = unchecked(hc * 314159 + val);
  }

If you need something more sophisticated, look here.

Doc Brown
  • 19,739
  • 7
  • 52
  • 88
  • 13
    Looks OK, but 314159 might be a bit large. A number like 17 or 31 would do nicely too. And: `hc=unchecked(hc*SHIFTVAL+array[i]);` to be independent of compiler settings. – H H Aug 04 '10 at 11:54
  • 1
    Yes, one can improve that surely in many different ways, upvoted your comment. – Doc Brown Aug 04 '10 at 12:15
  • the finer points are indeed not to relevant but I would strongly advice the `unchecked()` operator. – H H Aug 04 '10 at 12:31
  • !! "260 6 3 3 1 0 0 0 0 0 0 0 0 1 0 2 6 0 1 0 -1 -1 -1 -1 -1" and "2123 23 9 56 0 1 4 3 1 0 0 0 0 1 10 0 4 0 1 1 -1 -1 -1 -1 -1" generate the same hash – chhenning Jun 05 '18 at 18:56
  • 4
    @chhenning: so what? – Doc Brown Jun 05 '18 at 19:35
  • 2
    @chhenning hash collisions may happen, no one claimed this would be "hash collision safe". This is where overriding `Equals` method take action. As far as I remember, the dicionary uses the `Equals` method to differ objects with the same hash (that's why the `Equals` method shouldn't realy on the hash itself). – Alisson Reinaldo Silva Jun 13 '18 at 12:26
  • 1
    @chhenning: or, if you prefer this: given two finite sets X and Y where X has a larger cardinality than Y, any function f:X->Y is necessarily non-injective. This is well know as the [pidgeon-hole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle), I guess you have heard of it. – Doc Brown Jun 13 '18 at 12:44
  • From docs "If neither checked nor unchecked is specified, the default context for non-constant expressions (expressions that are evaluated at run time) is defined by the value of the -checked compiler option. By default the value of that option is unset and arithmetic operations are executed in an unchecked context." – Steven T. Cramer Sep 03 '18 at 05:44
  • @StevenT.Cramer: and so you want to tell me what exactly? – Doc Brown Sep 03 '18 at 05:47
  • @DocBrown Just pointing out that `unchecked` is the default. But yours is more explicit. – Steven T. Cramer Sep 15 '18 at 05:02
  • Is there any reason to use `for` instead of `foreach`? – Damn Vegetables Apr 05 '21 at 04:12
  • @DamnVegetables: to be honset, I wrote this answer more than 10 years ago, I don't know why I used `for` instead of `foreach`. Does not make a huge difference. – Doc Brown Apr 05 '21 at 07:36
  • @HH I did try lower as you suggested and got collisions.... – Kram Aug 09 '23 at 00:04
7

For an array of values generally between -1000 and 1000, I would probably use something like this:

static int GetHashCode(int[] values)
{
   int result = 0;
   int shift = 0;
   for (int i = 0; i < values.Length; i++)
   {
      shift = (shift + 11) % 21;
      result ^= (values[i]+1024) << shift;
   }
   return result;
}
BlueMonkMN
  • 25,079
  • 9
  • 80
  • 146
  • 3
    FYI, I chose the number 11 because 11 bits is what is necessary to store a range of 2048 distinct values (-1000 to +1000 is 2000, which is close). I chose the number 21 because 32-bit integer minus 11 bits equals 21 bits. Shifting left 21 bits will leave 11 bits to contain a value from 0 to 2048. – BlueMonkMN Aug 04 '10 at 14:14
2

You may use CRC32 checksum. Here is the code:

[CLSCompliant(false)]
public class Crc32 {
    uint[] table = new uint[256];
    uint[] Table { get { return table; } }

    public Crc32() {
        MakeCrcTable();
    }
    void MakeCrcTable() {
        for (uint n = 0; n < 256; n++) {
            uint value = n;
            for (int i = 0; i < 8; i++) {
                if ((value & 1) != 0)
                    value = 0xedb88320 ^ (value >> 1);
                else
                    value = value >> 1;
            }
            Table[n] = value;
        }
    }
    public uint UpdateCrc(uint crc, byte[] buffer, int length) {
        uint result = crc;
        for (int n = 0; n < length; n++) {
            result = Table[(result ^ buffer[n]) & 0xff] ^ (result >> 8);
        }
        return result;
    }
    public uint Calculate(Stream stream) {
        long pos = stream.Position;
        const int size = 0x32000;
        byte[] buf = new byte[size];
        int bytes = 0;
        uint result = 0xffffffff;
        do {
            bytes = stream.Read(buf, 0, size);
            result = UpdateCrc(result, buf, bytes);
        }
        while (bytes == size);
        stream.Position = pos;
        return ~result;
    }
}
osprey
  • 29
  • 2
  • 5
    That seems overly complex for an array of ~30 integers from -1000 to 1000. It requires converting the array of integers into an array of bytes or a stream first because there's no function that accepts an array of integers as an input, right? – BlueMonkMN Aug 04 '10 at 12:21
  • It is easy to convert each int to byte[]: int value = 0; byte[] bytes = BitConverter.GetBytes(value); These bytes may be used to calculate checksum instead of bytes read from stream. – osprey Aug 04 '10 at 12:28
  • Yes, but you neglected the fact that you have to convert the whole array to bytes. That too is easy, but still ends up being some significant overhead in code complexity and at runtime relative to a solution specifically targeted at hashing an array of integers directly. – BlueMonkMN Aug 04 '10 at 14:08
2

You can use Linq methods too:

var array = new int[10];
var hashCode = array.Aggregate(0, (a, v) => 
    HashCode.Combine(a, v.GetHashCode()));
Amir Mahdi Nassiri
  • 1,190
  • 13
  • 21
1

I'm using this here

var arrayHash = string.Join(string.Empty, array).GetHashCode();

If a element changed in the array, you will get a new hash.

David Stania
  • 148
  • 8
0

You could take a different approach and use a recursive dictionary for each value in your int array. This way you can leave .net to do primitive type hashing.

internal class DictionaryEntry<TKey, TValue>
{
    public Dictionary<TKey, DictionaryEntry<TKey, TValue>> Children { get; private set; }
    public TValue Value { get; private set; }
    public bool HasValue { get; private set; }

    public void SetValue(TValue value)
    {
        Value = value;
        HasValue = true;
    }

    public DictionaryEntry()
    {
        Children = new Dictionary<TKey, DictionaryEntry<TKey, TValue>>();
    }
}

internal class KeyStackDictionary<TKey, TValue>
{
    // Helper dictionary to work with a stack of keys
    // Usage:
    // var dict = new KeyStackDictionary<int, string>();
    // int[] keyStack = new int[] {23, 43, 54};
    // dict.SetValue(keyStack, "foo");
    // string value;
    // if (dict.GetValue(keyStack, out value))
    // {   
    // }

    private DictionaryEntry<TKey, TValue> _dict;

    public KeyStackDictionary()
    {
        _dict = new DictionaryEntry<TKey, TValue>();
    }

    public void SetValue(TKey[] keyStack, TValue value)
    {
        DictionaryEntry<TKey, TValue> dict = _dict;

        for (int i = 0; i < keyStack.Length; i++)
        {
            TKey key = keyStack[i];
            if (dict.Children.ContainsKey(key))
            {
                dict = dict.Children[key];
            }
            else
            {
                var child = new DictionaryEntry<TKey, TValue>();
                dict.Children.Add(key, child);
                dict = child;
            }

            if (i == keyStack.Length - 1)
            {
                dict.SetValue(value);
            }
        }
    }

    // returns false if the value is not found using the key stack
    public bool GetValue(TKey[] keyStack, out TValue value)
    {
        DictionaryEntry<TKey, TValue> dict = _dict;

        for (int i = 0; i < keyStack.Length; i++)
        {
            TKey key = keyStack[i];

            if (dict.Children.ContainsKey(key))
            {
                dict = dict.Children[key];
            }
            else
            {
                break;
            }

            if (i == keyStack.Length - 1 && dict.HasValue)
            {
                value = dict.Value;
                return true;
            }
        }

        value = default(TValue);
        return false;
    }
}
David
  • 429
  • 3
  • 7
0

I think choosing a good hash-algorithm would have to be based on the distribution (in a probability sense) of the integer values.

Have a look at Wikipedia for a list of algorithms

S.C. Madsen
  • 5,100
  • 5
  • 32
  • 50
0

Any CRC (or even XOR) should be ok.

leppie
  • 115,091
  • 17
  • 196
  • 297
  • 3
    The XOR would never shift outside the -/+ 1000 window – H H Aug 04 '10 at 11:51
  • @Henk Holterman: Sorry, I dont get it. You will still have 10 bits of valid CRC if the values are limited. Edit: Actually the rest of the bits would flip depending on sign. – leppie Aug 04 '10 at 12:11
  • 4
    CRC is OK but overkill, simply XOR-ing the values (without shifting) is not OK. – H H Aug 04 '10 at 12:28
-2

I would recommend:

HashCode.Combine(array)

For .NET Core 2.1 / .NET Standard 2.1 / .NET 5 and later.

  • 1
    This will just be a hash code based on the address of the array in memory - not a hash code based on the contents of the array. – Emil Apr 04 '21 at 12:01
  • The edit queue for this answer is full but a correct solution based on `HashCode.Add()` is given in https://stackoverflow.com/questions/59375124/how-to-use-system-hashcode-combine-with-more-than-8-values. – Todd West Apr 29 '21 at 19:37