0

I use an XOR based implementation in the GetHashCode implementation of most of my equatable types.

I've read several posts explaining why it is not the best solution so I decided to implement GetHashCode as suggested by Jon Skeet:

unchecked // Overflow is fine, just wrap
{
    int hash = 17;

    hash = hash * 23 + field1.GetHashCode();
    hash = hash * 23 + field2.GetHashCode();
    hash = hash * 23 + field3.GetHashCode();

    return hash;
}

Since the code is likely to be similar in most implementations, I tried to build a helper class to calculate hash codes for all my classes. It should be an easy thing to do but one of the main constraints with GetHashCode is it has to be fast. Therefore any implementation involving allocation is probably a no go (for instance, the use of a non static class).

Ideally a call to such a method would look like:

public override GetHashCode() => HashCodeCalculator.Calculate(X, Y, Z);

And have all the logic (unchecked + primes + null check...). But the use of a params parameter implicitly creates an array.

Is it best to duplicate the hashing algorithm in each class instead? Or is a class like the following as efficient?

public static class HashCalculator
{
    private const int _seed = 5923;
    private const int _multiplier = 7481;

    public static int Add(object value) => Add(_seed, value);

    public static int Add(int current, object value)
    {
        int valueHashCode = (value != null) ? value.GetHashCode() : 0;

        unchecked
        {
            return (current * _multiplier) + valueHashCode;
        }
    }
}

which can then be used like this:

public override int GetHashCode()
{
  int result = HashCalculator.Add(Prop1);
  result = HashCalculator.Add(result, Prop2);

  return result;
}
Community
  • 1
  • 1
vc 74
  • 37,131
  • 7
  • 73
  • 89

3 Answers3

3

You can create overloads for various small fixed numbers of parameters (2, 3, 4, etc. until you decide to stop), in order to avoid the array allocation, and then have a params overload that only ever needs to be used when there is a particularly large number of operands, at which point the overhead of the array allocation is less likely to be a problem (as it'll be a smaller percentage of the work done).

Servy
  • 202,030
  • 26
  • 332
  • 449
  • I thought about this but I have several cases where > 10 properties are involved. If an array is created for each GetHashCode invocation in Dictionary, the performance is going to be impacted. – vc 74 Oct 24 '16 at 14:48
  • @vc74 If you have enough usages where you actually have 10 properties, then create overloads going up to 10 (or more) operands, if you've done the benchmarking to determine that the array allocation is a problem. – Servy Oct 24 '16 at 14:49
  • Thinking about it, I guess it's a viable solution. I don't really like the idea of duplicating overloads but this could be a corner case where there is no other way to do it. params IEnumerable might be a solution in the future (if implemented someday) but it could well be they have the same performance issues as params. – vc 74 Oct 24 '16 at 14:55
  • 1
    @vc74 Even if that were added, you'd still need to create the enumerable object, which most likely couldn't be implemented to be any better than an array. – Servy Oct 24 '16 at 14:57
0

I can see why it is so tempting to have some kind of helper tool to calc hashes, but in this case efficiency is in great contradiction to convenience. You are trying to have a cookie and eat it and the answer depends on how much cookie you are willing to left over :)

  • One additional method call? Then it should have signature simmilar to int HashCode(params int subhashcodes) but invoking it will be ugly because you need to provide hashcodes of fields as parameters.
  • One method call and boxing? Then you can change int to object in previous signature to call fields hashcodes inside your method (I'm not fully sure that there will be no boxing in first case - feel free to correct me)

Personally I will stick to writing it by hand (or by Resharper).

Bart
  • 2,167
  • 2
  • 15
  • 21
  • it's more the impact on performance than the ugliness that worries me with params – vc 74 Oct 24 '16 at 15:09
  • Of course you can generate overloads for parameters from one to any particular N. But the more I think on this, the more I think that as Donald Knuth once said 'premature optimization is the root of all evil'. So use the most convenient method, and code it by hand only if you _know_ that this object will be heavily used. And if you miss one case, it's pretty simple to find it with the profiler. – Bart Oct 24 '16 at 15:17
0

After benchmarking it appears that using a struct like the following is almost as efficient as XORing and nicely encapsulates hash codes calculation.

/// <summary>
/// Calculates a hash code based on multiple hash codes.
/// </summary>
public struct HashCode
{
    private const int _seed = 5923;
    private const int _multiplier = 7481;

    /// <summary>
    /// Builds a new hash code.
    /// </summary>
    /// <returns>The built hash code.</returns>
    public static HashCode Build() => new HashCode(_seed);

    /// <summary>
    /// Constructor from a hash value.
    /// </summary>
    /// <param name="value">Hash value.</param>
    private HashCode(int value)
    {
        _value = value;
    }

    /// <summary>
    /// Builds a new hash code and initializes it from a hash code source.
    /// </summary>
    /// <param name="hashCodeSource">Item from which a hash code can be extracted (using GetHashCode).</param>
    public HashCode(object hashCodeSource)
    {
        int sourceHashCode = GetHashCode(hashCodeSource);
        _value = AddValue(_seed, sourceHashCode);
    }
    private readonly int _value;

    /// <summary>
    /// Returns the hash code for a given hash code source (0 if the source is null).
    /// </summary>
    /// <param name="hashCodeSource">Item from which a hash code can be extracted (using GetHashCode).</param>
    /// <returns>The hash code.</returns>
    private static int GetHashCode(object hashCodeSource) => (hashCodeSource != null) ? hashCodeSource.GetHashCode() : 0;

    /// <summary>
    /// Adds a new hash value to a hash code.
    /// </summary>
    /// <param name="currentValue">Current hash value.</param>
    /// <param name="valueToAdd">Value to add.</param>
    /// <returns>The new hash value.</returns>
    private static int AddValue(int currentValue, int valueToAdd)
    {
        unchecked
        {
            return (currentValue * _multiplier) + valueToAdd;
        }
    }

    /// <summary>
    /// Adds an object's hash code.
    /// </summary>
    /// <param name="hashCode">Hash code to which the object's hash code has to be added.</param>
    /// <param name="hashCodeSource">Item from which a hash code can be extracted (using GetHashCode).</param>
    /// <returns>The updated hash instance.</returns>
    public static HashCode operator +(HashCode hashCode, object hashCodeSource)
    {
        int sourceHashCode = GetHashCode(hashCodeSource);
        int newHashValue = AddValue(hashCode._value, sourceHashCode);

        return new HashCode(newHashValue);
    }

    /// <summary>
    /// Implicit cast operator to int.
    /// </summary>
    /// <param name="hashCode">Hash code to convert.</param>
    public static implicit operator int(HashCode hashCode) => hashCode._value;
}

which can be used like this:

public override int GetHashCode() => new HashCode(Prop1) + Prop2;

EDIT: .net core now has such a HashCode struct.

vc 74
  • 37,131
  • 7
  • 73
  • 89