10

I used to use the apache hashcode builder a lot

Does this exist for C#

Jack Kada
  • 24,474
  • 29
  • 82
  • 106

5 Answers5

8

This is my homemade builder.

Usage:

hash = new HashCodeBuilder().
             Add(a).
             Add(b).
             Add(c).
             Add(d).
             GetHashCode();

It does not matter what type fields a,b,c and d are, easy to extend, no need to create array.

Source:

public sealed class HashCodeBuilder
{
    private int hash = 17;

    public HashCodeBuilder Add(int value)
    {
        unchecked
        {
            hash = hash * 31 + value; //see Effective Java for reasoning
             // can be any prime but hash * 31 can be opimised by VM to hash << 5 - hash
        }
        return this;
    }

    public HashCodeBuilder Add(object value)
    {
        return Add(value != null ? value.GetHashCode() : 0);
    }

    public HashCodeBuilder Add(float value)
    {
        return Add(value.GetHashCode());
    }

    public HashCodeBuilder Add(double value)
    {
        return Add(value.GetHashCode());
    }

    public override int GetHashCode()
    {
        return hash;
    }
}

Sample usage:

public sealed class Point
{
    private readonly int _x;
    private readonly int _y;
    private readonly int _hash;

    public Point(int x, int y)
    {
        _x = x;
        _y = y;
        _hash = new HashCodeBuilder().
            Add(_x).
            Add(_y).
            GetHashCode();
    }

    public int X
    {
        get { return _x; }
    }

    public int Y
    {
        get { return _y; }
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Point);
    }

    public bool Equals(Point other)
    {
        if (other == null) return false;
        return (other._x == _x) && (other._y == _y);
    }

    public override int GetHashCode()
    {
        return _hash;
    }
}
weston
  • 54,145
  • 21
  • 145
  • 203
  • I like the concept of encapsulating such reusable code. But isn't there a performance penalty due to the method calls? – Steve B Nov 16 '12 at 15:28
  • 2
    @SteveB Depends how you use it. Given, you should ideally only use immutable data in hash codes, if you do this once in the constructor, then store the result in private member `hash`, it's more effecient than doing the normal calculate each time `GetHashCode` is called. – weston Nov 16 '12 at 15:31
  • @SteveB Added example usage to show what I mean. Also has correct equals implementation. – weston Nov 16 '12 at 15:41
4

I use the following:

public static int ComputeHashFrom(params object[] obj) {
    ulong res = 0;
    for(uint i=0;i<obj.Length;i++) {
        object val = obj[i];
        res += val == null ? i : (ulong)val.GetHashCode() * (1 + 2 * i);
    }
    return (int)(uint)(res ^ (res >> 32));
}

Using such a helper is quick, easy and reliable, but it has potential two downsides (which you aren't likely to encounter frequently, but are good to be aware of):

  • It can generate poor hashcodes for some distributions of params. For instance, for any int x, ComputeHashFrom(x*-3, x) == 0 - so if your objects have certain pathological properties you may get many hash code collisions resulting in poorly performing Dictionaries and HashSets. It's not likely to happen, but a type-aware hash code computation can avoid such problems more easily.
  • The computation of the hashcode is slower than a specialized computation could be. In particular, it involved the allocation of the params array and a loop - which quite a bit of unnecessary overhead if you've just got two members to process.

Neither of the drawbacks causes any errors merely inefficiency; and both with show up in a profiler as blips in either this method or in the internals of the hash-code consumer.

Eamon Nerbonne
  • 47,023
  • 20
  • 101
  • 166
  • To your caveat about speed, I'd add that you can also produce better hash codes with a type-aware method, especially if you have a reasonable idea of what values are going to happen most often. I'd lean toward caring more about that then the computation speed. – Jon Hanna Oct 14 '10 at 09:28
  • The point of such a simple & safe method is to be simple & safe. For some distributions of objects, this will result in poorer hashcodes, and it will always take a bit more time to compute. However, in the general case, these hashcodes work fine (after all, the constituent obj members do generally have type-aware GetHashCode implementations), and most programs do not spend most of their time making or using hashcodes. From practical experience, however, I'll note that I have run into GetHashCode perf issues but have not run into quality issues with *this* implementation - YMMV. – Eamon Nerbonne Oct 14 '10 at 09:52
  • In practice, since object equality comparison is often much cheaper than `.GetHashCode`, you end up paying only little if you encounter a *few* more collisions. On the other hand, if you're doing lots of set/dictionary computations, you can simply cache the hashcode of unchanged objects; but you cannot avoid the consequences of a bad hash-code. Anyhow, in practice I wouldn't bother with anything more complicated until profiling reveals it's worth it - which it almost never does. – Eamon Nerbonne Oct 14 '10 at 10:05
  • Yep, I agree. As said, I'd add the matter to your caveat about speed, but that doesn't mean I don't think it's a reasonable general method. – Jon Hanna Oct 14 '10 at 10:46
4

C# doesn't have a built-in HashCode builder, but you can roll your own. I recently had this precise problem and created this hashcode generator that doesn't use boxing, by using generics, and implements a modified FNV algorithm for generating the specific hash. But you could use any algorithm you'd like, like one of those in System.Security.Cryptography.

    public static int GetHashCode<T>(params T[] args)
    {
        return args.GetArrayHashCode();
    }

    public static int GetArrayHashCode<T>(this T[] objects)
    {
        int[] data = new int[objects.Length];

        for (int i = 0; i < objects.Length; i++)
        {
            T obj = objects[i];
            data[i] = obj == null ? 1 : obj.GetHashCode();
        }

        return GetFnvHash(data);
    }

    private static int GetFnvHash(int[] data)
    {
        unchecked
        {
            const int p = 16777619;
            long hash = 2166136261;

            for (int i = 0; i < data.Length; i++)
            {
                hash = (hash ^ data[i]) * p;
            }

            hash += hash << 13;
            hash ^= hash >> 7;
            hash += hash << 3;
            hash ^= hash >> 17;
            hash += hash << 5;

            return (int)hash;
        }
    }
Ian Kemp
  • 28,293
  • 19
  • 112
  • 138
Marcel Valdez Orozco
  • 2,985
  • 1
  • 25
  • 24
3

Microsoft recently released a class to compute hashcodes. Please see https://learn.microsoft.com/en-us/dotnet/api/system.hashcode. You need to include NuGet package Microsoft.Bcl.HashCode in your project to use it.

Usage example:

using System.Collections.Generic;

public class MyClass {
    public int MyVar { get; }
    public string AnotherVar { get; }
    public object MoreVars;
 
    public override int GetHashCode() 
        => HashCode.Combine(MyVar, AnotherVar, MoreVars);
}
Jordi
  • 2,055
  • 1
  • 16
  • 34
  • The example that you give is not very good, because what you are showing is not a hashcode builder. However, the `HashCode` class ***can*** be instantiated, and its instances ***do*** offer an `Add()` method, (like @weston's home-brew,) so it ***can*** be used like a hashcode builder, as the original question asks. – Mike Nakis Aug 04 '22 at 17:32
1

Nowadays I leverage ValueTuples, ref Tuples or anonymous types:

var hash = (1, "seven").GetHashCode();
var hash2 = Tuple.Create(1, "seven").GetHashCode();
var hash3 = new { Number = 1, String = "seven" }.GetHashCode();

I believe value tuples will be fastest.

Gusdor
  • 14,001
  • 2
  • 52
  • 64