33

One if the ways to implement GetHashCode - where it's required to do so - is outlined by Jon Skeet here. Repeating his code:

public override int GetHashCode()
{
    unchecked // Overflow is fine, just wrap
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}

Rolling this code by hand can be error-prone and bugs can be subtle/hard to spot (did you swap + and * by mistake?), it can be hard to remember the combination rules for different types, and I don't like expending mental effort on writing/reviewing the same thing over and over again for different fields and classes. It can also obfuscate one of the most important details (did I remember to include all the fields?) in repetitive noise.

Is there a concise way to combine field hashcodes using the .net library?. Obviously I could write my own, but if there's something idiomatic/built-in I'd prefer that.

As an example, in Java (using JDK7) I can achieve the above using:

   @Override
   public int hashCode()  
   {  
      return Objects.hash(field1, field2, field3);  
   }  

This really helps to eliminate bugs and focus in the important details.

Motivation: I came across a C# class which requires an overridden GetHashCode(), but the way it combined the hashcodes of its various constituents had some severe bugs. A library function for combining the hashcodes would be useful for avoiding such bugs.

Community
  • 1
  • 1
bacar
  • 9,761
  • 11
  • 55
  • 75
  • As far as I know the closest you can get is to use ReSharper for equality members and hash code generation... – Patryk Ćwiek Aug 05 '13 at 18:37
  • All objects in .NET implement `GetHashCode()`, if you want to combine them just put whatever logic you want to do it with in a helper method. – evanmcdonnal Aug 05 '13 at 18:38
  • 2
    @evanmcdonnal, I don't want to write a helper method. I want someone else to write a standard helper method. Specifically I want someone to write a _correct_ helper method so as to minimize the chances of writing (or a maintainer changing it to) an incorrect implementation. It's very easy to incorrectly combine hashes in a way that leads to many collisions. – bacar Aug 05 '13 at 18:54
  • 3
    @HighCore, there are many reasons to override hashcode/GetHashCode. In particular, it's recommended if you override Equals and required if you want sensible behaviour if your object is going to be a key in a hashtable. http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx – bacar Aug 05 '13 at 19:01
  • @bacar [Hashtable was 8 years ago](http://stackoverflow.com/a/18059877/643085), and you need a good reason to override `Equals()`, too – Federico Berasategui Aug 05 '13 at 19:03
  • @HighCore OK, so is the fundamental premise of your position that there are few reasons for most developers to override `GetHashCode`, and therefore that a library function is of limited usefulness? That's good/useful enough advice to put in a full answer for a "this is how I do things in Java, how do I do it in C#" kind of question. – bacar Aug 05 '13 at 19:07
  • @bacar My point is that I don't see WHY you would override `Equals()` in such a daily basis so as to having to have a helper method or library for this. – Federico Berasategui Aug 05 '13 at 19:09
  • @HighCore, shrug. I override `equals()` and `hashcode()` in Java code regularly, because you need to if you want to use a class as a hashmap key without relying on object identity. Perhaps better support for value types in C# makes that mostly redundant (again, that would make for a great answer to the Q). I found a buggy implementation of `GetHashCode()` in some C# code and I thought the problem would have been avoided if they'd used a standard library function to combine the hashcodes. – bacar Aug 05 '13 at 19:15
  • I wrote a helper for this: http://blog.slaks.net/2010/12/simplifying-value-comparison-semantics.html – SLaks Aug 05 '13 at 20:02
  • @HighCore: My Noda Time library overrides Equals/GetHashCode in over 20 types (or provides an equality comparer which is pretty similar). I'm glad of `HashCodeHelper` when it comes to that... – Jon Skeet Aug 05 '13 at 20:08
  • @HighCore - sorry, I meant Dictionary (not Hashtable) anyway. The point still applies - it's an implementation of a hash table (which relies on GetHashCode) anyway, right? – bacar Aug 05 '13 at 20:08
  • @SLaks - it was a class intended to serve a similar purpose to your ValueComparer (and in fact implemented in a similar way) that had the buggy implementation :-) – bacar Aug 05 '13 at 21:48
  • @bacar Any chance you could nominate an answer for this question? Cheers. – Ryan Jan 12 '21 at 06:59

5 Answers5

24

EDIT: System.HashCode has now been released. The recommended way of creating hashcodes is now this:

public override int GetHashCode()
{
    return HashCode.Combine(fieldA, fieldB, fieldC);
}

System.HashCode.Combine() will internally call .GetHashCode() on each field, and do the right thing automatically.

For very many fields (more than 8), you can create an instance of HashCode and then use the .Add() method:

public override int GetHashCode()
{
    HashCode hash = new HashCode();
    hash.Add(fieldA);
    hash.Add(fieldB);
    hash.Add(fieldC);
    hash.Add(fieldD);
    hash.Add(fieldE);
    hash.Add(fieldF);
    hash.Add(fieldG);
    hash.Add(fieldH);
    hash.Add(fieldI);
    return hash.ToHashCode();
}

Visual Studio 2019 now has a Quick Actions helper to generate Equals() and GetHashCode() for you. Simply right click the class name in the declaration > Quick Actions and Refactorings > Generate Equals and GetHashCode. Select the members you want it to use for equality, and also "Implement IEquatable", and then click OK.

One last thing: If you need to get the structural hash code of an object, for example if you want to include the hashcode of an array that changes based on its contents (aka structure) and not its reference, then you will need to cast the field to IStructuralEquatable and get its hash code manually, like so:

public override int GetHashCode()
{
    return HashCode.Combine(
        fieldA,
        ((IStructuralEquatable)stringArrayFieldB).GetHashCode(EqualityComparer<string>.Default));
}

This is because the IStructuralEquatable interface is almost always implemented explicitly, so casting to IStructuralEquatable is required to call IStructuralEquatable.GetHashCode() instead of the default object.GetHashCode() method.

Finally, in the current implementation the .GetHashCode of an int is simply the integer value itself, so passing in the hashcode value to HashCode.Combine() instead of the field itself makes no difference to the result.

Old Answer:

For the sake of completeness, here is the hashing algorithm taken from the .NET Tuple Reference source, line 52. Interestingly, this hash algorithm was copied over from System.Web.Util.HashCodeCombiner.

Here is the code:

public override int GetHashCode() {
    // hashing method taken from .NET Tuple reference
    // expand this out to however many items you need to hash
    return CombineHashCodes(this.item1.GetHashCode(), this.item2.GetHashCode(), this.item3.GetHashCode());
}

internal static int CombineHashCodes(int h1, int h2) {
    // this is where the magic happens
    return (((h1 << 5) + h1) ^ h2);
}

internal static int CombineHashCodes(int h1, int h2, int h3) {
    return CombineHashCodes(CombineHashCodes(h1, h2), h3);
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4) {
    return CombineHashCodes(CombineHashCodes(h1, h2), CombineHashCodes(h3, h4));
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4, int h5) {
    return CombineHashCodes(CombineHashCodes(h1, h2, h3, h4), h5);
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4, int h5, int h6) {
    return CombineHashCodes(CombineHashCodes(h1, h2, h3, h4), CombineHashCodes(h5, h6));
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4, int h5, int h6, int h7) {
    return CombineHashCodes(CombineHashCodes(h1, h2, h3, h4), CombineHashCodes(h5, h6, h7));
}

internal static int CombineHashCodes(int h1, int h2, int h3, int h4, int h5, int h6, int h7, int h8) {
    return CombineHashCodes(CombineHashCodes(h1, h2, h3, h4), CombineHashCodes(h5, h6, h7, h8));
}

Of course, the actual Tuple GetHashCode() (which is actually an Int32 IStructuralEquatable.GetHashCode(IEqualityComparer comparer)) has a large switch block to decide which one of these to call based upon how many items it is holding - your own code probably won't require that.

Ryan
  • 1,670
  • 18
  • 25
  • 1
    Please note that HashCodeCombiner starts with [a seed of 5381](http://referencesource.microsoft.com/#System.Web/Util/HashCodeCombiner.cs,29) – Gyum Fox Nov 14 '17 at 08:18
  • "It will also be used under the hood by System.Tuple and other immutable composite types." It's now in netcore 2.1. Note that the BCL (Tuple etc.) doesn't use it just yet, because I had massive problems getting it to work under netfx - that will probably only come with/after the next version of netfx. – Jonathan Dickinson Jun 04 '18 at 23:00
20

Some people use:

Tuple.Create(lastName, firstName, gender).GetHashCode()

It's mentioned on MSDN at Object.GetHashCode(), with the warning:

Note, though, that the performance overhead of instantiating a Tuple object may significantly impact the overall performance of an application that stores large numbers of objects in hash tables.

The logic of aggregating the constituent hashes is provided by System.Tuple, which hopefully has had some thought go into it...

Update: it is worth noting @Ryan's observation in the comments that this only appears to use the last 8 elements of any Tuple of Size>8.

bacar
  • 9,761
  • 11
  • 55
  • 75
  • 1
    Hmm, I'd wager this probably does pretty well. How big can n be in tuple I wonder? I doubt the 4 lines of code it takes to implement the Java style solution is a big deal, but I guess I can understand the desire for a standard well understood solution. –  Aug 05 '13 at 18:55
  • 1
    @ebyrob the largest in C# is an 8 tuple. – evanmcdonnal Aug 05 '13 at 18:58
  • does `Tuple.Create(first, second, third, fourth, fifth, sixth, seventh, Tuple.Create(eight, ninth, tenth, ...)).GetHashCode()` deal with that? – bacar Aug 05 '13 at 19:09
  • 1
    @bacar Yes, but it's not terribly efficient, and hash code generation *ought* to be an efficient operation. The method the OP is describing is also easy enough to implement properly. – Servy Aug 05 '13 at 19:10
  • 2
    @Servy You'd think so, huh? I did come across a buggy implementation, hence the motivation. There are loads of bugs you can accidentally put in here - swapping the addition/multiplication, poor choice of multiplier, forgetting the addition part entirely... I've seen them happen and the worst part is that they _kinda look_ similar to the 'standard' solution and end up getting past a code review too. I believe you should roll your own where it addresses a bottleneck, but minimize the cognitive burden for maintainers where it isn't. – bacar Aug 05 '13 at 19:19
  • @bacar So then take the extra 10-20 minutes to really look closely (and test) at those four lines of code instead of just the 30 seconds to glance and make sure it's mostly right. The basic algorithms are so widely used you have enough trustworthy sources to compare with. – Servy Aug 05 '13 at 19:23
  • @Servy - I'm very surprised by that. You appear to be arguing against libraries in general. Why would I want to take 10-20 minutes every time rolling my own instead of using a known correct library to do something? This feels like [NIH Syndrome](http://en.wikipedia.org/wiki/Not_invented_here)? – bacar Aug 05 '13 at 19:27
  • @bacar You don't need to do it every time. You need to do it once in your life. It would take you less time to do that then to post this question on SO. It's such a simple operation that it would appear people haven't *bothered* to write a library to do it. If you feel that there is a strong need out there for a library that does this, feel free to publish one yourself. – Servy Aug 05 '13 at 19:33
  • Just to add to this, the reference source implementation of the Tuple class uses [this as the hashing method](http://referencesource.microsoft.com/#mscorlib/system/tuple.cs), line 53. It basically goes like this: If the Tuple has more than 8 elements, only hash the last 8 elements. From the bottom up, `(((h1 << 5) + h1) ^ h2)` is called on each two elements to be hashed. Then, each two results of those hashes are hashed, until one hash is left at the end. This can be quite expensive for large Tuples. – Ryan May 25 '16 at 23:56
  • I don't see why that's particularly expensive. For hashing n elements you will always need n hash operations and (n-1) hash combining operations. Good spot that it only hashes the last 8 elements, though. – bacar May 26 '16 at 22:29
10

It's not exactly the same, but we have a HashCodeHelper class in Noda Time (which has lots of types which override equality and hash code operations).

It's used like this (taken from ZonedDateTime):

public override int GetHashCode()
{
    int hash = HashCodeHelper.Initialize();
    hash = HashCodeHelper.Hash(hash, LocalInstant);
    hash = HashCodeHelper.Hash(hash, Offset);
    hash = HashCodeHelper.Hash(hash, Zone);
    return hash;
}

Note that it's a generic method, which avoids boxing for value types. It copes with null values automatically (using 0 for the value). Note that the MakeHash method has an unchecked block as Noda Time uses checked arithmetic as a project setting, whereas hash code calculations should be allowed to overflow.

Simon P Stevens
  • 27,303
  • 5
  • 81
  • 107
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • +1. Though I still get the impression that you only need this in those special cases, not for regular daily `public class Person`, `public class BillingRepository`, etc. – Federico Berasategui Aug 05 '13 at 20:12
  • 7
    @HighCore: What may seem like a special case can be day-to-day work for other people. Not everyone write the same kind of code. – Jon Skeet Aug 05 '13 at 20:12
  • @automatonic: No more, I'm afraid... fixed now. – Jon Skeet Sep 25 '14 at 21:37
  • 1
    Nice one. I found this addition useful: ```internal static int HashAll(params object[] values) { int initialHash = Initialize(); return values.Aggregate(initialHash, Hash); }``` – angularsen Jan 21 '15 at 17:11
  • @anjdreas: Right, but that means a) creating an array each time; b) boxing value types. – Jon Skeet Jan 21 '15 at 17:12
  • Good point. Are both a) and b) about overhead and space, or do boxing also ruing the hash code generation? – angularsen Jan 21 '15 at 18:14
  • @anjdreas: Both about overhead, basically. Boxing shouldn't affect hash codes. – Jon Skeet Jan 21 '15 at 18:15
1

Here are a couple of concise (though not as efficient) refactors of the System.Web.Util.HashCodeCombiner mentioned in Ryan's answer

    public static int CombineHashCodes(params object[] objects)
    {
        // From System.Web.Util.HashCodeCombiner
        int combine(int h1, int h2) => (((h1 << 5) + h1) ^ h2);

        return objects.Select(it => it.GetHashCode()).Aggregate(5381,combine);
    }

    public static int CombineHashCodes(IEqualityComparer comparer, params object[] objects)
    {
        // From System.Web.Util.HashCodeCombiner
        int combine(int h1, int h2) => (((h1 << 5) + h1) ^ h2);

        return objects.Select(comparer.GetHashCode).Aggregate(5381, combine);
    }
jbtule
  • 31,383
  • 12
  • 95
  • 128
-9
public override GetHashCode()
{
    return this.Field1.GetHashCode() | this.Field2.GetHashCode | this.Field3.GetHashCode();
}
  • This generates a very poor hash with many collisions; `{Field1="foo",Field2="bar"}` generates the same hash as `{Field1="bar",Field2="foo"}`. Plus, using an or rather than an Xor could be seen as especially bad - the more fields you have the more likely that your hash is just equal to 0xFFFFFFFF - in fact if `Field1.GetHashCode()`= -1 = 0xFFFFFFFF, it doesn't matter what the other fields are, they will all have a hashcode of 0xFFFFFF. – bacar Feb 27 '14 at 18:09
  • And note that the int32's GetHashCode just returns the value itself. So if `Field1` is -1, all your other fields would be redundant by this implementation. – bacar Feb 27 '14 at 18:25
  • Even `return (this.Field1.GetHashCode().ToString() + this.Field2.GetHashCode().ToString()).GetHashCode();` would make a better implementation. ;-) – Mitja Sep 05 '14 at 10:04