62

I have a class that contains the following two properties:

public int Id      { get; private set; }
public T[] Values  { get; private set; }

I have made it IEquatable<T> and overriden the object.Equals like this:

public override bool Equals(object obj)
{
    return Equals(obj as SimpleTableRow<T>);
}

public bool Equals(SimpleTableRow<T> other)
{
    // Check for null
    if(ReferenceEquals(other, null))
        return false;

    // Check for same reference
    if(ReferenceEquals(this, other))
        return true;

    // Check for same Id and same Values
    return Id == other.Id && Values.SequenceEqual(other.Values);
}

When having override object.Equals I must also override GetHashCode of course. But what code should I implement? How do I create a hashcode out of a generic array? And how do I combine it with the Id integer?

public override int GetHashCode()
{
    return // What?
}
AGB
  • 2,230
  • 1
  • 14
  • 21
Svish
  • 152,914
  • 173
  • 462
  • 620

9 Answers9

97

Because of the problems raised in this thread, I'm posting another reply showing what happens if you get it wrong... mainly, that you can't use the array's GetHashCode(); the correct behaviour is that no warnings are printed when you run it... switch the comments to fix it:

using System;
using System.Collections.Generic;
using System.Linq;
static class Program
{
    static void Main()
    {
        // first and second are logically equivalent
        SimpleTableRow<int> first = new SimpleTableRow<int>(1, 2, 3, 4, 5, 6),
            second = new SimpleTableRow<int>(1, 2, 3, 4, 5, 6);

        if (first.Equals(second) && first.GetHashCode() != second.GetHashCode())
        { // proven Equals, but GetHashCode() disagrees
            Console.WriteLine("We have a problem");
        }
        HashSet<SimpleTableRow<int>> set = new HashSet<SimpleTableRow<int>>();
        set.Add(first);
        set.Add(second);
        // which confuses anything that uses hash algorithms
        if (set.Count != 1) Console.WriteLine("Yup, very bad indeed");
    }
}
class SimpleTableRow<T> : IEquatable<SimpleTableRow<T>>
{

    public SimpleTableRow(int id, params T[] values) {
        this.Id = id;
        this.Values = values;
    }
    public int Id { get; private set; }
    public T[] Values { get; private set; }

    public override int GetHashCode() // wrong
    {
        return Id.GetHashCode() ^ Values.GetHashCode();
    }
    /*
    public override int GetHashCode() // right
    {
        int hash = Id;
        if (Values != null)
        {
            hash = (hash * 17) + Values.Length;
            foreach (T t in Values)
            {
                hash *= 17;
                if (t != null) hash = hash + t.GetHashCode();
            }
        }
        return hash;
    }
    */
    public override bool Equals(object obj)
    {
        return Equals(obj as SimpleTableRow<T>);
    }
    public bool Equals(SimpleTableRow<T> other)
    {
        // Check for null
        if (ReferenceEquals(other, null))
            return false;

        // Check for same reference
        if (ReferenceEquals(this, other))
            return true;

        // Check for same Id and same Values
        return Id == other.Id && Values.SequenceEqual(other.Values);
    }
}
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • 1
    Could you explain the reasoning behind the right version of GetHashCode()? – Vinko Vrsalovic May 31 '09 at 20:30
  • 5
    @Vinko: Can you clarify? Do you mean "why does the hash-code matter?" - or "why that approach?". Given your rep and answer count, I'm assuming the latter; this is simply a way of getting a hash that takes all the values into account the "multiply by a prime and add the next hash" is a very common approach for hashing that avoids collisions (contrast xor; in which case a collection of "all 8s" could easily give a predictable hashcode of 0). Did I miss anything? – Marc Gravell May 31 '09 at 20:57
  • See also: http://stackoverflow.com/questions/263400#263416... different prime number, but same effect. – Marc Gravell May 31 '09 at 21:01
  • Yes, that was the question. Thanks. – Vinko Vrsalovic May 31 '09 at 21:55
  • I started it anew, sorry. http://stackoverflow.com/questions/2626839/problem-with-custom-equality-and-gethashcode-in-a-mutable-object, anyway, my aim is the equality, I wish I could skip the GetHashCode implementation part. And yes, the initial value is 0. Anyway I use EF so all the objects are initialized with ID as 0 and then the properties are individually set one by one, not by the initializer, that's the reason that if it's hashed when the ID isn't loaded yet it goes nuts, maybe you'd know how to solve it and enjoy both proper hashing along with equality on this mutable object. – Shimmy Weitzhandler Apr 13 '10 at 04:00
  • Multiplying an int by a prime (that should also be close to a power of 2, e.g. 17 or 31 are good) spreads out the bits if the two input integers might be expected to be close in value (e.g. they are enum members, or IDs in a system with far fewer than 2 billion rows) – Eric J. Mar 11 '12 at 22:58
  • Can you explain with a bug/issue what exactly can be a problem for a class if `GetHashCode()` is not overridden in case of equals override? – Nilish May 22 '12 at 17:55
  • Note that your implementation of IEquatable> lacks the check for the same type. object of derived class of TableRow will be calculated equal. This would violate the rule that if A equals B that B should equal A: tableRow.Equals(derivedTableRow) might return true, but derivedTableRow.Equals(tableRow) will return false; Easy example: class Person with Name and Birthday and Derived class Child with property AttendingSchool. The Child might have the same values as the Person, so Person.Equals(Child), but Child.Equals(Person) returns false, because Person as Child returns null – Harald Coppoolse Aug 20 '15 at 09:24
  • Is the primary reason for not using the array's hash because it will be the hash of the array object only, without considering the values it stores or any other properties of SimpleTableRow? That would boil down to effectively returning a unique object reference for the array and making Equals() always return false, right? – Suncat2000 May 03 '23 at 12:42
40

FWIW, it's very dangerous to use the contents of the Values in your hash code. You should only do this if you can guarantee that it will never change. However, since it is exposed, I don't think guaranteeing it is possible. The hashcode of an object should never change. Otherwise, it loses its value as a key in a Hashtable or Dictionary. Consider the hard-to-find bug of using an object as a key in a Hashtable, its hashcode changes because of an outside influence and you can no longer find it in the Hashtable!

Chin
  • 19,717
  • 37
  • 107
  • 164
Dustin Campbell
  • 9,747
  • 2
  • 31
  • 32
  • 4
    This needs more upvote. I always made a wrong assumption between the concept of GetHashCode and a "MD5 hash" of a downloaded file. GetHashCode is not meant to compare the content, but the container. To make sure it points to the same place in the memory. I used GetHashCode to verify if an object changed since the last time it was saved to the database. I kept a cloned list just to compare the objects, but after overriding GetHashCode, everything based on hashtable started to behave weirdly. Now I just moved my override to it's own method and keep a dictionary with the "Content Hash" – Pluc Oct 29 '14 at 12:54
  • @Pluc: "GetHashCode is meant to make sure the container points to the same place in the memory.", well not quite. It *is* meant to compare content, it's just that it can have false positives due to collisions. Like MD5, but with a greater chance of collisions. – vgru Jun 29 '17 at 13:28
  • `its hashcode changes because of an outside influence and you can no longer find it in the Hashtable!` - make sense to me, if the object was changed, it's no longer the same object, hence it should not be in the hashtable, dictionary, hashset or whatever. – Mykhailo Seniutovych Mar 24 '19 at 11:10
  • Right, and because of this there is now a `System.HashCode` class available in the framework which allows you to combine hash codes in a safe way: Use the `.Add` method to add a hashcode (you got from `.GetHashCode()` of any variable) and use `.ToHashCode()` to calculate a new hash code based on the additions. Rewrite Marc's answer using those methods and you're again on a safe way. Thank you for mentioning your doubts! – Matt Jan 20 '22 at 07:40
4

Since the hashCode is kinda a key for storing the object (lllike in a hashtable), i would use just Id.GetHashCode()

Jhonny D. Cano -Leftware-
  • 17,663
  • 14
  • 81
  • 103
2

How about something like:

    public override int GetHashCode()
    {
        int hash = Id;
        if (Values != null)
        {
            hash = (hash * 17) + Values.Length;
            foreach (T t in Values)
            {
                hash *= 17;
                if (t != null) hash = hash + t.GetHashCode();
            }
        }
        return hash;
    }

This should be compatible with SequenceEqual, rather than doing a reference comparison on the array.

Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • It is dangerous to compare the contents of Values because they aren't guaranteed to be the same for the lifetime of the object. Because the array is exposed, any external class can change it, which affects the hashcode! – Dustin Campbell Mar 12 '09 at 14:23
  • The point, though, is that it is compatible with the Equals method as posted. – Marc Gravell Mar 12 '09 at 14:45
  • It affects the equality as well. And you can't use the reference to the arary to compute hashcode, because you end up with two equal objects with different hash codes. – Grzenio Mar 12 '09 at 14:48
  • @Grzenio - is that aimed at me or Dustin? I don't use the reference for exactly that reason... – Marc Gravell Mar 12 '09 at 14:51
  • Sorry for the confusion, it was a reply to Dustin's comment here and his code at the same time. – Grzenio Mar 12 '09 at 15:22
1

I just had to add another answer because one of the more obvious (and easiest to implement) solutions were not mentioned - not including the collection in your GetHashCode calculation!

The main thing that seemed to have forgotten here is that the uniqueness from the result of GetHashCode isn't required (or in many cases even possible). Unequal objects don't have to return unequal hash codes, the only requirement is that equal objects return equal hash codes. So by that definition, the following implementation of GetHashCode is correct for all objects (assuming there's a correct Equals implementation):

public override int GetHashCode() 
{ 
    return 42; 
} 

Of course this would yield the worst possible performance in hashtable lookup, O(n) instead of O(1), but it is still functionally correct.

With that in mind, my general recommendation when implementing GetHashCode for an object that happens to have any kind of collection as one or more of its members is to simply ignore them and calculate GetHashCode solely based on the other scalar members. This would work pretty well except if you put into a hash table a huge number of objects where all their scalar members have identical values, resulting in identical hash codes.

Ignoring collection members when calculating the hash code can also yield a performance improvement, despite the decreased distribution of the hash code values. Remember that using a hash code is supposed to improve performance in a hash table by not requiring to call Equals N times, and instead will only require calling GetHashCode once and a quick hash table lookup. If each object has an inner array with 10,000 items which all participate in the calculation of the hash code, any benefits gained by the good distribution would probably be lost. It would be better to have a marginally less distributed hash code if generating it is considerably less costly.

Allon Guralnek
  • 15,813
  • 6
  • 60
  • 93
  • The purpose of the hash code is not just to select a hash bucket, but more generally to quickly weed out things which can be recognized as non-equal. A class should only base its concept of equality on that of an encapsulated sequence if the sequence is immutable. Assuming the sequence is immutable, the class should probably include sequence items in its computed hash code (which it should in turn probably cache). Otherwise if one adds to a dictionary ten objects with 5,000-item arrays that differ in the last element, trying to find an element would result in... – supercat Sep 26 '12 at 16:52
  • ...all 5,000 elements of the new element being compared to all 5,000 elements of each of the ten objects. By contrast, if each item computed and cached a hash value for the array's contents, even if all ten hash values got mapped to the same hash bucket, the most that would happen if all hash values are distinct would be that the hash value of the new object would get compared to the cached hash values of the other ten. If a couple of hash values collide, that would still be no real problem--just one extra bunch of 5,000-element comparisons (rather than ten). – supercat Sep 26 '12 at 16:55
  • @supercat: You're doing a lot of assumptions here: That the sequence is immutable, that the object caches its own hashcode (I've never seen that), but most importantly that the object's only data upon which to base a hashcode is the sequence (notice that in the original question the object has an `Id` property which in almost all cases is enough to generate a unique hashcode). Anyway, you're talking about a very particular scenario which I fail to see how it relates to either the general case or the original question. – Allon Guralnek Sep 26 '12 at 19:17
  • If the sequence isn't immutable, it shouldn't take part in `equals`. My assumption that the type was immutable was predicated upon the OP wanting to test sequences for equality. If one is likely to have and compare against each other many object instances which would be identical (according to the definition used by `equals`) except for some trait, that trait should generally be part of the hash code. Java considers it worthwhile to cache the hash code for its most common "immutable sequence-ish" type (string). – supercat Sep 26 '12 at 19:35
  • I can't believe I'm reading this. The last GetHashCode() I wrote specifically had to enumerate a collection in the object to work, as did the Equals(). – Joshua Dec 18 '16 at 21:32
  • @Joshua: That doesn't make any sense. `GetHashCode()` never *has* to do anything to work. Any work you do is just to make it more evenly distributed. `Equals()`, on the other hand, must do *all* the work to function correctly. – Allon Guralnek Dec 19 '16 at 06:43
  • @AllonGuralnek: Have you seen what happens when you put objects into hash collections with nonfunctional GetHashCode()? GetHashCode() had to work in my case because the algorithms needed to be faster than N^2. – Joshua Dec 19 '16 at 16:32
1
public override int GetHashCode() {
   return Id.GetHashCode() ^ Values.GetHashCode();  
}

There are several good points in the comments and other answers. The OP should consider whether the Values would be used as part of the "key" if the object were used as a key in a dictionary. If so, then they should be part of the hash code, otherwise, not.

On the other hand, I'm not sure why the GetHashCode method should mirror SequenceEqual. It's meant to compute an index into a hash table, not to be the complete determinant of equality. If there are many hash table collisions using the algorithm above, and if they differ in the sequence of the Values, then an algorithm should be chosen that takes sequence into account. If sequence doesn't really matter, save the time and don't take it into account.

Hakan Fıstık
  • 16,800
  • 14
  • 110
  • 131
John Saunders
  • 160,644
  • 26
  • 247
  • 397
  • I also don't think arrays have GetHashCode implemented taking into account all elements – Grzenio Mar 12 '09 at 14:17
  • That will do a reference comparison on Values, and will not be compatible with SequenceEqual (i.e. for different arrays with the same contents) – Marc Gravell Mar 12 '09 at 14:17
  • 1
    Guys, I've already said it before, but *be careful* using all of the elements of an exposed array. The result of GetHashCode() should be same for the lifetime of the object, otherwise it won't work as a hashtable key. There's no guarantee that this array won't change, so don't use it in GetHashCode! – Dustin Campbell Mar 12 '09 at 14:34
  • @Dustin: Good clarification. That's what I meant when I said "if the object is to be used as a key". Such objects may not change in a way that would change their hash code or equality, while they are acting as a key. – John Saunders Mar 12 '09 at 14:41
  • @John - such points are very valid, and well raised: however, posting a GetHashCode() implementation that is incompatible with the posted Equals() is *very wrong*, and could lead to a lot of problems - lost data, etc. – Marc Gravell Mar 12 '09 at 14:49
  • @Marc: can you post a URL that says the two implementations need to be equivalent (and that defines equivalent)? Although the purposes are similar, they are not identical. Surely Equals compares non-"key" fields. As long as two equal objects have equal hash code? Where's the problem? – John Saunders Mar 12 '09 at 14:53
  • 3
    http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx "If two objects compare as equal, the GetHashCode method for each object must return the same value." - where "compare as equal" means "Equals()" – Marc Gravell Mar 12 '09 at 15:01
  • http://stackoverflow.com/questions/371328/why-is-it-important-to-override-gethashcode-when-equals-method-is-overriden-in-c/371348#371348 – Marc Gravell Mar 12 '09 at 15:01
  • Note that SequenceEqual (in the posted Equals) will treat two different arrays with the same *contents* as equal; but they will have different hash codes, hence your code will not generate valid hash codes. – Marc Gravell Mar 12 '09 at 15:02
  • Or for a demonstration: http://stackoverflow.com/questions/638761/c-gethashcode-override-of-object-containing-generic-array/639098#639098 – Marc Gravell Mar 12 '09 at 15:16
  • If an immutable class holds arrays which will only be written during construction, and will after construction never be exposed to anything that would write them, it may be useful to have two class instances call themselves equal only if they hold arrays which are sequence-equal. In that scenario, the class' hash code should take into account the array contents, since it is the contents of the arrays which determine equality. – supercat Sep 26 '12 at 16:45
0

I know this thread is pretty old, but I wrote this method to allow me to calculate hashcodes of multiple objects. It's been very helpful for this very case. It's not perfect, but it does meet my needs and most likely yours too.

I can't really take any credit for it. I got the concept from some of the .net gethashcode implementations. I'm using 419 (afterall, it's my favorite large prime), but you can choose just about any reasonable prime (not too small . . . not too large).

So, here's how I get my hashcodes:

using System.Collections.Generic;
using System.Linq;

public static class HashCodeCalculator
{
    public static int CalculateHashCode(params object[] args)
    {
        return args.CalculateHashCode();
    }

    public static int CalculateHashCode(this IEnumerable<object> args)
    {
        if (args == null)
            return new object().GetHashCode();

        unchecked
        {
            return args.Aggregate(0, (current, next) => (current*419) ^ (next ?? new object()).GetHashCode());
        }
    }
}
D. Patrick
  • 2,894
  • 26
  • 37
0

I would do it this way:

long result = Id.GetHashCode();
foreach(T val in Values)
    result ^= val.GetHashCode();
return result;
Grzenio
  • 35,875
  • 47
  • 158
  • 240
  • 1
    pretty reasonable - note that xor can lead to a lot of collisions; *generally* a multiplier/addition is preferred – Marc Gravell Mar 12 '09 at 14:18
  • interesting, many people told me to use xor instead. I should read more about it then. – Grzenio Mar 12 '09 at 14:44
  • 2
    In response to that; what would the hash of {3,3,3,3} be? and {4,4,4,4}? or {4,0,0,4}? or {1,0,1,0}? You see the issue... – Marc Gravell Mar 12 '09 at 14:46
  • @MarcGravell: Multiplication is poor. Too bad C# doesn't have left or right bit roll. – Joshua Dec 19 '16 at 16:34
  • @Joshua if by "roll" you mean "circular shift", then it is easy to simulate with left and right shift. If you don't mean that, then please let me know - am genuinely curious. – Marc Gravell Dec 19 '16 at 18:25
  • @MarcGravell: Yeah that's what it is, and the generated code is annoyingly slow in comparison to the right CPU instruction. – Joshua Dec 19 '16 at 19:35
  • if by "circular shift you mean shifting the bits either left or right, to do that don't you just multiply or divide by 2? – David Klempfner Mar 28 '18 at 10:32
  • 1
    @Backwards_Dave That would be a regular shift. In a rotation or circular shift, the bits that are shifted out one side are simultaneously shifted back into the other side. If you divide 0xF9 by 2 four consecutive times, you are left with 0x0F. But if you rotate 0xF9 right by 4 positions (assuming 8-bit registers), you are left with 0x9F. – Carvo Loco May 12 '18 at 11:27
0

Provided that Id and Values will never change, and Values is not null...

public override int GetHashCode()
{
  return Id ^ Values.GetHashCode();
}

Note that your class is not immutable, since anyone can modify the contents of Values because it is an array. Given that, I wouldn't try to generate a hashcode using its contents.

Jason Plank
  • 2,336
  • 5
  • 31
  • 40
Dustin Campbell
  • 9,747
  • 2
  • 31
  • 32
  • That will do a reference comparison on Values, and will not be compatible with SequenceEqual (i.e. for different arrays with the same contents) – Marc Gravell Mar 12 '09 at 14:17
  • Right, but because the array is exposed and any external code can change it, it is frankly dangerous to compare the contents. – Dustin Campbell Mar 12 '09 at 14:21
  • So I should really just use the HashCode of the Id? – Svish Mar 12 '09 at 14:36
  • Which means that... IF the result of Equals changes, the result of GetHashCode wouldn't necessarily have to change, but if GetHashCode changes, then Equals would also change? – Svish Mar 12 '09 at 14:38
  • Not necessarily. The reference to Values shouldn't change (unless you change it in your code) -- so it should be OK to use that. John Saunders has the best answer here. – Dustin Campbell Mar 12 '09 at 14:42
  • @Dustin: "John Saunders has the best answer here" - no, posting a reply where the GetHashCode() is incompatible with Equals() is **not** good. It is actively a bad thing, and could lead to a *lot* of problems. – Marc Gravell Mar 12 '09 at 14:48