10

Hi I have a class with 6 string properties. A unique object will have different values for atleast one of these fields

To implement IEqualityComparer's GetHashCode function, I am concatenating all 6 properties and calling the GetHashCode on the resultant string.

I had the following doubts:

  1. Is it necessary to call the GetHashcode on a unique value?
  2. Will the concatenation operation on the six properties make the comparison slow?
  3. Should I use some other approach?
Oded
  • 489,969
  • 99
  • 883
  • 1,009
ganeshran
  • 3,512
  • 7
  • 41
  • 69
  • Are you planning comparing your objects somewhere such as sorting them in an array or some such? That will change wether or not you need to implement GetHashCode – N_A Sep 05 '11 at 14:11
  • hi mydogisbox, I am using it for the List.Contains method and passing the comparer object to it. I have already implemented Equals and dont know the right approach for GetHashcode – ganeshran Sep 05 '11 at 14:12

4 Answers4

4

If your string fields are named a-f and known not to be null, this is ReSharper's proposal for your GetHashCode()

public override int GetHashCode() {
  unchecked {
    int result=a.GetHashCode();
    result=(result*397)^b.GetHashCode();
    result=(result*397)^c.GetHashCode();
    result=(result*397)^d.GetHashCode();
    result=(result*397)^e.GetHashCode();
    result=(result*397)^f.GetHashCode();
    return result;
  }
}
Corey Kosak
  • 2,615
  • 17
  • 13
3

GetHashCode() should return the same hash code for all objects that return true if you call Equals() on those objects. This means, for example, that you can return zero as the hash code regardless of what the field values are. But that would make your object very inefficient when stored in data structures such as hash tables.

Combining the strings is one option, but note that you could for example combine just two of the stringsfor the hash code (while still comparing all the strings in equals!).

You can also combine the hashes of the six separate strings, rather than computing a single hash for a combined string. See for example Quick and Simple Hash Code Combinations

I'm not sure if this will be significantly faster than concatenating the string.

Community
  • 1
  • 1
Anders Forsgren
  • 10,827
  • 4
  • 40
  • 77
  • Thanks Anders, I am using it only for Contains method comparison. If I combine only two strings for the hashcode, wouldnt the hashcodes be same if the two values in objects are same? Would this mess up the comparisons, or does the GetHashCode have no affect on the comparison itself, and only impacts the performance – ganeshran Sep 05 '11 at 14:16
  • 1
    Others have already made this point, but I'd like to address it in a different way because it seems your intuition is stuck. Observe that GetHashCode() returns an int, which can take on only 2^32 different values. Your object, comprising 6 strings of arbitrary length, can obviously take on a far large number of values. Through this example we can easily see that GetHashCode() cannot possibly be a unique value for all possible values of your object. It must only satisfy this property: "if a.Equals(b) then a.GetHashCode()==b.GetHashCode()"; and notice that the "if" does NOT go both ways. – Corey Kosak Sep 05 '11 at 14:25
  • 1
    Practical considerations for GetHashCode() are that it be "good" and "fast". To make it "fast" I would try to avoid all memory allocation and string copying; to make it "good" is a subject of some nuance but in practice it suffices to swizzle together the GetHashCode() values of the sub-objects as @Jon suggests. I'll post ReSharper's suggestion as an "answer" so I can get code formatting. – Corey Kosak Sep 05 '11 at 14:29
  • Thanks Corey, I get it now. Sorry If my responses seemed noobish - as I didnt really get the actual way the GetHashCode works and is used in comparisons. I got it now. – ganeshran Sep 05 '11 at 14:31
3

GetHashCode does not need to return unequal values for "unequal" objects. It only needs to return equal values for equal objects (it also must return the same value for the lifetime of the object).

This means that:

  1. If two objects compare as equal with Equals, then their GetHashCode must return the same value.
  2. If some of the 6 string properties are not strictly read-only, they cannot take part in the GetHashCode implementation.

If you cannot satisfy both points at the same time, you should re-evaluate your design because anything else will leave the door open for bugs.

Finally, you could probably make GetHashCode faster by calling GetHashCode on each of the 6 strings and then integrating all 6 results in one value using some bitwise operations.

Jon
  • 428,835
  • 81
  • 738
  • 806
  • Hi Jon, none of my properties are readonly. However, They all have private setters though so can be modified only from the constructor. Will this affect their usage in GetHashCode? – ganeshran Sep 05 '11 at 14:22
  • @ganeshran: Then they are effectively read-only over the lifetime of the object (i.e. they could be implemented with `readonly` backing fields if you wanted to), which is sufficient. You 'll be OK. – Jon Sep 05 '11 at 14:29
  • @Jon Reading your conditions for the `GetHashCode()` implementation, I see that it is a problem with conflicting requirements. In fact, you're only partly correct. The real requirements for `GetHashCode()` method are: 1. It should return same values for equal objects. 2. It should return the same value for the same object while it isn't modified. 3. The GetHashCode() should be fast. Agree, that there are some other recommendations for it. For example this one: For the best performance, a hash function should generate a random distribution for all input. – Igor Soloydenko Sep 05 '11 at 14:31
  • This sentence answered it for me "GetHashCode does not need to return unequal values for "unequal" objects. It only needs to return equal values for equal objects (it also must return the same value for the lifetime of the object)." I missed it initially. – ganeshran Sep 05 '11 at 14:33
  • 1
    @ganeshran: Note that it *is* desirable to return unequal values for unequal objects; it's just not *required* (and indeed, it is impossible to achieve mathematically). `return 42;` is a "correct" implementation in the technical sense, but it will make your performance suffer if you put these objects into dictionaries. – Jon Sep 05 '11 at 14:38
  • @Jon - Yes now I understand this after reading the responses. My initial misconception was that it was a mandatory requirement to return unequal values for unequal objects. Thanks! – ganeshran Sep 05 '11 at 14:42
  • @Jon: If the GetHashCode() would force us to use the read-only fields then it could be impossible for some classes. For example imagine a class Node which is used in the LinkedList class. The only fields it contains are: reference to the next node and the stored value. Both of these fields are mutable (non read-only). The Equals() method is defined on list class. It simply compares all stored items of two lists and returns true if each pair of items is the same. So, how can we write the GetHashCode() for the list if we don't have any immutable field? – Igor Soloydenko Sep 05 '11 at 14:43
  • @keykeeper: It would depend on how you define "equality" when we 're talking about lists. In this particular case I would simply suggest to *not* implement `IEquatable` for the list class; the most common notion of collection equality is already implemented in LINQ as`Enumerable.SequenceEquals`. – Jon Sep 05 '11 at 15:45
  • @Jon: Well, I didn't know about the `Enumerable.SequenceEquals` which can be applied in the described case perfectly. But there is still no way to implement the `GetHashCode` for my imaginary LinkedList. And to be honest, your idea about evasion of `IEquatable` interface implementation doesn't look great for me: The `Enumerable.SequenceEquals` isn't well known for my nearest colleagues in contrast to `Equals`. – Igor Soloydenko Sep 05 '11 at 17:24
0

You can use the behavior from:

http://moh-abed.com/2011/07/13/entities-and-value-objects/

Mohamed Abed
  • 5,025
  • 22
  • 31