26

The MSDN documentation on Object.GetHashCode() describes 3 contradicting rules for how the method should work.

  1. If two objects of the same type represent the same value, the hash function must return the same constant value for either object.
  2. For the best performance, a hash function must generate a random distribution for all input.
  3. The hash function must return exactly the same value regardless of any changes that are made to the object.

Rules 1 & 3 are contradictory to me.

Does Object.GetHashCode() return a unique number based on the value of an object, or the reference to the object. If I override the method I can choose what to use, but I'd like to know what is used internally if anyone knows.

Dan Herbert
  • 99,428
  • 48
  • 189
  • 219

6 Answers6

30

Rules 1 & 3 are contradictory to me.

To a certain extent, they are. The reason is simple: if an object is stored in a hash table and, by changing its value, you change its hash then the hash table has lost the value and you can't find it again by querying the hash table. It is important that while objects are stored in a hash table, they retain their hash value.

To realize this it is often simplest to make hashable objects immutable, thus evading the whole problem. It is however sufficient to make only those fields immutable that determine the hash value.

Consider the following example:

struct Person {
    public readonly string FirstName;
    public readonly string Name;
    public readonly DateTime Birthday;

    public int ShoeSize;
}

People rarely change their birthday and most people never change their name (except when marrying). However, their shoe size may grow arbitrarily, or even shrink. It is therefore reasonable to identify people using their birthday and name but not their shoe size. The hash value should reflect this:

public int GetHashCode() {
    return FirstName.GetHashCode() ^ Name.GetHashCode() ^ Birthday.GetHashCode();
}
Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
  • Since all objects are hashable in C# (GetHashCode() is part of the very basic Object type), you would suggest to make all objects immutable - not very practical, isn't it? – thewhiteambit Jan 08 '16 at 13:10
  • 1
    @thewhiteambit No. I’m suggesting that not all objects are good candidates for hash table keys. Just because they *can* be hashed doesn’t mean that they *should* be. And the fact that `GetHashCode` is a method of the `Object` base class is simply a bad design decision in the C# language. Furthermore, my answer isn’t saying that you must make every hash table key type immutable — merely that doing so helps tremendously. – Konrad Rudolph Jan 09 '16 at 01:42
  • You are right, I was just pointing to the sentence "To realize this it is often simplest to make hashable objects immutable" - and since all objects are hashable (by bad design choice) this very attempt would make all hashable objects (equals all objects) immutable. But I guess you did not mean it that way. You probably meant just to make all objects one wants to store in Hash-Collections immutable. – thewhiteambit Jan 12 '16 at 13:52
  • 1
    @thewhiteambit Ah, yes. As in: if you have control over the object type you’re going to hash, (strongly) consider making it immutable. – Konrad Rudolph Jan 12 '16 at 13:54
  • Why do you use the `^`? – Farhad Zamani May 21 '22 at 08:03
  • 1
    @FarhadZamani Xor is a simple method of mixing hashes. It’s not *very* good — ideally you’d something else, e.g. [`HashCode.Combine`](https://learn.microsoft.com/en-us/dotnet/api/system.hashcode.combine) but (IIRC) that wasn’t available yet when I wrote this answer. – Konrad Rudolph May 21 '22 at 12:16
5

Not sure what MSDN documentation you are referring to. Looking at the current documentation on Object.GetHashCode (http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx) provides the following "rules":

  • If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

  • The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

  • For the best performance, a hash function must generate a random distribution for all input.

If you are referring to the second bullet point, the key phrases here are "as long as there is no modification to the object state" and "true only for the current execution of an application".

Also from the documentation,

A hash function is used to quickly generate a number (hash code) that corresponds to the value of an object. Hash functions are usually specific to each Type and must use at least one of the instance fields as input. [Emphasis added is mine.]

As for the actual implementation, it clearly states that derived classes can defer to the Object.GetHashCode implementation if and only if that derived class defines value equality to be reference equality and the type is not a value type. In other words, the default implementation of Object.GetHashCode is going to be based on reference equality since there are no real instance fields to use and, therefore, does not guarantee unique return values for different objects. Otherwise, your implementation should be specific to your type and should use at least one of your instance fields. As an example, the implementation of String.GetHashCode returns identical hash codes for identical string values, so two String objects return the same hash code if they represent the same string value, and uses all the characters in the string to generate that hash value.

Scott Dorman
  • 42,236
  • 12
  • 79
  • 110
  • That was the most verbose and confusing answer I've ever read. It left me even more confused than what I began with. – bleepzter Jan 13 '15 at 01:49
4

Rules 1 & 3 aren't really a contradiction.

For a reference type the hash code is derived from a reference to the object - change an object's property and the reference is the same.

For value types the hash code is derived from the value, change a property of a value type and you get a completely new instance of the value type.

Keith
  • 150,284
  • 78
  • 298
  • 434
1

A very good explanation on how to handle GetHashCode (beyond Microsoft rules) is given in Eric Lipperts (co. Designer of C#) Blog with the article "Guidelines and rules for GetHashCode". It is not good practice to add hyperlinks here (since they can get invalid) but this one is worth it, and provided the information above one will probably still find it in case the hyperlink is lost.

thewhiteambit
  • 1,365
  • 16
  • 31
0

I can't know for sure how Object.GetHashCode is implemented in real .NET Framework, but in Rotor it uses SyncBlock index for the object as hashcode. There are some blog posts about it on the web, however most of them are from 2005.

Ilya Ryzhenkov
  • 11,782
  • 1
  • 40
  • 50
0

By default it does it based on the reference to the object, but that means that it's the exact same object, so both would return the same hash. But a hash should be based on the value, like in the case of the string class. "a" and "b" would have a different hash, but "a" and "a" would return the same hash.

Darren Kopp
  • 76,581
  • 9
  • 79
  • 93