I am trying to compute and store the hash of an object of type IEnumerable<anotherobject> that has about 1,000 entries. Later I'll generate another such object, and I'd like to detect any change in the values of the entries by comparing the hash codes of the two objects.

Basically, I was wondering whether GetHashCode() is suitable for this, in terms of both performance and reliability.

If I have to override it, what would be a good way to do so? Does it always depend on the type of anotherobject and on what Equals means when comparing two anotherobjects, or is there a generic way to do it? I ask because my object can be quite big.

Peter O.
abhinav

2 Answers

"getting different values for different object values and same values for same object values, always"

This is something no hashing function can give you. You are projecting a large (most likely effectively infinite) universe onto the roughly four billion (2^32) values of a 32-bit integer. There are bound to be collisions.
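
To make the collision argument concrete, here is a minimal sketch. `Fold` is a hypothetical hash written only for this illustration (it is not claimed to be what any .NET type does): it squeezes a 64-bit value into 32 bits, so some distinct inputs must share an output.

```csharp
using System;

public static class CollisionDemo
{
    // Hypothetical hash for illustration only: folds a 64-bit value into
    // 32 bits by XOR-ing its lower and upper halves.
    public static int Fold(long value)
    {
        return unchecked((int)value ^ (int)(value >> 32));
    }
}
```

For example, `CollisionDemo.Fold(0L)` and `CollisionDemo.Fold(-1L)` both evaluate to 0 even though the inputs differ. With 2^64 inputs and 2^32 outputs, no choice of function can avoid this.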

Of course, it depends on the type. If the type has a limited number of values (e.g. points composed of two 16-bit coordinates), you might be able to write a collision-free GetHashCode. But for strings, doubles, or more complex types? No.
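
As a sketch of the 16-bit point case: the type `Point16` below is hypothetical, invented for this example. Because the two coordinates together occupy exactly 32 bits, each one can be given its own half of the hash code, so distinct points never collide.

```csharp
using System;

// Hypothetical type for illustration: a point with two 16-bit coordinates.
public struct Point16 : IEquatable<Point16>
{
    public readonly short X;
    public readonly short Y;

    public Point16(short x, short y)
    {
        X = x;
        Y = y;
    }

    public bool Equals(Point16 other)
    {
        return X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return obj is Point16 other && Equals(other);
    }

    // X fills the upper 16 bits and Y the lower 16 bits, so every distinct
    // (X, Y) pair maps to a distinct int.
    public override int GetHashCode()
    {
        return unchecked((X << 16) | (ushort)Y);
    }
}
```

The same trick cannot work for a type with more than 2^32 possible values, which is why strings and doubles are out of reach.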

A standard (desirable) property of hash functions is that they can never give a false negative (equal objects always have equal hashes), but they can give a false positive (unequal objects may share a hash). This is also stated in the .NET documentation, so any implementation of GetHashCode is expected to behave this way.

So the standard workflow is:

  1. Compare the hashes of the two objects. If they differ, the objects are not equal.
  2. Otherwise, do a full equality test.
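
The two steps above can be sketched for sequences as follows. This is only an illustration under assumptions: `SequenceComparer`, `CombineHashes`, and `AreEqual` are hypothetical names, the multiply-by-31 combination is one common convention rather than anything mandated by .NET, and the element type is assumed to implement value-based Equals and GetHashCode.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceComparer
{
    // Order-sensitive combination of the element hash codes.
    public static int CombineHashes<T>(IEnumerable<T> items)
    {
        unchecked
        {
            int hash = 17;
            foreach (T item in items)
            {
                // Null elements contribute 0, matching the usual convention.
                hash = hash * 31 + (item == null ? 0 : item.GetHashCode());
            }
            return hash;
        }
    }

    // firstHash is the previously computed hash of 'first'.
    public static bool AreEqual<T>(IEnumerable<T> first, int firstHash, IEnumerable<T> second)
    {
        // Step 1: different hashes prove the sequences differ.
        if (firstHash != CombineHashes(second))
        {
            return false;
        }

        // Step 2: equal hashes may be a collision, so compare element by element.
        return first.SequenceEqual(second);
    }
}
```

Note that step 2 still needs the original sequence, so keeping only the hash lets you prove that something changed but never that nothing did. The sketch also enumerates `second` twice when the hashes match, which matters if the sequence is lazy.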

See the documentation for GetHashCode.

EDIT:

Note that the default implementation essentially returns an internal .NET per-instance identifier, so it is unsuitable for comparing values. You should realize that, from System.Object's perspective, two objects are the same only if they are the same instance.

Value-based equality is a semantic that has to be defined by the programmer.

The default implementation returns an index for the object determined by the common language runtime. The index is unique to an instance of an object within an AppDomain for an instance of the executing engine. However, because this index can be reused after the object is reclaimed during garbage collection, it is possible to obtain the same hash code for two different objects. Also, two objects that represent the same value have the same hash code only if they are the exact same object. This implementation is not particularly useful for hashing; therefore, derived classes should override GetHashCode.
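
A minimal sketch of the difference, using two hypothetical classes invented for this example: `PlainEntry` inherits Equals and GetHashCode from System.Object, while `ValueEntry` defines value-based versions of both.

```csharp
using System;

// No overrides: Equals and GetHashCode come from System.Object,
// so two instances are equal only if they are the same instance.
public class PlainEntry
{
    public string Name { get; }

    public PlainEntry(string name)
    {
        Name = name;
    }
}

// Value-based equality defined by the programmer.
public sealed class ValueEntry : IEquatable<ValueEntry>
{
    public string Name { get; }

    public ValueEntry(string name)
    {
        Name = name;
    }

    public bool Equals(ValueEntry other)
    {
        return other != null && Name == other.Name;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as ValueEntry);
    }

    // Must agree with Equals: equal names give equal hash codes.
    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}
```

Here `new PlainEntry("a").Equals(new PlainEntry("a"))` is false, whereas `new ValueEntry("a").Equals(new ValueEntry("a"))` is true and both `ValueEntry` instances report the same hash code within a run.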

See this.

Matěj Zábský
  • Thanks, I'll edit the question. So, is the default (inherited) `GetHashCode()` good enough? – abhinav Nov 12 '11 at 09:50
  • +1, Thanks, your edit tells me `why` I should not use `GetHashCode()`, I can move on from this method instead of finding workarounds. – abhinav Nov 12 '11 at 10:05
  • @abhinav Well, you should be implementing GetHashCode anyway if you want to take advantage of data structures like Dictionary or Hashtable. – Matěj Zábský Nov 12 '11 at 10:14

The return value of GetHashCode is guaranteed to be the same for the same object only within a single execution of the application; it is not guaranteed to stay the same if you store hash codes between application executions. See the MSDN documentation for System.Object.GetHashCode() for more information ("a different hash code can be returned [by GetHashCode] if the application is run again."). In fact, as of March 2016, hash codes are documented as possibly differing between processes and between application domains (even within the same process); see the warning box in the GetHashCode documentation.

The return value of GetHashCode alone should never be used to determine an object's equality. Calling Equals will also be necessary.

For guidance on implementing GetHashCode, see the documentation's Notes to Inheritors.

On the default implementation of GetHashCode:

The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.

(Note that this is different from, for example, Java's default implementation of hashCode(), which is documented to try to return different values for different objects "as much as is reasonably practical".)

If you need a more stable hash function, therefore, you must supply your own and, more importantly, document it so that users can rely on its stability.

There are several choices here, like MurmurHash3, MD5, and others. The important thing here is to document which hash function you're using.
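
As a rough sketch of the MD5 option, assuming for illustration that the entries are strings (`StableHash`, `Compute`, and `SameHash` are hypothetical names). For other entry types you would have to choose and document your own serialization of each entry.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class StableHash
{
    // Serializes the entries in order and returns their 16-byte MD5 digest.
    public static byte[] Compute(IEnumerable<string> entries)
    {
        using (var buffer = new MemoryStream())
        using (var writer = new BinaryWriter(buffer, Encoding.UTF8))
        using (var md5 = MD5.Create())
        {
            foreach (string entry in entries)
            {
                // BinaryWriter length-prefixes each string, so ["ab", "c"]
                // and ["a", "bc"] serialize differently.
                // Assumption: a null entry is treated as an empty string.
                writer.Write(entry ?? string.Empty);
            }

            writer.Flush();
            return md5.ComputeHash(buffer.ToArray());
        }
    }

    // Digests are byte arrays, so compare their contents, not their references.
    public static bool SameHash(byte[] left, byte[] right)
    {
        return left.SequenceEqual(right);
    }
}
```

Unlike GetHashCode, the digest depends only on the serialized bytes, so it can be stored and compared across runs. Collisions remain possible in principle, and MD5 is not safe against deliberately crafted collisions, so this fits change detection rather than security checks.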

Peter O.
  • Thanks. So, what are my options? – abhinav Nov 12 '11 at 09:51
  • I mean, what do I do to generate a reliable hash code? After reading your edit, I see that I should not use `GetHashCode()` at all. Well, you answered the question, though. Thanks! In any case, if you have any advice about generating the hash code, I'd be grateful. – abhinav Nov 12 '11 at 09:59
  • Thanks! That answers it completely. – abhinav Nov 12 '11 at 10:01