5

I never need to store objects in a hash table. The reason is twofold:

  • coming up with a good hash function is difficult and error prone.
  • an AVL tree is almost always fast enough, and it merely requires a strict order predicate, which is much easier to implement.

The Equals() operation on the other hand is a very frequently used function.

Therefore I wonder whether it is necessary to implement GetHashCode (which I never need) when implementing the Equals function (which I often need)?

Dimitri C.
  • 21,861
  • 21
  • 85
  • 101
  • Take a look into Essential C# 4.0 (or older if you already have) into chapter 9 *Well-Formed Types* and you'll know when to override it. – Oliver May 28 '10 at 13:16

7 Answers7

12

my advise - if you don't want to use it, override it and throw new NotImplementedException(); so that you will see where did you need it.

Andrey
  • 59,039
  • 12
  • 119
  • 163
  • 1
    That's a very good idea. I wonder why the default implementation doesn't do just that! – Dimitri C. May 27 '10 at 12:41
  • 2
    @Dimitri: Because the default implementation is for reference identity, which is sufficient in many cases. – Jon Skeet May 27 '10 at 12:44
  • 1
    Well, you can use `object` as a key, every `object` you construct will be unique by default: `var key = new object();`, but of course they could've solved that by just creating a new class you use instead, like `HashKey`, which is just `object` with the extra methods. Additionally, every object can be used as a key by itself, even if two objects with the same content aren't considered equal, so that you can use them as keys in lookup tables to find related objects. – Lasse V. Karlsen May 27 '10 at 12:45
5

I think you're quite wrong if you believe that implementing a strict order predicate is much easier to implement than a hash function - it needs to handle a large number of edge cases (null values, class hierarchies). And hash functions aren't that difficult, really.

Community
  • 1
  • 1
Michael Borgwardt
  • 342,105
  • 78
  • 482
  • 720
4

An AVL tree will be much slower than a hashtable. If you are dealing with only a few items then it will not be much of an issue. Hashtables have O(1) inserts, deletes, and searches, but an AVL tree has O(log(n)) operations.

I would go ahead and override GetHashCode and Equals for two reasons.

  • It really is not that difficult to get a decent distribution by using a trivial XOR implementation.1
  • If your classes are part of a public API then someone else might want to store them in a hashtable.

Also, I have to question the choice of BST. AVL trees are bit out of style of these days. There are other more modern BSTs that are easier to implement and work just as well (sometimes better). If you really do need a data structure that maintains ordering then consider these alternatives.


1The XOR strategy has a subtle associativity problem that can cause collisions in some cases since a^b = b^a. There is a solution from Effective Java that has achieved cult-like recognition which is fairly simple to implement as well.

Brian Gideon
  • 47,849
  • 13
  • 107
  • 150
3

If you use Dictionary or SortedList, and override Equals, you need to have a hash function, else they will break. Equals is also used all over the place in the BCL, and if anyone else uses your objects they will expect GetHashCode to behave sensibly.

Note that a hash function doesn't have to be that complicated. A basic version is to take the hash of whatever member variables you're using for equality, multiply each one with a separate coprime number, and XOR them together.

thecoop
  • 45,220
  • 19
  • 132
  • 189
2

You don't need to implement it. If you write your own Equals() method I'd recommend to use some GetHashCode implementation that doesn't break HashSet though. You could for instance return a static value (typically 42). HashSet performance will degrade dramatically, but at least it will still work - you'll never know who'll use/edit/maintain your code in the future. (edit: you may want to log a warning if such a class is used in a hashed structure in order to early spot performance problems)

EDIT: don't only use XOR to combine hash codes of your properties

It has already been said by others that you may simply combine hash codes of all your properties. Instead of only using XOR I'd encourage multiplying results though. XOR may result in a 0 value if both values are equal (e.g. 0xA ^ 0xA == 0x0). This may be easily improved using 0xA * 0xA, 0xA * 31 + 0xA or 0xA ^ (0xA * 31).

Still, the intent of my answer is that any hash function is better than one that isn't consistent with equals - even if it only returns a static value. Simply select any subset of properties (from none to all) you use for equality and throw the results together. While selecting properties for hash code, prefer those small subsets which combinations are pretty unique (e.g. firstname, lastname, birthday - no need to add the whole address)

sfussenegger
  • 35,575
  • 15
  • 95
  • 119
  • @Rubys no surprise there, really :) – sfussenegger May 27 '10 at 12:58
  • Or even just XOR'ing the hashcodes of the constituent variables is super easy and provides a reasonably good distribution. Like you said you don't necessarily have to have to use a difficult implementation. – Brian Gideon May 28 '10 at 02:28
  • @Brian only XOR isn't the best choice as you might end up with 0 in some cases. see my edit – sfussenegger May 28 '10 at 07:20
  • It will also break down if two different variables can have their properties swapped. For example, an object with values ("hello", "world") will have the same hashcode as ("world", "hello"). The canonical implementation is to multiply by coprimes numbers and forget about XOR altogether, but generally speaking a trivial implementation with XOR is reasonable. – Brian Gideon May 28 '10 at 12:46
1

Coming up with an adequate hash function is not difficult. Most often, a simple XOR of the results from GetHashCode() of all fields is sufficient.

Tormod Fjeldskår
  • 5,952
  • 1
  • 29
  • 47
  • 1
    XOR is bad if hash codes of properties are equal, i.e. if properties themselves are equal. multiplying results with primes before XORing them mitigates the problem, e.g. `hash = (hash1 * 31) ^ hash2` – sfussenegger May 28 '10 at 07:24
1

If you override equals you should override GetHashCode() from MSDN: "It is recommended that any class that overrides Equals also override System.Object.GetHashCode." http://msdn.microsoft.com/en-us/library/ms173147.aspx

The two functions should match in the sense that if two objects are equal they should have the same hash value. That doesn't mean that if two objects have the same hash they should be equal. You don't need an overly complex hash algorithm but it should attempt to distribute well across the integer space.

Craig Suchanec
  • 10,474
  • 3
  • 31
  • 39