7

I don't understand how a compiler can be smart enough to construct an O(1) lookup for MyObject where I can put anything inside

public class MyObject
{
    // ... 
}

I understand how this can be done for a limited number of non-primitives such as

public class MyObject
{
    int i { get; set; }
    char c { get; set; }
}

but how can it possibly know how to do this for any implementation of MyObject?

leppie
  • 115,091
  • 17
  • 196
  • 297
  • 2
    It is done by turning an object into a number. Which then serves as its index in the collection. Google "hash function" and read your favorite C# language book about Object.GetHashCode(). – Hans Passant Nov 20 '15 at 17:26
  • I don't think this is close enough to mark it as a duplicate but I go in to a in depth explination in to how Dictionary (and HashSet) use hash codes to get O(1) lookups [in this answer](http://stackoverflow.com/questions/28593764/gethashcode-method-with-dictionary-and-hashset/28593852#28593852) – Scott Chamberlain Nov 20 '15 at 17:29
  • 1
    Protip: If you create a `struct` all of this is done for free :D – leppie Nov 20 '15 at 17:56
  • No you don't understand how it can be done for a limited number of non-primitives. You will get 0(1) lookup with your MyObject. The problem is the two MyObject with identical i can c will not be equal. They will be two entries in the HashSet. – paparazzo Nov 20 '15 at 18:13

2 Answers2

7
  1. Get the hash-code
  2. Modulo that down to produce an index into an array.
  3. Look there. If an item is present see if it is equal.

So far, perfectly O(1). It falls down if a lot of items end up having hash-codes that modulo-down to the same index. This happening a bit is expected and dealt with, but if it happens all the time you end up with O(n) behaviour (and with really bad constant costs).

All objects by default have a GetHashCode() and an Equals() based on reference identity (that is, they are only equal to themselves). Overriding those changes the concept of equality it has, and hence you must always change GetHashCode() when you change Equals() (all objects that are equal must have equal hash codes). You can also enforce the use of a different concept of equality by using an IEqualityComparer<T> implementation which provides a different GetHashCode() and Equals() to use.

Jon Hanna
  • 110,372
  • 10
  • 146
  • 251
6

Each object has a Hash Code associated with it. There is a method GetHashCode (defined as virtual in the base object class) that has to be overridden in the class so that HashSet can work properly.

A hash code is a numeric value that is used to insert and identify an object in a hash-based collection such as the Dictionary class, the Hashtable class, or a type derived from the DictionaryBase class. The GetHashCode method provides this hash code for algorithms that need quick checks of object equality.

With your current class, it will not work properly (since GetHashCode is not overridden). The comparison for equality will be done on the basis of reference instead of actual values.

Habib
  • 219,104
  • 29
  • 407
  • 436
  • 2
    It will work with their class, just with the default identity-based sort of equality. – Jon Hanna Nov 20 '15 at 17:30
  • Ah, ok. The "has to be overridden" is what I didn't know. I assumed that everything that inherits from `object` has a formula already, that there was some formula generic enough to work on anything that extends `object` –  Nov 20 '15 at 17:34
  • @JonHanna, yes it will be based on reference equality, and not the actual values. – Habib Nov 20 '15 at 17:40