6

I've been reading quite a lot about this interesting topic (IMO). but I'm not fully understand one thing :

Dictionary size is increasing its capacity ( doubles to the closest prime number) to a prime number (when reallocation) : because :

int index = hashCode % [Dictionary Capacity];
  • So we can see that prime numbers are used here for [Dictionary Capacity] because their GreatestCommonFactor is 1. and this helps to avoid collisions.

In addition

I've seen many samples of implementing theGetHashCode() :

Here is a sample from Jon Skeet :

public override int GetHashCode()
{
    unchecked 
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}

I don't understand :

Question

Does prime numbers are used both in : Dictionary capacity and in the generation of getHashCode ?

Because in the code above , there is a good chance that the return value will not be a prime number [please correct me if i'm wrong] because of the

  • multiplication by 23
  • addition of the GetHashCode() value for each field.

For Example : (11,17,173 are prime number)

        int hash = 17;
        hash = hash * 23 + 11; //402
        hash = hash * 23 + 17; //9263
        hash = hash * 23 + 173 //213222
        return hash;

213222 is not a prime.

Also there is not any math rule which state :

(not a prime number) + (prime number) = (prime number)

nor

(not a prime number) * (prime number) = (prime number)

nor

(not a prime number) * (not a prime number) = (prime number)

So what am I missing ?

Royi Namir
  • 144,742
  • 138
  • 468
  • 792
  • where did you see this GetHashCode implementation ? – Tigran Feb 20 '13 at 16:04
  • @Tigran http://stackoverflow.com/a/263416/859154 – Royi Namir Feb 20 '13 at 16:04
  • 1
    I never read anywhere that hash codes should be prime, or even that it is better if they are prime - what they should be is as evenly distributed as possible over their entire range. – MiMo Feb 20 '13 at 16:15
  • See Eric Lippert's answer in http://stackoverflow.com/questions/5331842/how-likely-is-it-to-get-a-hashcode-collision-with-this-hashcode-function – MiMo Feb 20 '13 at 16:19

1 Answers1

8

It does not matter what the result of GetHashCode is (it does not have to be prime at all), as long as the result is the same for two objects that are considered to be equal. However, it is nice (but not required) to have GetHashCode return a different value for two objects that are considered to be different (but still not necessarily prime).

Given two numbers a and b, when you multiply them you get c = a * b. There are usually multiple different pairs of a and b that give the same result c. For example 6 * 2 = 12 and 4 * 3 = 12. However, when a is a prime number, there are a lot less pairs that give the same result. This is convenient for the property that the hash code should be different for different objects.

In the dictionary the same principle applies: the objects are put in buckets depending on their hash. Since most integers do not divide nicely by a prime number, you get a nice spreading of your objects in the buckets. Ideally you'd want only one item in each bucket for optimal dictionary performance.


Slightly off-topic: Cicada's (that's an insect) use prime numbers to determine after how many years they go and mate again. Since this mating cycle is a prime number of years, the chances of the mating continously coinciding with the life cycles of any of its enemies are slim.

Daniel A.A. Pelsmaeker
  • 47,471
  • 20
  • 111
  • 157
  • @Virtlink: + 1 bit me on cicadas, didn't know that. Absolutely out of topic, but straordinary beautiful. Already posted on G+. – Tigran Feb 20 '13 at 16:20
  • @RoyiNamir: Humans like to make parallels between their knowledge and observations arround them (look on Dan Brown: Da Vinci Code, for example), but this one is actualy the *nature* that leads us to that conclusion. Is it correct or not, noone knows. – Tigran Feb 20 '13 at 16:25
  • @Virtlink -(offtopic) Isn't a fly live for a week max ? how come _years_ got in here ? – Royi Namir Feb 20 '13 at 16:32
  • @RoyiNamir Apparently this species (which isn't technically a fly) lives for years. But don't ask be about specifics, I only read it in the linked article. I am mostly human, partly programmer, and certainly not an entomologist. :P – Daniel A.A. Pelsmaeker Feb 20 '13 at 16:36
  • You said: *most integers do not divide nicely by a prime number*. And I say: for each prime `P`, there is the same or less number of integers that "divide nicely" by `P+1`. – Dmytro Shevchenko Oct 19 '13 at 08:41