8

Given two different strings, is it always the case that s.GetHashCode() != s1.GetHashCode()?

Is it the case that the number of distinct integers is less than the number of distinct strings?

p.campbell
  • 98,673
  • 67
  • 256
  • 322
william007
  • 17,375
  • 25
  • 118
  • 194

3 Answers3

16

No. Just as a simple thought experiment: How many strings are there (hint: many more than 232 and thus how many unique hash codes can there be (hint: 232. See the problem?)

Hash codes are just required to be equal whenever Equals returns that both objects are equal. Furthermore, whenever two hash codes are not equal, then the objects themselves cannot be equal. There is no further requirement, but they should be well-distributed so that hash tables can perform well. So basically it's:

enter image description here

Note the omission of the respective ⇐ variants. It's not an equivalence, just two implications.

To quote the documentation:

A hash function must have the following properties:

  1. If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

  2. The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

  3. For the best performance, a hash function must generate a random distribution for all input.

Community
  • 1
  • 1
Joey
  • 344,408
  • 85
  • 689
  • 683
  • 1
    The problem you allude to in your openning line is known as the [pigeon hole principle](http://en.wikipedia.org/wiki/Pigeonhole_principle) - more pigeons than you have pigeon holes. – RJFalconer Jun 01 '15 at 09:10
  • 1
    I know, but apparently not every reader might, admittedly. I edited it in. – Joey Jun 01 '15 at 10:34
7

To add to @Joey's statement you principally cannot have the hashcodes always be unequal.

There are 2^32 possible hash codes but infinite input strings.

Hash collisions are guaranteed to happen with enough (2^32 + 1) input values.

In fact, hash collisions are much more common than one might think due to the Birthday Problem. When I did the math a while back for a system that used 64 bit hash codes (which have way more possible hash values than 32-bit hash codes, not just double as one might naively think), with 100 million input values it was very possible that there would be at least 1 hash collision. I think the probability was around 1%.

Eric J.
  • 147,927
  • 63
  • 340
  • 553
  • not guaranteed, but on average, there is 1 collision per (2^32 + 1) input values – fjch1997 Dec 26 '20 at 03:06
  • 1
    @fjch1997 i doubt about that. If hash function is used to calculate hash from string data then birthday paradox would take over well before the point you highlight. If my calculation is right you shoud have at least 50% probability of collision given 2^16 inputs (assuming hash space is 2^32) regardless of the hash function : https://en.wikipedia.org/wiki/Birthday_problem (that said if you have (2^32) + 1 inputs, then you matematically get certainty that a collision happened. Is like to say you have 4 coin and three wallet, surely a wallet will contain two coin. – Skary Dec 26 '20 at 10:23
  • @Skary You're right. Thanks for the explaination! – fjch1997 Dec 26 '20 at 15:22
1

As far i know Object.GetHashCode() does not provide an hash function over the object (so i suppose Joey's consideration is not correct in this case), it only return an unique index assigned to the object by CLR when the object is created and released when the object is garbage collected.

So you can not have an hashcode duplicate (in the same AppDomain) in a given moment, but you could have a douplicate over the time (the same index may be assigned more than once during application execution).

The question is also discussed here : Default implementation for Object.GetHashCode()

Community
  • 1
  • 1
Skary
  • 1,322
  • 1
  • 13
  • 40