5

Possible Duplicate:
what is hashCode use for? is it unique?

I am generating a lot of strings, then my question is:

Can 2 different string have the same hash code in C# ?

By hash code I mean:

string s = "Hello";
s.GetHashCode();

My question is more about of the algorithm that C# follow to geneate the strings, maybe the collisions come when all the other hash code are generated already or maybe not. Is possible that somebody have this answer.

Community
  • 1
  • 1
  • 1
    Everything can have the same hash code since they are limited. – Tim Schmelter Oct 26 '12 at 19:14
  • Yes, and it is why for object like a string, which can have more combination than the number of potential hash, once you find two object with the same hash, you have to compare them in the old standard way to be sure they are not colliding. – LightStriker Oct 26 '12 at 19:16
  • This is interesting to read: http://stackoverflow.com/questions/735317/hashtable-dictionary-collisions – Luc Morin Oct 26 '12 at 19:17

4 Answers4

21

Yes. Hash codes are not unique. There are 2^32 (4,294,967,296) possible hash codes (one for each integer value in a 32 bit integer). There are effectively an infinite number of possible strings. Clearly it's impossible for each of an infinite number of strings to have a different number of finite numbers.

Two different strings (or any values for that matter) having the same hash code is called a "collision". A good hashing algorithm will attempt to ensure that collisions are minimized to the greatest extent possible (although they can't be eliminated). Often this will be dependent on the actual types of data in practice; in this case of strings this means that strings that are similar, or of a similar size, should (ideally) be less prone to collisions.

I assume that you're asking because you're considering using a string's hash code as a unique identifier for a string. Don't do that.

Here is a link that goes into more detail about hash codes in general, if you're interested.

Servy
  • 202,030
  • 26
  • 332
  • 449
6

In general you should expect a hash collision once you have as many elements as the square root of the size of the hash space http://en.wikipedia.org/wiki/Birthday_problem

For a 32 bit hash, you should expect your first collision around the 65k element. This is of course statistical, so you can't predict exactly when it will happen, but it's useful for intuition. If you have 10 strings, you probably don't need to worry about collisions, if you have 100k you definitely do.

BostonJohn
  • 2,631
  • 2
  • 26
  • 48
  • Or be unlucky, and get it in far less combination. It's all about luck. – LightStriker Oct 26 '12 at 19:15
  • The probability doesn't matter much. The Pigeonhole principle allows a better argument. –  Oct 26 '12 at 19:16
  • @delnan Probability matters. For example a cryptographic 256 bit hash has collisions, but you can write software relying on the fact that this will never happen. – CodesInChaos Oct 26 '12 at 19:18
  • This also assumes a well designed hash algorithm. If you're using a type who's hash code methods is defined as `return 0;` it doesn't hold up so well. – Servy Oct 26 '12 at 19:21
  • @CodesInChaos It doesn't matter much in the context of this question. Obviously, once you get to the point where you choose a way of dealing with collisions, the likelihood becomes relevant. But one can do quite a few things with hashes that still work when one ignores collisions. –  Oct 26 '12 at 19:25
1

It depends on the hashing functions, and which algorithm it is using.

In general, some hashing techniques can map one input (the value you want to hash) to one output (the hashed value), while others may map two different inputs to the same output, the later is called Collision http://en.wikipedia.org/wiki/Collision_(computer_science)

For example, if a hash function codes 100 users' names to the numbers 0-9, we gonna have a lot of collisions.

Back to GetHashCode();

Refer to these two articles on MSDN:

http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/

This one explains the function, here is a quote from the bottom of it, check the first bullet:

GetHashCode is designed to do only one thing: balance a hash table. Do not use it for anything else. In particular:

  • It does not provide a unique key for an object; probability of collision is extremely high.
  • It is not of cryptographic strength, so do not use it as part of a digital signature or as a password equivalent
  • It does not necessarily have the error-detection properties needed for checksums.

Here is more explanation:

http://blogs.msdn.com/b/ericlippert/archive/2010/03/22/socks-birthdays-and-hash-collisions.aspx

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
0

The simple answer is "Yes". With hash codes you always have the chance of collision.

pstrjds
  • 16,840
  • 6
  • 52
  • 61