1

I just learned that:

This leads me to think, that Dictionaries in .NET (at least when using strings as keys) are susceptible to key collisions.

What happens on such a key collision? Are there any known unique string values, that actually collide? Will the dictionary be broken on those key values?

Additionally:

  • Does it depend, whether the code is running on 32bit or 64bit system?
  • Is using short strings up to a specific lenght safe? Safer?

Note: I am not referring to a specific .NET CLR, but if it matters, let's talk about the 4.5.2 32bit version for the desktop.


Notes about duplicates:

Marcel
  • 15,039
  • 20
  • 92
  • 150
  • 2
    Does [this](https://stackoverflow.com/questions/2975612/what-happens-when-hash-collision-happens-in-dictionary-key) help ? – Zein Makki Jul 21 '17 at 06:47
  • 5
    Your use of "safe" and "susceptible" implies that you think key collisions are problematic. They aren't. They reduce efficiency - if you have a dictionary where every key has the same hash code, a lookup ends up being linear - but they don't make the dictionary misbehave. – Jon Skeet Jul 21 '17 at 06:59
  • _"I am not referring to a specific .NET CLR"_ -- if you want to ask "how likely" a collision is, you have to be talking about a specific implementation of `System.String`, because there are in fact different implementations with (somewhat) different likelihoods of collisions. As demonstrated by the answer below, you can of course discover on your own collisions. The demo uses `System.String`, but of course `System.Int64` also does not provide unique hash codes, so you will get collisions there as well. – Peter Duniho Jul 21 '17 at 07:00
  • Likeliness depends on how the hash algorithm is implemented in the `GethashCode` function, the source code can be checked, see this link https://stackoverflow.com/questions/15174477/how-is-gethashcode-of-c-sharp-string-implemented and yes, string length is a factor. – Cleptus Jul 21 '17 at 07:04
  • 1
    _All_ hash table implementations (not counting pre-computed "perfect hash" data structures) are "susceptible to collisions". That's simply part of the definition of a hash code. See marked duplicate for more discussion on collisions, hash codes, etc. You can inspect specific hash code implementations to gain insight about likelihood of collisions. Note that collisions may affect _performance_ of the hash table implementation, but not the _correctness_ of it. – Peter Duniho Jul 21 '17 at 07:15
  • 1
    The list of example duplicate questions should include the link posted in the first comment to this question. This is the only linked example that specifically addresses how duplicate hash values affect the behaviour of the Dictionary class. https://stackoverflow.com/questions/2975612/what-happens-when-hash-collision-happens-in-dictionary-key – camelCase Jul 21 '17 at 08:08

1 Answers1

6

You can easily generate such collisions (see https://en.wikipedia.org/wiki/Birthday_problem), e.g.

  // key   - computed hash value
  // value - original string
  Dictionary<int, string> hashes = new Dictionary<int, string>();

  for (int i = 0; ; ++i) {
    string st = i.ToString();
    int hash = st.GetHashCode();
    string collision = null;

    if (hashes.TryGetValue(hash, out collision)) {
      Console.Write($"Collision: \"{collision}\" and \"{st}\" hash {hash}");

      break;
    }
    else
      hashes.Add(hash, st);
  }

Outcome (at my workstation .Net 4.6.1 x86):

  Collision: "699391" and "1241308" hash -1612916492

Outcome (at my workstation .Net 4.6.1 recomplied at IA-64):

  Collision: "942" and "9331582" hash -1864841629

So if you want to see a key collision (in x86 mode):

 // Both "699391" and "1241308" keys have the same hash -1612916492
 Dictionary<string, string> demo = new Dictionary<string, string>() {
   {"699391", "abc"},
   {"1241308", "def"},
 };

Finally, String.GetHashCode is inner workings of .Net and it can depend on .Net version, mode (IA64 or x86) etc. There's no guarantee that short strings are free from collisions etc.

Dmitry Bychenko
  • 180,369
  • 20
  • 160
  • 215
  • 1
    I've demonstrated how to find a collistion and since `GetHashCode()` returns `int` the dictionary is `Dictionary`: key is a *hash*, value is a `string` value (e. g. "699391") – Dmitry Bychenko Jul 21 '17 at 06:54
  • Dmitry I believe you have demonstrated that hash collisions can occur within the Dictionary Class however these do not necessarily translate into Key collisions which is the concern expressed in the question. – camelCase Jul 21 '17 at 07:07
  • 1
    @camelCase: you misunderstand the question. It is _not_ asking about key "collisions" (which would mean only an attempt to use the same key for two different values in the dictionary). It is asking about _different_ keys but which have the same hash code and thus have a hash collision internally (something any self-respecting dictionary class will in fact handle transparently to the client code). – Peter Duniho Jul 21 '17 at 07:09
  • @PeterDuniho - Miss understood the question. I thought question was about different strings getting an exception on key already exists even though they have different content.. and not if their hashcodes match. Indeed my answer was wrong in this case :) – Gilad Green Jul 21 '17 at 07:10
  • @camelCase: Comparing keys in `Dictionary` are based on IEqualityComparer comparer https://referencesource.microsoft.com/#mscorlib/system/collections/generic/dictionary.cs,d3599058f8d79be0 which in turn when not provided is based on `GetHashCode` (`EqualityComparer.Default`) – Dmitry Bychenko Jul 21 '17 at 07:12
  • @Dmitry: Does System.String implement Equals(). I believe the questioner is worried about unexpected key collisions at the public interface of Dictionary.Add(..). You are correct about the inner workings of hashvalues but the OP's concern does not translate into random exceptions when adding items to a Dictionary keyed on String. – camelCase Jul 21 '17 at 07:24