2

I am developing a game where every thing in the game world is represented by an global unique identifier.

Those ids each measure 64 bits and are generated by hashing together the time of creation, the machines network address and a random number. According to Wikipedia's article on the Birthday problem, the probability of a hash collision is 0.1% for two hundred million records.

Since it is unlikely that I'm going to get that much records, one could consider that no hash would ever collide. But I don't want to hope for that, but let my application handle the rare case of a id collision, thus hash collision.

Otherwise, the behavior would be very undesired because two independent things in the game world would have a connection, thus share their properties like position, movement, health points, and so on.

How can I handle hash collisions? How are they handled typically?

danijar
  • 32,406
  • 45
  • 166
  • 297
  • Actually, people usually _do_ assume that GUIDs never collide. – C. K. Young Aug 29 '13 at 00:27
  • 1
    It sounds that what you want is not really a hash, but a unique identifier. Is there any reason not to use a 128-bit GUID? – bara Aug 29 '13 at 00:27
  • @bara I would like to use C++ standard types like `unsigned long long int` instead of arrays to store the id. Moreover, I don't have that much records. But anyway, the question remains for any id length. – danijar Aug 29 '13 at 18:21
  • @danijar Then I would go back to the question of "why use a hash?" What you really want for this is a unique id, unless there is a reason not to do so (say the ids are generated in a distributed way). – bara Aug 29 '13 at 21:36
  • @bara I need ids, right. But since data can be loaded from different independent machines (savegame, modifications, patches, addons) this id must be globally unique. So it is kind of a hash, I guess, right? – danijar Aug 30 '13 at 07:51
  • Then you should go with 128-bits generated with a GUID it will save you a lot of pain down the road. Use an algorithm like RFC 4122 (your system should have the API already implemented (e.g. CoCreateGuid in Windows). And as Guffa says in his answer, there are other things that will break you before you get a collision. – bara Sep 04 '13 at 06:22

1 Answers1

2

Typically hash collisions are handled in two ways:

  1. Use a larger hash, so that collisions are practically impossible.

  2. Consider hash codes to be non-unique, and use an equality comparer for the actual data to determine uniqueness.

A 128 bit GUID uses the first method. The HashSet<T> class in .NET is an example of the second method.

Guffa
  • 687,336
  • 108
  • 737
  • 1,005
  • 1
    What if a 128 bit GUID collides? – danijar Aug 29 '13 at 18:23
  • 2
    @danijar: For 26,000 million records, the probability for that is 0.0000000000000001%. That is roughly the same probability as for a bit of the GUID to change spontaneously in the memory circuit. – Guffa Aug 29 '13 at 19:27
  • I still cannot use 128 bits. That is for multiple reasons. For example stl containers like `std::unordered_map` do only accept hashes of `std::size_t` which measures 64 bit. Using larger ids would require hashing them again for smaller map hashes, which is useless since this is the main use case. Could you please elaborate on how to `consider hash codes to be non-unique`? – danijar Sep 04 '13 at 08:46
  • @danijar: The second option is just accepting that there may be hash collisions. For the rare case when hashes actually collide, you simply compare the actual data to see if it's the same value. – Guffa Sep 04 '13 at 12:23
  • Okay and then I would have to scan through the loaded data and update all references to the collided hash to a new generated one. May actual question was about this process, but I think this is so implementation dependent that it would be too localized on SO. – danijar Sep 04 '13 at 14:05