3

I was studying Computer Science when I found myself upon a question which I cannot get an answer to. Here's my train of thoughts so far:

  • Hash tables using open addressing need a probing function to resolve collisions, such as linear / quadratic probing, or double hashing.
  • Linear probing is prone to primary clustering on the hash tables which could lead to degraded performance (referencing MIT's algorithm lecture)
  • Found Swift's standard library uses Linear probing for it's Hash table implemention Dictionary (source code)
  • Then also learned that actually linear probing could be more performant(not sure) because of less cache misses (Wikipedia on Linear probing)

Linear probing can provide high performance because of its good locality of reference, but is more sensitive to the quality of its hash function than some other collision resolution schemes.

  • But other laguages like Golang seems to use quadratic probing on it's hash tables

So I'm confused on what's a good probing strategy on hash tables now and why the Swift team went ahead and used linear probing on it's Dictionaries. Any thoughts would be welcome. Thanks.

Nick Kim
  • 377
  • 1
  • 13
  • "why the Swift team went ahead and used linear probing" -- only a member of the Swift team can answer that question. – John Coleman Apr 11 '21 at 11:57
  • Hi John, I do see that the implementation details could not be specifically answered until an Apple's engineer decides to shows up, but I'm merely trying to shed some light on the subject. I totally agree about the probing problem predating the creation of swift and found that NSDictionary also uses the same linear probing method (https://stackoverflow.com/questions/28379809/how-are-hash-collisions-handled) – Nick Kim Apr 11 '21 at 12:07
  • 2
    If they had a good reason, my guess is that it's because linear probing is cache friendly. In a big hash table with small key-value pairs, you can probe several linear cells in the time that it would take to do another random access, so the simple theoretician's analysis misses the mark a bit. Clustering can be controlled with a good hash function (there's been significant improvement since linear hashing was proposed) and by decreasing the max load (which presents a space/time trade-off, though). – David Eisenstat Apr 11 '21 at 13:49

0 Answers0