0

Some programming languages such as python, Java and C++11 have hash tables (although sometimes under different names with extended functionality) as part of their standard library. I would like to understand from a high level algorithmic point of view what has been implemented. Specifically:

  • What function of the keys is used to give the location to place the data (i.e. what is the hash function that is used)?
  • Which algorithms do they use for resolving collisions? As an example, do any of them use simple chaining?
  • Is there any use of randomness to pick the hash functions?
Simd
  • 19,447
  • 42
  • 136
  • 271
  • Can't you just look at an open source implementation? – Carl Norum Sep 26 '13 at 14:57
  • If this is language agnostic, why to tag with Java, C++ and python? – Luiggi Mendoza Sep 26 '13 at 14:59
  • This is very broad. The answers will be different for different languages. For example, I'm fairly sure Python uses open addressing, but Java (according to Peter Lawrey below) uses simple chaining. You seem to be asking for a generalization: "how do modern programming languages implement hash tables, in general." But no generalization can be made. I would suggest focusing on one language (but check for dupes first). – senderle Sep 26 '13 at 15:06
  • 1
    There's a python-specific answer [here](http://stackoverflow.com/questions/327311/how-are-pythons-built-in-dictionaries-implemented). – senderle Sep 26 '13 at 15:07
  • @senderle I was really looking for a list rather than any generalization. I was hoping there might be a "big list" tag but I couldn't find one. – Simd Sep 26 '13 at 15:08
  • In Haskell they don't even use hashes. – Mihai Maruseac Sep 26 '13 at 15:10
  • 1
    @CarlNorum: I doubt anybody could come to any true understanding of the theories behind hash tables by simply reading the source code implementation of one. – John Dibling Sep 26 '13 at 15:14
  • @MihaiMaruseac, haskell `Map`s don't use hashes, but there is at least one `HashTable` implementation (see [here](http://hackage.haskell.org/packages/archive/base/4.5.1.0/doc/html/Data-HashTable.html)). I'm pretty sure it uses hashes. – senderle Sep 26 '13 at 15:14
  • 1
    For anyone interested: this seems like a relevant blog post https://rcoh.me/posts/hash-map-analysis/ – Artem Pelenitsyn Aug 06 '20 at 03:13
  • 1
    @ArtemPelenitsyn That’s a nice link but there is no mention of randomisation which I thought was an important part of modern implementations. – Simd Aug 06 '20 at 07:45

1 Answers1

2

For Java,

How are the hash functions themselves computed?

They are implemented by the class itself with int hashCode()

Which algorithms do they use for resolving collisions? As an example, do any of them use simple chaining?

Typically simple chaining. Java 8 will support trees for collisions of String.

Is there any use of randomness to pick the hash functions?

No, except for String elements/keys to avoid DOS attacks.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • Thank you. What is the mathematical function that is being computed to give the location to place the data? What I mean by this is that if the key is x, say, then one option might a*x +b mod p mod m where p is a large prime, a and b are between 0 and p-1 and m is the size of the table. – Simd Sep 26 '13 at 15:10
  • 1
    @Raphael It doesn't use a prime number as `%` is expensive. Instead it uses this function `h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4);` and then takes the lowest bits. i.e. a mask. – Peter Lawrey Sep 26 '13 at 15:26
  • @Raphael You might find this interesting http://stackoverflow.com/questions/19032197/are-there-different-ways-to-calculate-the-table-index-in-hashmap/ – Peter Lawrey Sep 26 '13 at 15:33