4

I know the difference between Hashtable and HashMap. However, both these classes seemingly are using a hash function to get the job done. Is there a difference between the hash function used in Hashtable, and the hash function used in HashMap?

In particular, is there a difference between the hashing algorithm they use? What is the formula used to hash in these two classes?

In other words, is the way for calculating index (hash value) different?

A. Mashreghi
  • 1,729
  • 2
  • 20
  • 33
  • Possible duplicate of [Hash : How does it work internally?](http://stackoverflow.com/questions/4453476/hash-how-does-it-work-internally) – Tibrogargan Jul 10 '16 at 00:06
  • 3
    At the end, both are using `hashCode()` of the object beeing added to the collection. So, the algorithm to determine that "hash" depends on how YOU implemented `hashCode()` on the objects beeing added. (Or how its implemented on the object, in general) – dognose Jul 10 '16 at 00:13
  • 1
    Actually, for Java 8 the algorithms are (significantly) different; see my answer. – Stephen C Jul 10 '16 at 01:07
  • @Tibrogargan I don't think this is a duplicate of that question. At least I do not see my answer there. – A. Mashreghi Jul 10 '16 at 04:21

2 Answers2

17

In particular, is there a difference between the hashing algorithm they use? What is the formula used to hash in these two classes?

The primary hash function used when you use an object as a hash table key is the object's hashCode() method. It is up the to the key class to implement a decent hash function.

The Hashtable and HashMap classes take the key's hashcode value and convert it to an index in the primary hashtable array-of-chains. However, there are differences in how this happens between Hashtable and HashMap.

  • For Hashtable (Java 8) the code is this:

     hash = key.hashCode();
     index = (hash & 0x7FFFFFFF) % tab.length;
    
  • For HashMap (Java 8) the code is (effectively) this:

     // (I have restructured the code for ease of comparison.)
     int h;
     hash = (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
     index = (tab.length - 1) & hash;
    

As you can see, HashMap is scrambling the hashcode value returned by the key's hashcode function. This is explained in the source code as follows:

[This method] computes key.hashCode() and spreads (XORs) higher bits of hash to lower. Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide. (Among known examples are sets of Float keys holding consecutive whole numbers in small tables.) So we apply a transform that spreads the impact of higher bits downward. There is a tradeoff between speed, utility, and quality of bit-spreading. Because many common sets of hashes are already reasonably distributed (so don't benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.

Notes:

  1. The & versus % difference is because in Hashtable the hash array size is a prime number, but in HashMap (Java 8) the size is a power of 2.

  2. In Java 8 HashMap, the implementation will turn a long hash chain into a binary tree if the key class implements Comparable.

  3. HashMap handles null keys, but Hashtable doesn't.


However, all of this extra complexity in HashMap only comes into play if your key class has a poorly designed / implemented hashCode() method ... or if someone is deliberately trying to engineer hash collisions.

In other words, if your key class is well designed, the differences should not matter.

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
2

java.util.Hashtable<K,V> is like java.util.Vector<T>. It's a class added to the SDK very early in development which has been superseded by HashMap<K,V> (as ArrayList<T> superseded Vector<T>).

So you simply shouldn't use it unless you require implicit synchronization of all the operations, which comes by default with Hashtable, but you can stil use Collections.synchronizedMap for that purpose or a ConcurrentHashMap<K,V>.

As stated in Javadoc:

As of the Java 2 platform v1.2, this class was retrofitted to implement the Map interface, making it a member of the Java Collections Framework. Unlike the new collection implementations, Hashtable is synchronized. If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use ConcurrentHashMap in place of Hashtable.

The hashing of the two classes should be the same, as they'll both use int Object::hashCode for their purpose.

Jack
  • 131,802
  • 30
  • 241
  • 343
  • For example in ArrayList when the capacity is reached the size of the array is multiplied by 1.5; however, in Vector this ratio is 2. So, they are not totally the same. – A. Mashreghi Jul 10 '16 at 00:10
  • 2
    That's not really important in any serious way. – Louis Wasserman Jul 10 '16 at 00:16
  • @A.Mashreghi that's an implementation detail, why does it concern you? – dimo414 Jul 10 '16 at 00:36
  • 1
    Actually, for Java 8 the algorithms are (significantly) different; see my answer. – Stephen C Jul 10 '16 at 00:59
  • 2
    @A.Mashreghi - Arguably, that is not a difference in the hashing algorithm. Rather it is a difference in the resizing algorithm. – Stephen C Jul 10 '16 at 01:10
  • @A.Mashreghi, I can see you have accepted wrong answer.Stephen C has given correct answer. – Gaurava Agarwal Jul 10 '16 at 02:40
  • @StephenC I was just giving an example of the case that the fact that some class supersedes another does not necessarily mean they have no implementation difference. – A. Mashreghi Jul 10 '16 at 03:59
  • Yes, but it was a bad (i.e. not apropos) example. Jack is clearly talking about "should be the same" with respect to the hashing algorithm. (He's not correct though ...) – Stephen C Jul 10 '16 at 04:06