38

So I read about HashMap. At one point it was noted:

"Immutability also allows caching the hashcode of different keys which makes the overall retrieval process very fast and suggest that String and various wrapper classes (e.g., Integer) provided by Java Collection API are very good HashMap keys."

I don't quite understand... why?

Prince
  • 20,353
  • 6
  • 39
  • 59
Toskan
  • 13,911
  • 14
  • 95
  • 185
  • 1
    Since changing your hashcode while being stored in a HashMap will result in losing access to that key in that HashMap anyway and you can cache hashcode of mutable objects too I find that sentence missleading and somehow wrong – stryba Apr 26 '12 at 23:54
  • 5
    I strongly suggest you find a better source of information, such as the Javadoc and the Java Collections Framework document at http://docs.oracle.com/javase/6/docs/technotes/guides/collections/index.html. That blog contains numerous errors of fact, grammar, and spelling. The sentence you have quoted here is entirely illerate, and the apparent implication that HashMap caches hashcodes is not correct. – user207421 Apr 27 '12 at 00:04
  • @EJP: I was thinking the same. Another apparent implication is that immutability is *required* for caching hash codes, which is also not the case. – Niklas B. Apr 27 '12 at 00:15
  • 3
    Your article link is really just a junky ad-laden spam site. I agree with the other comments that the technical merits of the article is questionable. – Steve Kuo Apr 27 '12 at 00:21

7 Answers7

36

String#hashCode:

private int hash;

...

public int hashCode() {
    int h = hash;
    if (h == 0 && count > 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}

Since the contents of a String never change, the makers of the class chose to cache the hash after it had been calculated once. This way, time is not wasted recalculating the same value.

Jeffrey
  • 44,417
  • 8
  • 90
  • 141
  • 12
    Just for the record: if the string were mutable, and you used it in a `HashMap`, then after you modified the string, the hash table would become corrupted and more or less unusable. The hash table doesn't know to move the entry to its new position based on the new hash code. – Louis Wasserman Apr 27 '12 at 01:58
  • Should the hash field have the `transient` modifier? – Andreas May 07 '15 at 20:28
  • @Andreas Nope. I was quoting from the JDK's `String` class, and it doesn't have a `transient` modifier. – Jeffrey May 08 '15 at 03:49
10

Quoting the linked blog entry:

final object with proper equals () and hashcode () implementation would act as perfect Java HashMap keys and improve performance of Java hashMap by reducing collision.

I fail to see how both final and equals() have anything to do with hash collisions. This sentence raises my suspicion about the credibility of the article. It seems to be a collection of dogmatic Java "wisdoms".

Immutability also allows caching there hashcode of different keys which makes overall retrieval process very fast and suggest that String and various wrapper classes e.g Integer provided by Java Collection API are very good HashMap keys.

I see two possible interpretations of this sentence, both of which are wrong:

  • HashMap caches hash codes of immutable objects. This is not correct. The map doesn't have the possibility to find out if an object is "immutable".
  • Immutability is required for an object to cache its own hash code. Ideally, an object's hash value should always just rely on non-mutating state of the object, otherwise the object couldn't be sensibly used as a key. So in this case, too, the author fails to make a point: If we assume that our object is not changing its state, we also don't have to recompute the hash value every time, even if our object is mutable!

Example

So if we are really crazy and actually decide to use a List as a key for a HashMap and make the hash value dependent on the contents, rather than the identity of the list, we could just decide to invalidate the cached hash value on every modification, thus limiting the number of hash computations to the number of modifications to the list.

Niklas B.
  • 92,950
  • 18
  • 194
  • 224
  • You can use mutable keys for a `HashMap` as long as you ensure they are never modified. However, mutable objects cannot reliably cache their `hash` value, so time will be wasted upon every access to the map. – Jeffrey Apr 26 '12 at 23:52
  • @Jeffrey: If the data structure is never modified, we can easily design it so that we never need to compute the hash more than once. Check my edit. – Niklas B. Apr 26 '12 at 23:54
  • If you design a data structure so that it is never modified, you are designing an immutable data structure. If you have to recalculate the hash of a mutable object every time you modify it, the overhead could become worse than just recalculating it every time you called `hashCode`. – Jeffrey Apr 26 '12 at 23:59
  • @Jeffrey: Every sane implementation would just compute `hashCode` on demand and set a flag to signal that the hash code has been computed. Next time the structure is modified, the flag is reset, so that `hashCode` knows that it has to recompute next time. I don't know if this is the way Java does it, but lots of other languages work this way. It does *not* introduce overhead and still doesn't require the hash value to be recomputed every single time. The assumption that immutability is required for this kind of caching is just plain wrong. – Niklas B. Apr 27 '12 at 00:02
  • Sorry, must have misread what you wrote. I could've sworn you were talking about recalculating the hash on every modification. Your idea is sound, +1 – Jeffrey Apr 27 '12 at 00:06
6

It's very simple. Since an immutable object doesn't change over time, it only needs to perform the calculation of the hash code once. Calculating it again will yield the same value. Therefore it is common to calculate the hash code in the constructor (or lazily) and store it in a field. The hashcode function then returns just the value of the field, which is indeed very fast.

Roland Illig
  • 40,703
  • 10
  • 88
  • 121
1

Basically immutability is achieved in Java by making the class not extendable and all the operations in the object will ideally not change the state of the object. If you see the operations of String like replace(), it does not change the state of the current object with which you are manipulating rather it gives you a new String object with the replaced string. So ideally if you maintain such objects as keys the state doesn't change and hence the hash code also remains unchanged. So caching the hash code will be performance effective during retrievals.

raddykrish
  • 1,866
  • 13
  • 15
1

Think of the hashmap as a big array of numbered boxes. The number is the hashcode, and the boxes are ordered by number.

Now if the object can't change, the hash function will always reproduce the same value. Therefore the object will always stay in it's box.

Now suppose a changeable object. It is changed after adding it to the hash, so now it is sitting in the wrong box, like a Mrs. Jones which happened to marry Mister Doe, and which is now named Doe too, but in many registers still named Jones.

user unknown
  • 35,537
  • 11
  • 75
  • 121
1

Immutable classes are unmodifiable, that's why those are used as keys in a Map.

For an example -

    StringBuilder key1=new StringBuilder("K1");
    StringBuilder key2=new StringBuilder("K2");
    
    Map<StringBuilder, String> map = new HashMap<>();
    map.put(key1, "Hello");
    map.put(key2, "World");
    
    key1.append("00");
    System.out.println(map); // This line prints - {K100=Hello, K2=World}

You see the key K1 (which is an object of mutable class StringBuilder) inserted in the map is lost due to an inadvertent change to it. This won't happen if you use immutable classes as keys for the Map family members.

0

Hash tables will only work if the hash code of an object can never change while it is stored in the table. This implies that the hash code cannot take into account any aspect of the object which could change while it's in the table. If the most interesting aspects of an object are mutable, that implies that either:

  • The hash code will have to ignore most of the interesting aspects of the object, thus causing many hash collisions, or...

  • The code which owns the hash table will have to ensure that the objects therein are not exposed to anything that might change them while they are stored in the hash table.

If Java hash tables allowed clients to supply an EqualityComparer (the way .NET dictionaries do), code which knows that certain aspects of the objects in a hash table won't unexpectedly change could use a hash code which took those aspects into account, but the only way to accomplish that in Java would be to wrap each item stored in the hashcode in a wrapper. Such wrapping may not be the most evil thing in the world, however, since the wrapper would be able to cache hash values in a way which an EqualityComparer could not, and could also cache further equality-related information [e.g. if the things being stored were nested collections, it might be worthwhile to compute multiple hash codes, and confirm that all hash codes match before doing any detailed inspection of the elements].

supercat
  • 77,689
  • 9
  • 166
  • 211