
In Java 11's `String.hashCode()` method, a local `int` variable `h` is used to hold the value of the `hash` field.

Is this done for performance reasons, or is it merely a stylistic choice? An answer to a similar question says it is for thread safety, but since Strings are immutable, that doesn't seem to be the case.

  • I reopened the question because the [duplicate question](https://stackoverflow.com/questions/10576939/java-string-hashcode-caching) does not answer this particular question, which is about the use of the local variable `h`. – MC Emperor Jul 10 '23 at 09:45
  • Well, if the non-final field `hash` is reassigned within the `hashCode` method, then String is not immutable enough. That's why they changed it. – geanakuch Jul 10 '23 at 09:56

2 Answers


The git blame for that method refers to JDK-8166842.

From that issue, the reasoning was:

Latest change to JDK 9 String code introduced a non-benign data race:

```java
public int hashCode() {
    if (hash == 0 && value.length > 0) {
        hash = isLatin1() ? StringLatin1.hashCode(value)
                          : StringUTF16.hashCode(value);
    }
    return hash;
}
```

The 'hash' field should only be read once into a local variable. The second racy read (at return) can read 0 while the 1st (at if) can read non-zero.
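
A minimal sketch of the pattern the issue asks for, assuming a simplified string-like class: `CachedHashString` is hypothetical, stores a `char[]` rather than the JDK's Latin-1/UTF-16 `byte[]`, and computes the same `31 * h + c` polynomial that `String.hashCode()` documents. It is not the JDK source, only the same shape: read `hash` once into a local `h`, and return `h` instead of re-reading the field.

```java
import java.util.Arrays;

/** Hypothetical string-like class illustrating the single-read cache pattern (not JDK code). */
public final class CachedHashString {
    private final char[] value;
    private int hash; // cached hash; 0 means "not computed yet". Deliberately not volatile.

    public CachedHashString(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash; // the ONLY read of the shared field
        if (h == 0 && value.length > 0) {
            for (char c : value) {
                h = 31 * h + c; // same polynomial as String.hashCode()
            }
            hash = h; // racy write is benign: every thread computes the same int value
        }
        return h; // return the local, so the check and the result always agree
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedHashString)) return false;
        return Arrays.equals(value, ((CachedHashString) o).value);
    }

    public static void main(String[] args) {
        CachedHashString s = new CachedHashString("hello");
        System.out.println(s.hashCode() == "hello".hashCode()); // true
        System.out.println(s.hashCode() == s.hashCode());       // true (later calls use the cache)
    }
}
```

With a single read, the worst a race can do is make two threads both compute the hash; it can no longer make `hashCode()` return 0 after the cache check has seen a non-zero value.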

Holloway
  • Good example of a "pure function" that should merely return a value without side effects, yet changes the object's internal state for performance reasons. – PeterMmm Jul 10 '23 at 09:57
  • How is it possible that the first read of `hash` is non-zero but the second one is zero? If I understood correctly, only a non-zero value may be assigned to `hash`, so the worst case seems to be simply calculating the hash more than once. – Massimiliano Micol Jul 10 '23 at 10:46
  • It's a good question and I'm not really sure of the mechanics of it. It's also worth noting that the `hashCode` method [has changed again](https://github.com/AdoptOpenJDK/openjdk-jdk13u/blob/8da90d672369a305b1dd7ad7cad0110a8d683d40/src/java.base/share/classes/java/lang/String.java#L1536) since the `h` variable was added: it now also uses a `hashIsZero` field to avoid recalculating hashes that are 0. – Holloway Jul 10 '23 at 12:43
  • Because the memory model allows the two reads to be reordered, the second read can effectively happen before the first one: it can still see the initial 0 while the first read already sees the value written by another thread. – Quân Anh Mai Jul 20 '23 at 08:17
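
To make the reordering argument concrete, the sketch below replays, on a single thread, one interleaving that the Java Memory Model permits for the racy version: the read used by `return hash` happens first, another thread (modeled here by a `Runnable`) then publishes the hash, and only afterwards does the `if` read happen. `ReorderingDemo`, `racyHashCode` and `singleReadHashCode` are hypothetical names, and this is a deterministic simulation of the outcome, not a real data race (a real one would be rare and hard to reproduce).

```java
/** Hypothetical single-threaded replay of one interleaving the JMM allows; not a real race. */
public final class ReorderingDemo {
    static final String TEXT = "hello";
    static int hash; // shared cache field; 0 means "not computed yet"

    /** Racy shape (two reads of `hash`), with the read used by `return hash` hoisted first. */
    static int racyHashCode(Runnable otherThread) {
        int returnRead = hash;     // read for "return hash", performed early: sees 0
        otherThread.run();         // another thread publishes its computed hash here
        int ifRead = hash;         // read for "if (hash == 0 ...)": sees the non-zero value
        if (ifRead == 0) {
            hash = TEXT.hashCode();
            return hash;           // single-threaded here, so this sees the value just written
        }
        return returnRead;         // the stale 0 escapes to the caller
    }

    /** Fixed shape: the field is read exactly once into a local. */
    static int singleReadHashCode(Runnable otherThread) {
        int h = hash;              // the only read of the field
        otherThread.run();         // same interference at the same point
        if (h == 0) {
            h = TEXT.hashCode();   // worst case: the hash is computed twice
            hash = h;
        }
        return h;                  // the result always agrees with the check above
    }

    public static void main(String[] args) {
        Runnable otherThread = () -> hash = TEXT.hashCode();

        hash = 0;
        System.out.println("racy:   " + racyHashCode(otherThread));       // racy:   0
        hash = 0;
        System.out.println("single: " + singleReadHashCode(otherThread)); // single: 99162322
    }
}
```

This also answers the "only non-zero values are ever assigned" objection: no thread writes 0, but the racy version can still return the initial 0 it read before another thread's write became visible.
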

While Holloway has answered the question itself, I'd like to clear up a minor misconception.

String is only immutable in the sense that it doesn't expose any functionality for mutating it. However, it still has internal mutable state, namely the `hash` field.

Calculating the hash of a String is a relatively simple O(n) operation. Not disastrous, but also not completely free, especially not for longer strings.
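
For reference, the `String.hashCode()` Javadoc defines the hash of a string $s$ of length $n$ as the polynomial below, computed with `int` arithmetic (so it wraps around as a 32-bit two's-complement value), with the empty string hashing to 0. Evaluated in Horner form, it costs one multiply and one add per character, which is where the O(n) comes from:

$$
h(s) = \sum_{i=0}^{n-1} s[i] \cdot 31^{\,n-1-i} \pmod{2^{32}},
\qquad
h_0 = 0,\quad h_{i+1} = 31\,h_i + s[i],\quad h(s) = h_n
$$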

Since strings are a fundamental part of almost every aspect of Java and are used basically everywhere, including in Java's own internals, good performance of the String implementation is critical. Even a minor inefficiency is multiplied enormously simply by how often the code runs. So while the hash calculation isn't too expensive, it's still something we'd rather not do unless it's actually necessary.

Now, there are three ways we could go about calculating the hash:

  1. We calculate it every time hashCode() is called. In other words, we only calculate it when we need it. However, that also means we have to recalculate it every single time it's needed. This would be disastrous for HashMap and similar hash-based collections, which call hashCode() on the key for every lookup and insertion.
  2. We calculate it once when the string is created, then return the same value every time hashCode() is called. This makes hashCode() essentially free, but makes creating a string more expensive in all the situations where the hash is never needed. Since Java programs constantly create a lot of strings, that's a non-negligible cost.
  3. We do it lazily, i.e. we calculate it the first time hashCode() is called and then remember it for subsequent calls to hashCode().

The third option was chosen since it hits the sweet spot between "don't calculate it unless you need it" and "don't calculate it more than once". This does mean that String has mutable state: it updates its internal cache of the hash code. However, to an outside observer this effect is invisible (aside from the first call to hashCode() being slightly slower), so the string is effectively immutable.
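
The trade-off between the three options can be sketched by counting how many times the O(n) hash loop actually runs. The classes `Recompute`, `Eager` and `Lazy` and the shared `computations` counter are hypothetical illustrations, not JDK code. The lazy variant uses the same single-read pattern as `String` and, like Java 11's `String`, uses 0 as the "not computed yet" marker, so a string whose real hash is 0 would be recomputed on every call.

```java
import java.util.function.Supplier;

/** Hypothetical comparison of three hash strategies (not JDK code). */
public final class HashStrategies {
    static int computations = 0; // counts how often the O(n) loop runs

    static int polyHash(char[] v) {
        computations++;
        int h = 0;
        for (char c : v) {
            h = 31 * h + c; // same polynomial as String.hashCode()
        }
        return h;
    }

    // equals() is omitted for brevity: these objects only exist to count work.

    /** Option 1: recompute on every call. No extra field, O(n) per call. */
    static final class Recompute {
        private final char[] value;
        Recompute(String s) { value = s.toCharArray(); }
        @Override public int hashCode() { return polyHash(value); }
    }

    /** Option 2: compute eagerly in the constructor, even if hashCode() is never called. */
    static final class Eager {
        private final char[] value;
        private final int hash;
        Eager(String s) { value = s.toCharArray(); hash = polyHash(value); }
        @Override public int hashCode() { return hash; }
    }

    /** Option 3: compute on the first call and cache it in a non-final field. */
    static final class Lazy {
        private final char[] value;
        private int hash; // 0 = not computed yet
        Lazy(String s) { value = s.toCharArray(); }
        @Override public int hashCode() {
            int h = hash; // single read of the cache field
            if (h == 0 && value.length > 0) {
                h = polyHash(value);
                hash = h;
            }
            return h;
        }
    }

    /** Creates one object, calls hashCode() `calls` times, and returns the computation count. */
    static int cost(Supplier<Object> factory, int calls) {
        computations = 0;
        Object o = factory.get();
        for (int i = 0; i < calls; i++) {
            o.hashCode();
        }
        return computations;
    }

    public static void main(String[] args) {
        String s = "hello";
        for (int calls : new int[] {0, 1, 5}) {
            System.out.printf("calls=%d recompute=%d eager=%d lazy=%d%n",
                calls,
                cost(() -> new Recompute(s), calls),
                cost(() -> new Eager(s), calls),
                cost(() -> new Lazy(s), calls));
        }
    }
}
```

For 0, 1 and 5 calls this reports 0/1/0, 1/1/1 and 5/1/1 computations (recompute/eager/lazy): the lazy option never pays more than either alternative, at the price of the mutable cache field.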

BambooleanLogic