Instructions reordering in Java JVM

Question

I was reading this blogpost.

And the author was talking about breaking the hashCode() in String in multithread environment.

By having:

public int hashCode() {
     int h = hash;
     if (h == 0) {
         int off = offset;
         char val[] = value;
         int len = count;

         for (int i = 0; i < len; i++) {
             h = 31*h + val[off++];
         }
         hash = h;
     }
     return h;
 }

Changed to:

public int hashCode() {
     if (hash == 0) {
         int off = offset;
         char val[] = value;
         int len = count;

         int h = 0;
         for (int i = 0; i < len; i++) {
             h = 31*h + val[off++];
         }
         hash = h;
     }
     return hash;
 }

Which the author says and I quote:

"What I've done here is to add an additional read: the second read of hash, before the return. As odd as it sounds, and as unlikely as it is to happen, the first read can return the correctly computed hash value, and the second read can return 0! This is allowed under the memory model because the model allows extensive reordering of operations. The second read can actually be moved, in your code, so that your processor does it before the first!"

So further going through comments, someone says it can be reordered to

int h = hash;
if (hash == 0) {
  ...
}
return h;

How is that possible? I thought reordering only involves moving program statements up and down. What rules is it following? I've Googled, read the JSR133 FAQ, checked the Java Concurrency in Practice book, but I can't seem to find a place that helps me to understand particularly on reordering. If anyone can point me to the right direction, I would really appreciate it.

Thanks to Louis clarifying the meaning of "Reordering", I wasn't thinking in terms of "byteCode"

However, I still don't understand why is it allowed to move the 2nd Read to the front, this is my naive attempt to translate it to somewhat "bytecode" format.

For simplification purpose, operations that are used to calculate the hashcode are express as calchash(). Therefore, I express the program as:

if (hash == 0)  {       
    h = calchash();
    hash = h;
}
return hash;

And my attempt to express it in "bytecode" form:

R1,R2,R3 are in the operands stack, or the registers
h is in the array of local variables

In program order:

if (hash == 0)  {       ---------- R1 = read hash from memory (1st read)
                        ---------- Compare (R1 == 0)
    h = calchash();     ---------- R2 = calchash()
                        ---------- h = R2 (Storing the R2 to local variable h)
    hash = h;           ---------- Hash = h (write to hash)
}
return hash             ---------- R3 = read hash from memory again(2nd read)
                        ---------- return R3

Reordered transformation (My version based on comments):

                        ---------- R3 = read hash from memory (2nd read) *moved*
if (hash == 0)  {       ---------- R1 = read hash from memory (1st read)
                        ---------- Compare (R1 == 0)
    h = calchash();     ---------- R2 = calchash()
                        ---------- h = R2 (Storing the R2 to local variable h)
    hash = h;           ---------- hash = h (write to hash)
}
return hash             ---------- return R3

Checking the comments again, I found this answered by the author:

Reordered Transformation (From the blog)

r1 = hash;
if (hash == 0) {
  r1 = hash = // calculate hash
}
return r1;

This case actually works on single thread, but it's possible to fail with multiple threads.

It seems that the JVM are making simplifications based on

h = hash and it simplifies the use of R1, R2, R3 to single R1

Therefore, JVM does more than reordering instructions, it also seems reducing the amount of registers being used.

The "things that can be reordered" aren't lines of Java code -- think, for example, statements in the bytecode. (In that case, the read of `hash` and the return statement would be separate statements that could be reordered independently.) — Louis Wasserman, Sep 23 '12 at 17:45
@LouisWasserman Thanks, that was one of major point i was confused about. But something still doesn't feel right. Could the JVM do more than just "reordering" instruction? I'm guessing if there's something like a=b=c=1, would it possible for JVM to do shortcut and treat a,b,c as a single field = 1 instead of 3 variables? — HBZ, Sep 24 '12 at 06:46
google's blogs help tell country of origin a poster is once again. — bestsss, Mar 17 '13 at 23:17

assylias · Accepted Answer · 2013-05-29T11:40:06.420

14

In your modified code:

public int hashCode() {
     if (hash == 0) { // (1)
         int off = offset;
         char val[] = value;
         int len = count;

         int h = 0;
         for (int i = 0; i < len; i++) {
             h = 31*h + val[off++];
         }
         hash = h;
     }
     return hash; // (2)
 }

(1) and (2) could be reordered: (1) could read a non null value while (2) would read 0. That can't happen in the actual implementation in the String class because the calculation is made on the local variable and the return value is also that local variable, which, by definition, is thread safe.

The issue is that the Java Memory Model provides no guarantee when a shared variable (hash) is accessed without proper synchronization - in particular it does not guarantee that all executions will be sequentially consistent. Had hash been volatile, there would be no problem with the modified code.

ps: the author of that blog, I believe, is one of the writers of the Chapter 17 (Java Memory Model) of the JLS - so I would tend to believe him anyway ;-)

UPDATE

Following the various edits / comments - let's look at the bytecode in more details with these two methods (I assume that the hashcode is always 1 to keep things simple):

public int hashcode_shared() {
    if (hash == 0) { hash = 1; }
    return hash;
}

public int hashcode_local() {
    int h = hash;
    if (h == 0) { hash = h = 1; }
    return h;
}

The java compiler on my machine generates the following bytecode:

public int hashcode_shared();
   0: aload_0                           //read this
   1: getfield      #6                  //read hash (r1)
   4: ifne          12                  //compare r1 with 0
   7: aload_0                           //read this
   8: iconst_1                          //constant 1
   9: putfield      #6                  //put 1 into hash (w1)
  12: aload_0                           //read this
  13: getfield      #6                  //read hash (r2)
  16: ireturn                           //return r2

public int hashcode_local();
   0: aload_0                           //read this
   1: getfield      #6                  //read hash (r1)
   4: istore_1                          //store r1 in local variable h
   5: iload_1                           //read h
   6: ifne          16                  //compare h with 0
   9: aload_0                           //read this
  10: iconst_1                          //constant 1
  11: dup                               //constant again
  12: istore_1                          //store 1 into h
  13: putfield      #6                  //store 1 into hash (w1)
  16: iload_1                           //read h
  17: ireturn                           //return h

In the first example, there are 2 reads of the shared variable hash: r1 and r2. As discussed above, because there is no synchronization and the variable is shared, the Java Memory Model applies and a compiler/JVM is allowed to reorder the two reads: line #13 could be inserted before line #1*.

In the second example, all the operations on h, the local variable, need to be sequentially consistent because because of intra-thread semantics and program order guarantee on non-shared variables.

Note: as always, the fact that the reordering is allowed does not mean it will be performed. It is actually unlikely to happen on current x86/hotspot combinations. But it could happen on other current or future architectures/JVM.

*That is a bit of a shortcut, what could happen in practice is that the compiler might rewrite hashcode_shared like this:

public int hashcode_shared() {
    int h = hash;
    if (hash != 0) return h;
    return (hash = 1);
}

The code is strictly equivalent in a single threaded environment (it will always return the same value as the original method) so the reordering is allowed. But in a multi-threaded environment, it is clear that if hash is changed from 0 to 1 by another thread between the first two lines, this reordered method will incorrectly return 0.

edited May 29 '13 at 11:40

answered Sep 23 '12 at 17:39

assylias

321,522
82
660
783

Thanks, I edited my question and tried to reorder (1) and (2). However, it doesn't seems to make sense even in single thread. Would you mind having a look in it? – HBZ Sep 24 '12 at 06:57
In your edit, you write: *"Isn't hash dependent on the statements before it"* => yes it is and would be run after those statements in a sequentially consistent execution. But without synchronization, you don't know that the execution will be sequentially consistent. Which is exactly what the problem is: without synchronization, things can be executed in a counter-intuitive, non-sequential way. – assylias Sep 24 '12 at 09:27
But with the switch above, the code wouldn't work even work in a single thread. – HBZ Sep 25 '12 at 13:27
2

@user1671022 The JLS states: *"The actions of each thread in isolation must behave as governed by the semantics of that thread, with the exception that the values seen by each read are determined by the memory model."* => in other words, as soon as you start reading shared values, reoderings can happen that do not necessary follow the reordering rules for a single thread. Instead of talking of instructions reodering, you could say that when the same shared value is read twice, witohut synchronization, the second read can read an old value while the first read sees a more recent value. – assylias Sep 25 '12 at 14:42
@assylias - So there are some rules which define what can and what cannot be optimised out before instruction reordering is allowed to occur. Is there a reference to these rules anywhere? I would be interested to see how they define when a local variable can and cannot be optimised away. My point is that in some (admittedly weird) architectures it may be more efficient to use the instance variable instead of the local variable and a sufficiently canny compiler could make use of that in these cases to morph the "safe" code into "unsafe" behind the scenes. – OldCurmudgeon Mar 19 '13 at 13:53
@OldCurmudgeon It is in [17.4.1](http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4.1): *Local variables [...] are unaffected by the memory model.*, i.e. actions on non-shared variables need to be sequentially consistent. That does not mean that the actual execution is sequential, but if it is not, it should not be observable. For shared variables and in the absence of synchronization, each read can observe any prior write, as long as no memory barrier is crossed. – assylias Mar 19 '13 at 14:24
Hi, @assylias, I still confused about "(2) could read a 0", is it possible even in a single thread program ? why ? – WoooHaaaa May 29 '13 at 11:11
@assylias, You mean the reordering is performed in multi-threads ? You notice that *line #13 could be inserted before line #1* , the instructions looks like in a single thread's ? – WoooHaaaa May 29 '13 at 11:20
@MrROY I understand what you mean - my description is a bit misleading I suppose. The `hashcode_shared` could be rewritten by the compiler into: `int h = hash; if (hash != 0) return h; return (hash = 1);` for example. From a single threaded program perspective the two versions of the code are equivalent and will always return the same thing. However in a multi-threaded program, that new code could return 0. That is a very unlikely change but it is allowed by the JMM. – assylias May 29 '13 at 11:34
*"why the complier rewrite the code like that*" => Compilers and processors keep rewriting your code to optimise it. The example I give is unlikely to happen as the modified version does not seem better than the original one, but there are other situations where you will get similar changes. The important point is that it is a **legal** reordering from the JMM's perspective so in theory it could happen. – assylias May 29 '13 at 13:57
@assylias , Thanks so much, but the key is, why the complier rewrite the code like that ? Could you give the detail how the reordered translated to bytecode into your rewritten code ? Thanks again, hope this doesn't bother you. – WoooHaaaa May 29 '13 at 14:05
@assylias i just confused how the rewritten code comes out from your mind... I just can't understand how the bytecode translated into rewritten code. – WoooHaaaa May 29 '13 at 14:06
@MrROY I just created an example where the reads are reordered but is funciontally equivalent to the original code in a single thread. I did not created it from the bytecode. – assylias May 29 '13 at 14:08
@assylias Thanks, sounds make sense to me now. Have a nice day. – WoooHaaaa May 29 '13 at 14:15
My two cents, in the shared example, if r2 happens first, most processors are smart enough to read r1 from the load/store buffer instead of reading from the cache subsystem or main memory. If your code only runs on x86, this is safe. – Daniel Dec 22 '15 at 03:57

score 1 · Answer 2 · answered Sep 23 '12 at 18:29

I think the key thing to note is that in the thread that gets the wrong answer (returns 0), the body of the if statement is not executed - ignore it, could be anything.

The wrong read thread reads the non-volatile field twice but never writes it. So we are talking about just the ordering of two reads. The claim is that these are not ordered. In more complicated situations there might be aliasing, and it would be non-trivial for the compiler to check whether this was the same memory location or not. Taking the conservative route could prevent optimisations.

SeKa · Answer 3 · 2013-03-18T00:54:11.340

In layman's terms, I think this issue has to do with read (fetch) re-ordering.

Each thread, T1 and T2, want to get all their "inputs" to do processing (and without the strict volatile marking) they are given some freedom as to how/when to read their data.

The Bad Case:

Each thread needs to read the (instance) variable twice, once to check the if and once for the return value. Let's say for argument that T1 chooses to do the if read first and T2 chooses to do the return read first.

This creates the race condition that the hash variable is changed (by T1) between the update of hash by T1 and T2's second read (which T2 uses to check the if condition). So now T2's test is false, it does nothing and returns what it read (originally) for the instance variable, 0.

The Fixed Case:

Each thread only needs to read the (instance) variable once, and then immediately stores it in it's own local variable. This keeps the re-ordering read problem from occurring (as there is only one read).

OldCurmudgeon · Answer 4 · 2013-03-19T13:01:38.687

0

First the bad code:

int hash = 0;

public int hashCode() {
     if (hash == 0) {
         int off = offset;
         char val[] = value;
         int len = count;

         int h = 0;
         for (int i = 0; i < len; i++) {
             h = 31*h + val[off++];
         }
         hash = h;
     }
     return hash;
 }

Clearly we can reduce this to its bare bones as:

int hash = 0;

public int hashCode() {
     if (hash == 0) {
         // Assume calculateHash does not return 0 and does not modify hash.
         hash = calculateHash();
     }
     return hash;
 }

now the theory suggests that a reordering on one thread interlaced in a particular way with a second thread can result in a zero return. The only scenario I can imagine would be something like:

// Pseudocode for thread 1 starts with <1>, thread 2 with <2>. 
// Rn are local registers.
public int hashCode() {
     <2> has not begun
     <1> load r1 with hash (==0) in preparation for return for when hash is != 0
     <2> begins execution - finds hash == 0 and starts the calculation
     <2> modifies hash to contain new value.
     <1> Check hash for zero - it is not so skip the contents of the if
     if (hash == 0) {
         // Assume calculateHash does not return 0 and does not modify hash.
         hash = calculateHash();
         <2> got here but at a time when <1> was up there ^^
     }
     <1> return r1 - supposedly containing a zero.
     return hash;
 }

but then - to me - a similar treatment can be made with the good code:

public int hashCode() {
     int h = hash;
     if (h == 0) {
         hash = h = calculateHash();
     }
     return h;
 }

and then

public int hashCode() {
     <2> has not begun
     <1> load r1 with hash (==0) in preparation for return for when h is != 0
     <2> begins execution - finds h == 0 and starts the calculation
     <2> modifies hash to contain new value.
     <1> load r2 with hash - from now on r2 is h
     int h = hash;
     <1> Check r2 for zero - it is not so skip the contents of the if
     if (h == 0) {
         hash = h = calculateHash();
     }
     <1> we know that when hash/h are non-zero it doesn't matter where we get our return from - both r1 and r2 must have the same value.
     <1> return r1 - supposedly containing a zero.
     return h;
 }

I don't understand why one is a real possibility and the other is not.

edited Mar 19 '13 at 13:01

answered Mar 17 '13 at 23:05

OldCurmudgeon

64,482
16
119
213

where did you get *static* from? and the code can be validly x-formed to: `int h=hash; if (hash==0) {hash=h=calc();} return hash;` – bestsss Mar 17 '13 at 23:25
@bestsss - good point on the `static` - removed. I don't understand how your second point applies. – OldCurmudgeon Mar 17 '13 at 23:27
*The only scenario I can imagine would be something like:* <- there. The transformation above is what makes ' if (hash==0) hash=calc(); return hash()` improper to solve the problem. – bestsss Mar 17 '13 at 23:28
In short you assumptions are wrong. The transformation can break the code under any hardware model and it's still valid single-threaded. [Here is a ref](http://old.nabble.com/Reordering-and-Immutability-td34977748.html) to a recent JSR-166 mailing list question. – bestsss Mar 17 '13 at 23:29
`int h=hash; if (hash==0) {hash=h=calc();} return` **h** `;` - meh, made typo in the transform. – bestsss Mar 17 '13 at 23:38
@bestsss - From the original [blog](http://jeremymanson.blogspot.hk/2008/12/benign-data-races-in-java.html) shouldn't it be `...; if (h == 0) ...`? Yet I still don't see how that changes anything. Where does my second transform break the allowable transforms while my first does not? – OldCurmudgeon Mar 18 '13 at 00:20
*<1> return r1 - supposedly containing a zero.* this cant happen as it must return r2 due to: *<1> load r2 with hash - from now on r2 is h* <-- that's about the correct sample. – bestsss Mar 18 '13 at 00:27
2

The fixed code works b/c it prevents the JVM to issue 2 loads for hash. The original code doesn't prevent that. (again it doesn't mean there is a JIT that actually yields such a transform, but it doesn't mean it's forbidden either). – bestsss Mar 18 '13 at 00:30
@bestsss - Re: *<1> return r1...* but this is an optimising compiler and on my "special" cpu, working with r1 is always quickest so if, as in this case, we **know** that r1 and r2 contain the same value then r1 will be used, even though the intent was to use r2 at the start. – OldCurmudgeon Mar 18 '13 at 00:37
@bestsss - Re: *The fixed code works b/c ...* how is it prevented? Using a local variable is merely a suggestion to the compiler. If the compiler can work out a way that is quicker it is perfectly allowed, I believe, to completely ignore this 'hint'. – OldCurmudgeon Mar 18 '13 at 00:41
Not sure what is meant by "a local variable is merely a suggestion to the compiler"... A local variable is a local variable. It is space allocated on the stack for the function. It "works" because in the fixed code there is only **one** read of the (instance) hash. Then that local variable is guaranteed to be used in the if check and the return. – SeKa Mar 18 '13 at 00:47
*Using a local variable is merely a suggestion to the compiler.* <-- dead wrong! imagine that int `h=hash; assert h==h;` The assert MUST always succeed. – bestsss Mar 18 '13 at 00:49
For some clarification, here's an SO link regarding local variables: http://stackoverflow.com/questions/8061587/java-memory-stack-allocation-for-local-variables – SeKa Mar 18 '13 at 01:02
Ahhh!!! Forgive me - I have seen local variables optimised away in C and C+ compilers - I just assumed the same was allowed in Java - I begin to see the light. :) So `{ int a = 1; int b = 2; int c = 3; return a + b + c; }` cannot be optimised down to `{ return 6; }`. Wow! – OldCurmudgeon Mar 18 '13 at 01:12
In this case: they can - constantly folded. As it's provable the result is a constant 6 - always. There are no reaching definitions besides the constants. – bestsss Mar 18 '13 at 01:17
@OldCurmudgeon It can if the variables are final or effectively final. – assylias Mar 19 '13 at 12:32
@OldCurmudgeon I have added the bytecode to my answer to better illustrate what could happen. – assylias Mar 19 '13 at 13:34

Instructions reordering in Java JVM

4 Answers4

Linked