How to demonstrate Java instruction reordering problems?

Question

With Java instruction reordering the execution order of the code is changed by the JVM at compile time or run time, possibly causing unrelated statements to be executed out-of-order.

Edit: [Instruction reordering can produce counter-intuitive results. Many CPU architectures can reorder the memory interactions of machine instructions which leads to similar unexpected results even if the compiler did not change the instruction order. Thus, the term memory reordering may be a better fit than instruction reordering.]

So my question is:

Can someone provide an example Java program/snippet, that reliably shows an instruction reordering problem, that is not caused also by other synchronization issues ( such as caching/visibility or non-atomic r/w, as in my failed attempt at such a demo in my previous question )

To emphasize, I am not looking for examples of theoretical reordering issues. What I am looking for is a way to actually demonstrate them by seeing incorrect or unexpected results of a running program.

Barring a faulty behavior example, just showing actual reordering happening in the assembly of a simple program could also be nice.

But that would be a bug, and probably fixed in the next release when discovered. Are you looking for historical issue reports, or open tickets or such? Or do you mean to say that this is a situation that can just happen and is not going to be resolved? — Thilo, Oct 04 '18 at 14:12
It wouldn't be a bug. Instruction reordering can be visible from other threads if proper synchronization/memory barriers aren't in place. — John Kugelman, Oct 04 '18 at 14:13
Well, OP explicitly excluded undefined behaviour due to insufficient thread synchronization. Modulo that, it's a bug, no? — Thilo, Oct 04 '18 at 14:14
And you are saying you can get undefined behaviour due to re-ordering even when thread synchronization has been applied properly? Do you have a reference for that? — Thilo, Oct 04 '18 at 14:17
that would be tremendously hard to show IMO on `x86`, but a very nice question — Eugene, Oct 04 '18 at 14:22
@Thilo I take it that OP wants an example where improper synchronization triggers a problem caused specifically by instruction reordering, not one caused by non-atomicity of reads or some other synchronization issue. There are many specific reasons improper synchronization can be a problem; they're interested in this particular one. — John Kugelman, Oct 04 '18 at 14:22
Do you asking for an example which shows [this](https://docs.oracle.com/javase/specs/jls/se11/html/jls-17.html#d5e35122)? — LuCio, Oct 04 '18 at 14:27
A good answer might show bytecode with reordered instructions that is buggy, yet would in fact work if the instructions were in their original order. A more difficult demonstration could inspect the JIT's native code output and point out a reordering optimization it performs that causes failure. — John Kugelman, Oct 04 '18 at 14:28
How do you expect to see instruction reordering happening in the bytecode? It's something done by the JIT compiler. javac does not perform any instruction reordering. — yole, Oct 10 '18 at 17:38
I found an example but for C++. Maybe you can translate it to Java: https://preshing.com/20120515/memory-reordering-caught-in-the-act/ — Martin, Oct 10 '18 at 17:55
Possible duplicate - https://stackoverflow.com/questions/35883354/is-there-any-instruction-reordering-done-by-the-hotspot-jit-compiler-that-can-be — bobah, Oct 12 '18 at 09:00
It sounds like you're actually worried about memory reordering. On non-x86 ISAs, memory ordering != asm program order, so instruction order is not really relevant to correctness without also having memory barriers or e.g. ARM64 `stlr` / `ldar` release-store / acquire-load instructions. And even on x86, StoreLoad reordering happens, so you don't get sequentially-consistent execution. Anyway, if lack of instruction reordering was all that was providing correctness on x86 without having told the JVM about the desired memory ordering, your code is broken on ARM. — Peter Cordes, Oct 13 '21 at 23:00
Even an in-order pipeline can create LoadStore and LoadLoad reordering with scoreboarding of loads and hit-under-miss cache: [How is load->store reordering possible with in-order commit?](https://stackoverflow.com/q/52215031). And basically all CPUs have StoreLoad reordering, although that's not a problem for acq/rel sync. (This is memory reordering relative to asm instruction program order and/or execution order, not compile-time reordering of source vs. asm, but my point is that compile-time ordering is just one of many things that are necessary for correct / desired memory ordering) — Peter Cordes, Oct 13 '21 at 23:02
@PeterCordes Much of what you wrote is a bit too technical for me. Please take a look at my edit, and see if I've captured the gist of what you are saying correctly. — Gonen I, Oct 14 '21 at 08:21
Yeah, that's basically the point. In some ways that significantly changes the question, but if you want a memory-reordering demo that works even on x86, compile-time reordering (including JIT) is only way to break most uses of lock-free inter-thread communication (because usually acquire/release synchronization is all that's needed, and x86 does that for free, just based on asm program order). [C++ How is release-and-acquire achieved on x86 only using MOV?](https://stackoverflow.com/q/60314179) — Peter Cordes, Oct 14 '21 at 08:49
The other major way to see memory reordering on x86 is with something that would demo StoreLoad reordering, like https://preshing.com/20120515/memory-reordering-caught-in-the-act/, which x86's asm memory model allows (i.e. at run-time for a fixed asm instruction order). In that case asm instruction order is *not* sufficient. (Most other ISAs allow that *and more*: https://preshing.com/20120930/weak-vs-strong-memory-models/). Your previous wording ruled that out, and gave the impression that asm itself would be sequentially consistent, making compile-time reodering sound like the only problem. — Peter Cordes, Oct 14 '21 at 08:56
Note that store forwarding means a thread always sees *it's own* stores in program order, i.e. CPUs (and compilers) must not break single-threaded code. So StoreLoad reordering isn't the *only* effect of the store-buffer + store forwarding that's part of CPU memory models, including x86's. Java may not let you type-pun like this, but if you do for example an `int` load that overlaps with a recent `char` store, you can potentially load a value that no other core can see. (Or will ever see if another thread's int or char store also overlaps that and becomes globally visible first.) — Peter Cordes, Oct 14 '21 at 09:04
[Globally Invisible load instructions](https://stackoverflow.com/a/50617986). So it's not as simple as just "x86 allows StoreLoad reordering". Not that all this x86 detail is really relevant, mostly just bringing this up because you accepted an x86-specific answer. That answer would show reordering on most ARM or AArch64 CPU even if the asm loads/stores matched Java source order. But that's fine, running on x86 and seeing acq/rel sync broken (and checking the asm) does demo what you asked for: *instruction* reordering by whichever JIT was used. — Peter Cordes, Oct 14 '21 at 09:07
TL:DR: as a "how does it break" question, this is fine. But just be aware that x86 is pretty strongly ordered, and can make things happen to work (depending on lack of compile-time reordering) when they're not safe in the Java memory model, and *will* break on other ISAs. So understanding when instruction reordering happens or not isn't relevant for reasoning about correctness of portable programs, only trying to hack cheap x86-only equivalents for C++ `memory_order_release` / `memory_order_acquire` in Java which only provides expensive `seq_cst` via `volatile`. (Unless that's changed) — Peter Cordes, Oct 14 '21 at 09:09

score 13 · Accepted Answer · edited Jul 17 '20 at 01:39

This demonstrates reordering of certain assignments, out of 1M iterations there is usually couple of printed lines.

public class App {

    public static void main(String[] args) {

        for (int i = 0; i < 1000_000; i++) {
            final State state = new State();

            // a = 0, b = 0, c = 0

            // Write values
            new Thread(() -> {
                state.a = 1;
                // a = 1, b = 0, c = 0
                state.b = 1;
                // a = 1, b = 1, c = 0
                state.c = state.a + 1;
                // a = 1, b = 1, c = 2
            }).start();

            // Read values - this should never happen, right?
            new Thread(() -> {
                // copy in reverse order so if we see some invalid state we know this is caused by reordering and not by a race condition in reads/writes
                // we don't know if the reordered statements are the writes or reads (we will se it is writes later)
                int tmpC = state.c;
                int tmpB = state.b;
                int tmpA = state.a;

                if (tmpB == 1 && tmpA == 0) {
                    System.out.println("Hey wtf!! b == 1 && a == 0");
                }
                if (tmpC == 2 && tmpB == 0) {
                    System.out.println("Hey wtf!! c == 2 && b == 0");
                }
                if (tmpC == 2 && tmpA == 0) {
                    System.out.println("Hey wtf!! c == 2 && a == 0");
                }
            }).start();

        }
        System.out.println("done");
    }

    static class State {
        int a = 0;
        int b = 0;
        int c = 0;
    }

}

Printing the assembly for the write lambda gets this output (among other..)

                                                ; {metadata('com/example/App$$Lambda$1')}
  0x00007f73b51a0100: 752b                jne       7f73b51a012dh
                                                ;*invokeinterface run
                                                ; - java.lang.Thread::run@11 (line 748)

  0x00007f73b51a0102: 458b530c            mov       r10d,dword ptr [r11+0ch]
                                                ;*getfield arg$1
                                                ; - com.example.App$$Lambda$1/1831932724::run@1
                                                ; - java.lang.Thread::run@-1 (line 747)

  0x00007f73b51a0106: 43c744d41402000000  mov       dword ptr [r12+r10*8+14h],2h
                                                ;*putfield c
                                                ; - com.example.App::lambda$main$0@17 (line 18)
                                                ; - com.example.App$$Lambda$1/1831932724::run@4
                                                ; - java.lang.Thread::run@-1 (line 747)
                                                ; implicit exception: dispatches to 0x00007f73b51a01b5
  0x00007f73b51a010f: 43c744d40c01000000  mov       dword ptr [r12+r10*8+0ch],1h
                                                ;*putfield a
                                                ; - com.example.App::lambda$main$0@2 (line 14)
                                                ; - com.example.App$$Lambda$1/1831932724::run@4
                                                ; - java.lang.Thread::run@-1 (line 747)

  0x00007f73b51a0118: 43c744d41001000000  mov       dword ptr [r12+r10*8+10h],1h
                                                ;*synchronization entry
                                                ; - java.lang.Thread::run@-1 (line 747)

  0x00007f73b51a0121: 4883c420            add       rsp,20h
  0x00007f73b51a0125: 5d                  pop       rbp
  0x00007f73b51a0126: 8505d41eb016        test      dword ptr [7f73cbca2000h],eax
                                                ;   {poll_return}
  0x00007f73b51a012c: c3                  ret
  0x00007f73b51a012d: 4181f885f900f8      cmp       r8d,0f800f985h

I am not sure why the last mov dword ptr [r12+r10*8+10h],1h is not marked with putfield b and line 16, but you can see the swapped assignment of b and c (c right after a).

EDIT: Because writes happen in order a,b,c and reads happen in reverse order c,b,a you should never see an invalid state unless the writes (or reads) are reordered.

Writes performed by single cpu (or core) are visible in same order by all processors, see e.g. this answer, which points to Intel System Programming Guide Volume 3 section 8.2.2.

Writes by a single processor are observed in the same order by all processors.

I don't mean to be difficult, since it really seems from the assembly that the writes are reordered, but how do you rule out, for example, that c == 2 and b == 0 is not caused by b not being updated from thread cache to main memory? — Gonen I, Oct 11 '18 at 06:36
`State` properties are neither `volatile` nor `Atomic`. And there is no happens-before relation between writes in one thread and reads in another. So the fact that sometimes results is different could be cause by improper synchronization, no? — Ivan, Oct 11 '18 at 14:38
@Ivan because there is no happens-before relation between reads and writes the jit compiler is allowed to reorder the instructions. You would never get the weird result without instruction reordering and it is not a case of normal race condition as with OP's attempt in linked question. — František Hartman, Oct 11 '18 at 14:49
@GonenI: x86, like all other ISAs where we run threads across multiple cores, guarantees coherent data caches. Other cores don't read directly from main memory, they read through caches which are part of the same coherency domain as the core running the code that stored. You're mixing up compilers choosing to keep values in private *registers* vs. actual stores committing to shared cache. — Peter Cordes, Oct 13 '21 at 23:10
(x86 guarantees stores and loads become visible to other cores in asm program order, with the stores delayed by a store buffer w/ store-forwarding. Actual CPUs maintain that illusion even with OoO exec. [Other ISAs don't](https://preshing.com/20120930/weak-vs-strong-memory-models/); only on x86 can you get acquire/release synchronization just by ordering normal load and store instructions, without using special loads/stores or barriers. **If there hadn't been instruction reordering here, this demo would only happen to work on x86, not ARM, PowerPC or other ISAs**.) — Peter Cordes, Oct 13 '21 at 23:15
@PeterCordes You've written quite a lot in the comments here, and I thought it might be more appropriate for a new question so I posted one. If you think it is not, feel free to close my new question. — Gonen I, Oct 14 '21 at 10:18
`tmpB == 1 && tmpA == 0` doesn't necessarily happen due to reordering. It could be possible for the write thread to only flush the `b` variable value to the main memory while not doing the same for the `a` variable value. The same applies for other checks. — Piotr Michalczyk, Oct 28 '22 at 13:07
@PiotrMichalczyk no, this is not possible on Intel architecture. All writes are observed in order. — František Hartman, Oct 29 '22 at 11:49
@FrantišekHartman right but the question doesn't ask about the specific architecture nor the Java Memory Model is coupled only to the Intel architecture — Piotr Michalczyk, Oct 31 '22 at 10:02

score 6 · Answer 2 · edited Jun 20 '20 at 09:12

6

Test

I wrote a JUnit 5 test that checks whether instruction reordering took place after two threads terminate.

The test must pass if no instruction reordering happened.
The test must fail if instruction reordering occurred.

‌

public class InstructionReorderingTest {

    static int x, y, a, b;

    @org.junit.jupiter.api.BeforeEach
    public void init() {
        x = y = a = b = 0;
    }

    @org.junit.jupiter.api.Test
    public void test() throws InterruptedException {
        Thread threadA = new Thread(() -> {
            a = 1;
            x = b;
        });
        Thread threadB = new Thread(() -> {
            b = 1;
            y = a;
        });

        threadA.start();
        threadB.start();

        threadA.join();
        threadB.join();

        org.junit.jupiter.api.Assertions.assertFalse(x == 0 && y == 0);
    }

}

Results

I ran the test until it fails several times. The results are as follows:

InstructionReorderingTest.test [*] (12s 222ms): 29144 total, 1 failed, 29143 passed.
InstructionReorderingTest.test [*] (26s 678ms): 69513 total, 1 failed, 69512 passed.
InstructionReorderingTest.test [*] (12s 161ms): 27878 total, 1 failed, 27877 passed.

Explanation

The results we expect are

x = 0, y = 1: threadA runs to completion before threadB starts.
x = 1, y = 0: threadB runs to completion before threadA starts.
x = 1, y = 1: their instructions are interleaved.

No one can expect x = 0, y = 0, which may happen as the test results showed.

The actions in each thread have no dataflow dependence on each other, and accordingly can be executed out of order. (Even if they are executed in order, the timing by which caches are flushed to main memory can make it appear, from the perspective of threadB, that the assignments in threadA occurred in the opposite order.)

_{Java Concurrency in Practice, Brian Goetz}

edited Jun 20 '20 at 09:12

Community

1
1

answered Oct 12 '18 at 08:44

Andrew Tobilko

48,120
14
91
142

I think the assert inspires more confidence this way: Assert.assertFalse(x == 0 && y == 0 && a == 1 && b == 1); // Just to make sure some other weirdness is not happening – Gonen I Oct 12 '18 at 19:23
1

Thanks for your answer. I modified it to use a loop and dropped JUnit, and it eventually it fails on my machine as well ( though at times it passes 10 mil iteratrions ). I did wonder how to rule out an unflushed cache as the reason, but this http://gee.cs.oswego.edu/dl/cpj/jmm.html states "if one thread synchronizes on the termination of another thread using Thread.join, then it is guaranteed to see the effects made by that thread" – Gonen I Oct 12 '18 at 19:48
Althogh it looks great, it is an example of concurrency and threading, NOT instruction reordering... Those are completely different things. Reorder takes place in a single code block and therefore in a single thread not across multiple ones. – Hash Oct 13 '18 at 06:03
@Mark if you look at the figure, you might notice the reordering of `a = 1` and `x = b` in `threadA` which is "a single code block" – Andrew Tobilko Oct 13 '18 at 08:03
@Mark and we need a multithreaded context to have different perspectives on the code where reordering may occur potentially – Andrew Tobilko Oct 13 '18 at 08:08
@Andrew hm right.. I was not getting your point as the x=1 and y=1 possbility is missing from the explanation part... Have my upvote, but I do not know the necessity of the multithreadning part. In case of the usage of the Unsafe class it might not has to have a multithread environment. – Hash Oct 13 '18 at 08:17
@Andrew correct me if I am wrong but this is a bug. If the reordering must not have any consequences in the output of the running program, this reproduction shows a problem in the specific jvm you are running your code on. – Hash Oct 13 '18 at 08:37
@Mark this reproduction show what might happen due to improper synchronisation. – Andrew Tobilko Oct 14 '18 at 07:39

score -1 · Answer 3 · answered Oct 13 '18 at 15:23

-1

For single thread executions reordering is not a problem at all, because of Java Memory Model (JMM) (guarantee that any reads actions related to writes are total ordered) and cannot lead to unexpected results.

For concurrent execution, rules are completely different and things gets more complicated to understand (even by providing simple example which will raise even more questions). But even this are totally described by JMM with all corner cases, so, unexpected results also forbidden. Generally, forbidden if all barriers are placed right.

For better understanding reordering I strongly recommend this subject with plenty examples insides.

answered Oct 13 '18 at 15:23

user3904219

553
5
6

I think you mean instruction reordering isn't a problem if your code isn't already broken. Yeah, I think that's true; anything inter-thread memory ordering you get from lack of instruction reordering can only help you on a strongly-ordered asm memory model like x86, so such a program that happened to work on x86 (with one JVM JIT version / options) wouldn't work on ARM, MIPS, PowerPC, etc., if you didn't tell the compiler about the necessary memory ordering semantics so it could use special instructions. https://preshing.com/20120930/weak-vs-strong-memory-models/ – Peter Cordes Oct 13 '21 at 23:22

How to demonstrate Java instruction reordering problems?

3 Answers3

Test

Results

Explanation

Linked