I modified one of the JCStress examples:

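import org.openjdk.jcstress.annotations.Actor;
import org.openjdk.jcstress.annotations.JCStressTest;
import org.openjdk.jcstress.annotations.Outcome;
import org.openjdk.jcstress.annotations.State;
import org.openjdk.jcstress.infra.results.II_Result;

import static org.openjdk.jcstress.annotations.Expect.ACCEPTABLE;
import static org.openjdk.jcstress.annotations.Expect.ACCEPTABLE_INTERESTING;

@JCStressTest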
@Outcome(id = "0, 0", expect = ACCEPTABLE, desc = "Doing both reads early.")
@Outcome(id = "1, 1", expect = ACCEPTABLE, desc = "Doing both reads late.")
@Outcome(id = "1, 0", expect = ACCEPTABLE, desc = "First is visible but not second.")
@Outcome(id = "0, 1", expect = ACCEPTABLE_INTERESTING, desc = "Second is visible but not first.")
@State
public class Reordering {
    int first;
    int second;

    @Actor
    public void actor1() {
        first = 1;
        second = 1;
    }

    @Actor
    public void actor2(II_Result r) {
        r.r2 = second;
        r.r1 = first;
    }
}

which gave me the following result:

  RESULT        SAMPLES    FREQ       EXPECT  DESCRIPTION
    0, 0    737,822,067  26.75%   Acceptable  Doing both reads early.
    0, 1      1,838,578   0.07%  Interesting  Second is visible but not first.
    1, 0     13,081,701   0.47%   Acceptable  First is visible but not second.
    1, 1  2,005,604,406  72.71%   Acceptable  Doing both reads late.

The Acceptable results are easy to understand, but I have some questions about the Interesting outcome. From what I understand, the JVM can optimize the code and change the order of instructions, meaning that actor1 could roughly be translated to:

public void actor1() {
    second = 1;
    first = 1;
}

which would explain the Interesting result. My question is: could the Interesting result come not from code reordering done by the JVM, but from "caching" first so that it is not visible to the thread running actor2, since the field is not volatile? By caching I mean holding the value in a CPU register or store buffer, where the other thread cannot see it.

Piotr Michalczyk

1 Answer

On the X86, stores can't be reordered with other stores and loads can't be reordered with other loads; the only reordering allowed is between an older store and a newer load to a different address. So as long as the compiler does not change the order, this can't fail on the X86. This follows from the X86's strong memory model, TSO (Total Store Order).

However, on a CPU with a weaker memory model, e.g. ARM or RISC-V, the hardware itself could perform such a reordering.
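If the "0, 1" outcome has to be ruled out on every platform and regardless of what the JIT does, one option is to make the store to second a release store and the load of second an acquire load. The following is a minimal sketch of that idea, assuming Java 9+ VarHandles; the class name ReleaseAcquirePublish and the helper countForbidden are made up for this illustration and are not part of JCStress. The stress loop is only a sanity check, not a proof; the guarantee itself comes from the release/acquire pairing.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical stand-alone variant of the Reordering test (not a JCStress class).
class ReleaseAcquirePublish {
    int first;
    int second;

    // VarHandle giving release/acquire access to the plain field 'second'.
    private static final VarHandle SECOND;
    static {
        try {
            SECOND = MethodHandles.lookup()
                    .findVarHandle(ReleaseAcquirePublish.class, "second", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void writer() {
        first = 1;                    // plain store
        SECOND.setRelease(this, 1);   // release store: 'first = 1' cannot move below it
    }

    int[] reader() {
        int r2 = (int) SECOND.getAcquire(this); // acquire load: the read of 'first' cannot move above it
        int r1 = first;                          // plain load
        return new int[] { r1, r2 };
    }

    // Runs writer and reader concurrently 'rounds' times and counts
    // how often r1 == 0 and r2 == 1 is observed (expected: never).
    static int countForbidden(int rounds) {
        int forbidden = 0;
        for (int i = 0; i < rounds; i++) {
            ReleaseAcquirePublish s = new ReleaseAcquirePublish();
            int[][] seen = new int[1][];
            Thread w = new Thread(s::writer);
            Thread r = new Thread(() -> seen[0] = s.reader());
            w.start();
            r.start();
            try {
                w.join();
                r.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            if (seen[0][0] == 0 && seen[0][1] == 1) {
                forbidden++;
            }
        }
        return forbidden;
    }

    public static void main(String[] args) {
        System.out.println("forbidden outcomes: " + countForbidden(1000));
    }
}
```

Declaring second as volatile would also forbid the outcome; release/acquire is simply the weakest ordering that is sufficient for this pattern.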

pveentjer
  • I don't quite get this answer. I ran the code on an x64 processor, and JCStress explicitly names reordering (I assume JIT reordering) as the root cause of this behavior. Are you suggesting that in my case no reordering happened and the variable was cached instead? – Piotr Michalczyk Apr 05 '23 at 20:16
  • On the X86, the only way the above can happen is if the JIT reordered the instructions before they reached the CPU, because the X86 will not reorder the two loads and will not reorder the two stores. – pveentjer Apr 06 '23 at 05:41
  • The example you have given is called the message-passing idiom, and for it to work you need to preserve causality. For causality, the store to 'second' just needs to be a release store and the load of 'second' an acquire load. On the X86 every store has release semantics (all loads/stores before that store are ordered before it) and every load has acquire semantics (all loads/stores after that load are ordered after it). So the above example can't fail as long as the JIT hasn't reordered the loads/stores. – pveentjer Apr 06 '23 at 05:45
  • JCStress deliberately instructs the JIT to increase the amount of reordering of loads and stores, to raise the chance that you see more of the possible behaviors. – pveentjer Apr 06 '23 at 05:47
  • I would be careful with referring to caching as the source of reordering. Modern CPUs always have coherent caches; that is the task of the cache coherence protocol like MESI. Memory is just a spill bucket for whatever doesn't fit into the cache. The cache is the source of truth. And also be very cautious of any blog post or article that talks about 'cache flushing'. This isn't how modern CPUs work. – pveentjer Apr 06 '23 at 05:52
  • When I referred to caching, I explicitly meant values being held in CPU registers or store buffers, not in CPU caches. So my question still stands: is it possible to get the outcome presented in the original question without any JIT reordering? – Piotr Michalczyk Apr 07 '23 at 11:48
  • No. This is because of the strong memory model of the X86: TSO. It is up to the microarchitecture how to implement these constraints. Guaranteeing store/store order is simple, since slots in the store buffer are issued in program order, and even though stores can execute out of order, they retire and commit to the coherent cache in order. Guaranteeing load/load order is a bit trickier. Loads can execute out of order, but the load buffer will detect a violation of the memory order and restart the instruction pipeline; the next time it is likely that there is no memory order violation. – pveentjer Apr 07 '23 at 12:56
  • At the microarchitecture level, every load/store goes through a register (because at that level the X86 is a load/store architecture). A register will therefore temporarily hold a stored value until the store is written to the store buffer and, from there, to its memory location. But it is the task of the microarchitecture to ensure that this doesn't become visible. In the previous comment, I gave some hints on how this can be accomplished. – pveentjer Apr 07 '23 at 13:12
  • On the X86 the only cause of the JCStress example reordering is the JIT. If you want to look for something that can fail on the X86, then look for the store-buffering litmus test. – pveentjer Apr 07 '23 at 13:15
  • Here it is, as PlainDekker in https://github.com/openjdk/jcstress/blob/master/jcstress-samples/src/main/java/org/openjdk/jcstress/samples/jmm/basic/BasicJMM_07_Consensus.java. Even if the JIT does not reorder the loads/stores, this can still fail on the X86. – pveentjer Apr 07 '23 at 13:22
  • @PiotrMichalczyk has your question been answered? – pveentjer Apr 08 '23 at 09:14
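
The store-buffer argument in the comments above can be checked mechanically on a small abstract model. The sketch below is an illustration only, not a measurement of real hardware or of the JIT: it assumes the usual textbook description of TSO (each thread runs its instructions in program order, stores go into a per-thread FIFO store buffer that drains to shared memory at arbitrary times, and a load first looks in its own thread's buffer). The class TsoModel and all of its method names are hypothetical. It enumerates every interleaving of two tiny programs and collects the possible (r1, r2) results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical exhaustive explorer for a two-thread TSO abstract machine.
class TsoModel {
    // An instruction is {kind, variable, x}: kind 0 = store value x, kind 1 = load into register x.
    static int[] store(int var, int val) { return new int[] {0, var, val}; }
    static int[] load(int var, int reg)  { return new int[] {1, var, reg}; }

    static Set<String> outcomes(int[][] t0, int[][] t1) {
        Set<String> out = new TreeSet<>();
        List<List<int[]>> bufs = new ArrayList<>();
        bufs.add(new ArrayList<>());
        bufs.add(new ArrayList<>());
        explore(new int[][][] {t0, t1}, new int[2], new int[2], new int[2], bufs, out);
        return out;
    }

    static List<List<int[]>> copy(List<List<int[]>> bufs) {
        List<List<int[]>> c = new ArrayList<>();
        for (List<int[]> b : bufs) c.add(new ArrayList<>(b));
        return c;
    }

    static void explore(int[][][] prog, int[] pc, int[] mem, int[] regs,
                        List<List<int[]>> bufs, Set<String> out) {
        if (pc[0] == prog[0].length && pc[1] == prog[1].length) {
            // All loads are done; stores still buffered cannot change the registers.
            out.add("(" + regs[0] + ", " + regs[1] + ")");
            return;
        }
        for (int t = 0; t < 2; t++) {
            // Choice A: drain the OLDEST buffered store of thread t to memory (FIFO).
            if (!bufs.get(t).isEmpty()) {
                int[] m = mem.clone();
                List<List<int[]>> b = copy(bufs);
                int[] s = b.get(t).remove(0);
                m[s[0]] = s[1];
                explore(prog, pc, m, regs, b, out);
            }
            // Choice B: execute the next instruction of thread t, in program order.
            if (pc[t] < prog[t].length) {
                int[] op = prog[t][pc[t]];
                int[] p = pc.clone();
                p[t]++;
                int[] r = regs.clone();
                List<List<int[]>> b = copy(bufs);
                if (op[0] == 0) {
                    b.get(t).add(new int[] {op[1], op[2]});   // store enters the buffer
                } else {
                    int v = mem[op[1]];
                    for (int[] s : b.get(t)) {
                        if (s[0] == op[1]) v = s[1];           // forwarding: newest own store wins
                    }
                    r[op[2]] = v;
                }
                explore(prog, p, mem, r, b, out);
            }
        }
    }

    // The test from the question: first = 1; second = 1  ||  r2 = second; r1 = first
    static Set<String> messagePassing() {
        final int FIRST = 0, SECOND = 1, R1 = 0, R2 = 1;
        return outcomes(new int[][] {store(FIRST, 1), store(SECOND, 1)},
                        new int[][] {load(SECOND, R2), load(FIRST, R1)});
    }

    // Store-buffering shape: x = 1; r1 = y  ||  y = 1; r2 = x
    static Set<String> storeBuffering() {
        final int X = 0, Y = 1, R1 = 0, R2 = 1;
        return outcomes(new int[][] {store(X, 1), load(Y, R1)},
                        new int[][] {store(Y, 1), load(X, R2)});
    }

    public static void main(String[] args) {
        System.out.println("message passing: " + messagePassing());
        System.out.println("store buffering: " + storeBuffering());
    }
}
```

In this model the message-passing program never yields (0, 1): second can only reach memory after first has been drained, because the buffer is FIFO. The store-buffering program does yield (0, 0), since both stores can still be sitting in their buffers when both loads read memory. That matches the claim that a store buffer alone explains the Dekker-style failure on the X86 but not the outcome in the question.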