How can a store to a variable be reordered after a load of that variable, given single-threaded serialization guarantee?

Question

Let's take a simple code snippet:

   import static java.lang.invoke.VarHandle.fullFence;
   ...
   int x = 42;
   int y = 42;
   ...
   x = 1;
   fullFence();
   x = 0;
   y = x;
   fullFence();

   //another thread
   if (y == 1)
      System.err.println("wtf?");

There is another thread which reads x and y. I am trying to reason about what guarantees it gets from this code before adding fences to that thread itself.

Considering that the thread executing that snippet must see y==0 after the second fence, can load(x) and store(x, 0) between the fences actually be reordered with each other? If so, why?

It has already been answered here: https://stackoverflow.com/a/69527530 — user18228068, Feb 16 '22 at 22:37
How are x and y declared? And what is "fullFence"? The Java memory model is not defined in terms of fences. A single thread cannot observe out of order changes when these changes are made by the thread itself. — Erwin Bolwidt, Feb 17 '22 at 00:06
Memory reordering is separate from execution reordering. Other cores only see stores when they commit from the store buffer to coherent L1d cache, but the local core sees its own stores in program order via store-forwarding. See https://preshing.com/20120515/memory-reordering-caught-in-the-act/ re: StoreLoad reordering when observed by another thread. See also [Reason for the name of the "store buffer" litmus test on x86 TSO memory model](https://stackoverflow.com/q/69112020) for details for x86. — Peter Cordes, Feb 17 '22 at 00:08
Possible duplicate of [Java instruction reordering and CPU memory reordering](https://stackoverflow.com/q/69568946)? Not of the answer @user18228068 linked, though, I don't think: [Java Memory Model - Surprising Behaviors](https://stackoverflow.com/a/69527530) is about something like LoadStore reordering, except on the *same* variable, not between two separate memory locations. That possibility is a lot harder to account for. — Peter Cordes, Feb 17 '22 at 00:14
Does this answer your question? [Java instruction reordering and CPU memory reordering](https://stackoverflow.com/questions/69568946/java-instruction-reordering-and-cpu-memory-reordering) — Peter Cordes, Feb 17 '22 at 00:15
Also [How does memory reordering help processors and compilers?](https://stackoverflow.com/q/37725497) — Peter Cordes, Feb 17 '22 at 00:15
Before jumping into manual fences, I would make a JMH benchmark and check if a standard solution using Atomics and volatile doesn't suffice. Dealing with fences can be very tricky and it is very easy to end up with bugs that don't happen often or only on specific hardware. And even if you get it right, chances are that other engineers eventually break it. — pveentjer, Feb 18 '22 at 04:56
As long as you are only looking at a single threaded execution, this is a pointless question. Of course, there is only one possible outcome when there is no concurrent access. You can even remove the fences then, the result is clear. If you want to discuss what may happen in concurrent scenarios, you have to include what the other thread(s) do. — Holger, Feb 25 '22 at 14:05
@Holger, well you answered yourself. No, I am not looking at single-threaded execution. There is concurrent access. I clarified I am looking into what another thread can expect before introducing synchronisation itself. — Turin, Feb 28 '22 at 17:32
@pveentjer. Your comment is as true as it is irrelevant to the question asked. — Turin, Feb 28 '22 at 17:33
As long as no other thread is ever writing to `x` or `y`, the fact that the single writing thread never writes `1` to `y` will never change. You could remove all the fences and still get the same result, especially as the other threads have no fences, so the fences have no effect anyway. The other threads can see all combinations of `0`, `1`, or `42` for `x` and `0` or `42` for `y`, but no `1` for `y` because there is no such write. The JMM forbids “out-of-thin-air values”. — Holger, Feb 28 '22 at 17:58
You more or less said it yourself - a cardinal rule of concurrency is that every thread observes its own loads and stores as if in program order. Thus the value stored to `y` will be `0`, no ifs ands or buts, and no fences needed. (Unless of course some other thread is writing to `x` in between - but then you have a data race and all bets are off.) — Nate Eldredge, Mar 01 '22 at 07:42

score 2 · Accepted Answer · answered Mar 01 '22 at 08:36

As I see it, this question is not really about reordering. As you said yourself, a cardinal rule of concurrency is that every thread observes its own loads and stores as if in program order. Thread 1 absolutely sees the load of x happen after the store of 0 to x, and so the load will yield the value 0 (assuming no other thread is storing to x concurrently). Thus the value stored to y will be 0, no ifs ands or buts, and no fences needed. The second thread cannot ever print wtf?.
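The program-order guarantee can be checked directly with a minimal single-threaded sketch. The class and method names (`ProgramOrderDemo`, `writer`) are made up for illustration, and `x` and `y` are assumed to be plain static `int` fields, since the question does not say how they are declared.

```java
import java.lang.invoke.VarHandle;

public class ProgramOrderDemo {
    // Assumption: plain (non-volatile) fields, as the question does not specify.
    static int x = 42;
    static int y = 42;

    /** Runs the snippet from the question and returns the value stored to y. */
    static int writer() {
        x = 1;
        VarHandle.fullFence();
        x = 0;
        y = x;                  // the executing thread sees its own store of 0
        VarHandle.fullFence();
        return y;
    }

    public static void main(String[] args) {
        System.out.println(writer()); // prints 0
    }
}
```

With no other thread writing to `x`, the result is the same with or without the fences.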

Reordering would be the question of whether the load of x could become visible to another thread before the store of 0. This is entirely possible. You could observe it by having Thread 2 do:

if (y == 0) {
    x = 17;
    System.out.println(x);
}

You might think that if we load 0 from y, then the store to y already happened, which means the load from x already happened, which means the store of 0 to x already happened. And so you would think that Thread 2 must print 17, because all other stores to x happened earlier. But in fact it could print 0.

Note that to get this behavior, you must not synchronize Thread 2 to wait until after the second fence. If you do, then it can only print 17, since after the fence, all of Thread 1's loads and stores have finished and become visible. But if you don't, it's a data race, and that allows pretty free reordering, essentially C++'s memory_order_relaxed if I am not mistaken.
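For contrast, here is a sketch of the synchronized case. It assumes `Thread.join()` as the way Thread 2 waits until Thread 1 is past its second fence. The answer does not name a mechanism, so that choice, along with the class name `SynchronizedReaderDemo`, is only illustrative.

```java
import java.lang.invoke.VarHandle;

public class SynchronizedReaderDemo {
    static int x;
    static int y;

    /** Returns the value the reader loads from x, or -1 if it did not see y == 0. */
    static int run() {
        x = 42;
        y = 42;
        Thread writer = new Thread(() -> {
            x = 1;
            VarHandle.fullFence();
            x = 0;
            y = x;
            VarHandle.fullFence();
        });
        writer.start();
        try {
            // join() orders everything the writer did before the code below.
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int seen = -1;
        if (y == 0) {
            x = 17;
            seen = x;
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 17
    }
}
```

Removing the `join()` call turns this back into the racy version, where the reader is no longer guaranteed to observe 17.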

A simple hardware mechanism that could cause this is a store buffer. When Thread 1 stores 0 to x, perhaps the store does not go to the coherent cache right away, but is put in a store buffer while Core 1 waits to take ownership of that cache line. When it then loads from x on the next line, it fulfills the load from its store buffer and sees the value 0. Meanwhile, Thread 2 stores 17 to x, and its store hits the cache. Later, Core 1 finally gets ownership of the cache line and empties the store buffer, storing 0 to x in cache. After that, Core 2 loads x for the println, and gets the newer value of 0 from cache.
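The sequence of events can be replayed in a toy, single-threaded model of two cores. This is only a sketch and not a description of any real CPU. `StoreBufferModel`, `Core`, and `commit` are invented names, and the model assumes that buffered stores to different variables may reach the cache out of order (here `y` commits before `x`). A strictly FIFO store buffer would not produce this outcome.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class StoreBufferModel {
    /** One buffered store: variable name and value. */
    static final class PendingStore {
        final String var;
        final int value;
        PendingStore(String var, int value) { this.var = var; this.value = value; }
    }

    /** A core with a private store buffer in front of the shared cache. */
    static final class Core {
        private final Map<String, Integer> cache;
        private final Deque<PendingStore> buffer = new ArrayDeque<>();

        Core(Map<String, Integer> cache) { this.cache = cache; }

        /** Stores go into the buffer first and are invisible to other cores. */
        void store(String var, int value) { buffer.addLast(new PendingStore(var, value)); }

        /** Loads see the core's own newest buffered store, else the cache. */
        int load(String var) {
            Iterator<PendingStore> newestFirst = buffer.descendingIterator();
            while (newestFirst.hasNext()) {
                PendingStore s = newestFirst.next();
                if (s.var.equals(var)) return s.value;
            }
            return cache.get(var);
        }

        /** Assumption: the oldest store to one variable may commit on its own. */
        void commit(String var) {
            Iterator<PendingStore> oldestFirst = buffer.iterator();
            while (oldestFirst.hasNext()) {
                PendingStore s = oldestFirst.next();
                if (s.var.equals(var)) {
                    cache.put(s.var, s.value);
                    oldestFirst.remove();
                    return;
                }
            }
        }
    }

    /** Replays the interleaving from the answer; returns what Thread 2 prints. */
    static int simulate() {
        Map<String, Integer> cache = new HashMap<>();
        cache.put("x", 1);   // state after the first fence
        cache.put("y", 42);
        Core core1 = new Core(cache);
        Core core2 = new Core(cache);

        core1.store("x", 0);               // buffered, waiting for the cache line
        core1.store("y", core1.load("x")); // load is forwarded from the buffer: 0
        core1.commit("y");                 // y reaches the cache before x does

        int printed = -1;
        if (core2.load("y") == 0) {
            core2.store("x", 17);
            core2.commit("x");             // Thread 2's store hits the cache
            core1.commit("x");             // Core 1 finally drains x = 0
            printed = core2.load("x");
        }
        return printed;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints 0
    }
}
```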

The compiler could also reorder the code to get the same effect. It would be free to rewrite the code as:

   x = 1;
   fullFence();
   y = 0;
   x = 0;
   fullFence();

This is okay, because the only way for this code to behave differently from the original would be if some other thread were concurrently writing to x. And that would be a data race, so the compiler does not have to care what the other thread sees in such a case.
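As a quick sanity check, a single-threaded sketch can run both versions and compare the final values of `x` and `y`. The class and method names (`CompilerRewriteDemo`, `original`, `rewritten`) are invented for this example, and the fields are assumed to be plain static `int`s.

```java
import java.lang.invoke.VarHandle;
import java.util.Arrays;

public class CompilerRewriteDemo {
    static int x;
    static int y;

    /** The code as written in the question; returns the final {x, y}. */
    static int[] original() {
        x = 42; y = 42;
        x = 1;
        VarHandle.fullFence();
        x = 0;
        y = x;
        VarHandle.fullFence();
        return new int[] { x, y };
    }

    /** The rewritten order: the load of x is replaced by the known value 0. */
    static int[] rewritten() {
        x = 42; y = 42;
        x = 1;
        VarHandle.fullFence();
        y = 0;
        x = 0;
        VarHandle.fullFence();
        return new int[] { x, y };
    }

    public static void main(String[] args) {
        // Without a concurrent writer to x, both versions end in the same state.
        System.out.println(Arrays.equals(original(), rewritten())); // prints true
    }
}
```

The two versions differ only in the order in which another, unsynchronized thread could observe the stores to `x` and `y`.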
