
According to these reorder rules:

reorder Rules

If I have code like this:

    volatile int a = 0;
    boolean b = false;

    void foo1() { a = 10; b = true; }
    void foo2() { if (b) { assert a == 10; } }

Suppose Thread A runs foo1 and Thread B runs foo2. Since a = 10 is a volatile store and b = true is a normal store, could these two statements be reordered, so that Thread B may see b == true while a != 10? Is that correct?

Added:

Thanks for your answers!
I am just starting to learn about Java multi-threading and have been struggling with the keyword volatile a lot.

Many tutorials talk about the visibility of a volatile field, saying things like "a volatile field becomes visible to all readers (other threads in particular) after a write operation completes on it". I doubt this: how could a completed write to a field be invisible to other threads (or CPUs)?

As I understand it, a completed write means the field has been successfully written back to the cache, and according to MESI, all other threads should then have an invalid cache line if they had cached this field. One exception (since I am not very familiar with the hardware, this is just a conjecture) is that maybe the result is written back to a register instead of the cache; I do not know whether there is some protocol to keep consistency in this situation, or whether volatile prevents Java from keeping the value in a register.

Some situations that look like "invisibility" do happen, for example:

    A = 0, B = 0;
    thread1 { A = 1; B = 2; }
    thread2 { if (B == 2) { /* A may be 0 here */ } }

Suppose the compiler did not reorder anything. What we see in thread2 is due to the store buffer, and I do not think a write operation sitting in the store buffer counts as a completed write. Because of the store buffer and invalidate queue, the write to variable A looks invisible, but in fact the write operation has simply not finished when thread2 reads A. Even if we make field B volatile, while the write to B sits in the store buffer behind memory barriers, thread 2 can still read B as 0 and finish. To me, volatile looks like it is not about the visibility of the field it declares, but more like an edge that makes sure all writes that happen before the volatile write in Thread A are visible to all operations after the volatile read in Thread B (where the volatile read happens after the volatile write in Thread A has completed).

By the way, since I am not a native speaker, I have seen many tutorials in my mother tongue (and some English tutorials) say that volatile instructs JVM threads to read the value of a volatile variable from main memory and not cache it locally, and I do not think that is true. Am I right?

Anyway, thanks for your answers. Since I'm not a native speaker, I hope I have expressed myself clearly.

pythonHua

4 Answers


I'm pretty sure the assert can fire. I think a volatile load is only an acquire operation (https://preshing.com/20120913/acquire-and-release-semantics/) wrt. non-volatile variables, so nothing is stopping load-load reordering.

Two volatile operations can't reorder with each other, but reordering with non-atomic operations is possible in one direction, and you picked the direction without guarantees.

(Caveat, I'm not a Java expert; it's possible but unlikely volatile has some semantics that require a more expensive implementation.)
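As a sketch of what acquire/release ordering buys you (my illustration, not part of the original question: Java 9+ VarHandles expose these weaker access modes explicitly, and the class and field names here are made up), putting the release store and the acquire load on the flag b, rather than volatile on a, is what rules out the assert firing in a message-passing pattern like this:

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Sketch: the release store / acquire load pair goes on the flag b;
// the data field a can then be plain.
public class AcquireReleaseSketch {
    static int a = 0;
    static boolean b = false;
    static final VarHandle B;
    static {
        try {
            B = MethodHandles.lookup()
                    .findStaticVarHandle(AcquireReleaseSketch.class, "b", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static void foo1() {
        a = 10;              // plain store: may not move below the release store
        B.setRelease(true);  // release store on the flag
    }

    static void foo2() {
        if ((boolean) B.getAcquire()) {   // acquire load: later loads may not move above it
            System.out.println("a=" + a); // if the flag was seen, a == 10 is guaranteed
        }
    }

    public static void main(String[] args) {
        foo1();
        foo2();  // single-threaded here, just to show the API shape
    }
}
```

Run single-threaded this only demonstrates the API shape; the ordering guarantee matters when foo1 and foo2 run on different threads.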


More concrete reasoning is that if the assert can fire when translated into asm for some specific architecture, it must be allowed to fire by the Java memory model.

Java volatile is (AFAIK) equivalent to C++ std::atomic with the default memory_order_seq_cst. Thus foo2 can JIT-compile for ARM64 with a plain load for b and an LDAR acquire load for a.

ldar can't reorder with later loads/stores, but can with earlier ones (except for stlr release stores). ARM64 was specifically designed to make C++ std::atomic<> with memory_order_seq_cst / Java volatile efficient with ldar and stlr: it doesn't have to flush the store buffer immediately on seq_cst stores, only before an LDAR takes a value, so that design gives the minimal amount of ordering necessary to still recover sequential consistency as specified by C++ (and, I assume, Java).

On many other ISAs, sequential-consistency stores do need to wait for the store buffer to drain itself, so they are in practice ordered wrt. later non-atomic loads. And again on many ISAs, an acquire or SC load is done with a normal load preceded with a barrier which blocks loads from crossing it in either direction, otherwise they wouldn't work. That's why having the volatile load of a compile to an acquire-load instruction that just does an acquire operation is key to understanding how this can happen in practice.

(In x86 asm, all loads are acquire loads and all stores are release stores. Not sequential-release, though; x86's memory model is program order + store buffer with store-forwarding, which allows StoreLoad reordering, so Java volatile stores need special asm.

So the assert can't fire on x86, except via compile/JIT-time reordering of the assignments. This is a good example of one reason why testing lock-free code is hard: a failing test can prove there is a problem, but testing on some hardware/software combo can't prove correctness.)
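To illustrate that last point, here is a sketch of the kind of stress test one might write for the question's version (the class name, iteration count, and the int encoding of b are all my choices): it counts runs where the reader saw b set but a stale a. A count of zero on your x86 desktop proves nothing; on another JIT or on ARM the count could be nonzero.

```java
// Stress-test sketch for the question's version (volatile a, plain b).
// A count of zero here does NOT prove the code is correct; it may just mean
// this JIT/hardware combination didn't happen to reorder anything.
public class RaceProbe {
    static volatile int a;
    static int b;  // flag modelled as an int for a compact result array

    public static void main(String[] args) throws InterruptedException {
        int suspicious = 0;
        for (int i = 0; i < 10_000; i++) {
            a = 0;
            b = 0;
            int[] seen = new int[2];  // seen[0] = read of b, seen[1] = read of a
            Thread t1 = new Thread(() -> { a = 10; b = 1; });
            Thread t2 = new Thread(() -> { seen[0] = b; seen[1] = a; });
            t1.start(); t2.start();
            t1.join();  t2.join();
            if (seen[0] == 1 && seen[1] != 10) suspicious++;
        }
        System.out.println("suspicious=" + suspicious);
        System.out.println("done");
    }
}
```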

Peter Cordes
  • Thanks for your answer, and I have added some question about volatile in my post – pythonHua Oct 15 '21 at 13:44
  • @pythonHua: `volatile`, like C++ `std::atomic<>`, forces the JVM to load or store to memory, not keep something in a register. (Registers are thread-private; cache is coherent). For more about the fact that CPUs have coherent caches, see [When to use volatile with multi threading?](https://stackoverflow.com/a/58535118) (but note that's about C++ `volatile`, which forces load/store but does *not* give ordering wrt. other code.) Or for a Java version of that: [Myths Programmers Believe about CPU Caches](https://software.rajivprab.com/2018/04/29/myths-programmers-believe-about-cpu-caches/) – Peter Cordes Oct 15 '21 at 20:27
  • @pythonHua: That extra stuff in the question should really be posted as a separate question, if there aren't existing duplicates that explain exactly how Java `volatile` forces inter-thread visibility. You can add a link to your followup question in this one, but you can't just add a separate question about something else and expect answerers to edit their answers to answer that, too. The point of SO is to build up a database of useful answers to specific questions, not back-and-forth edits that result in big rambling posts about multiple topics. – Peter Cordes Oct 15 '21 at 20:33
  • OK, I see; asking a separate question would be a better way. About the sentence in your answer, "forces the JVM to load or store to memory": does this mean "read/write the value of the volatile variable from/to main memory each time"? Or just that it must not keep the value in a register, while still using the CPU cache as normal and only going to main memory when needed? – pythonHua Oct 16 '21 at 14:24
  • [link](https://stackoverflow.com/questions/69596549/questions-about-visibility-of-java-volatile-field) Link for a separate question – pythonHua Oct 16 '21 at 14:38

In addition to Peter Cordes' great answer: in terms of the JMM there is a data race on b, since there is no happens-before edge between the write of b and the read of b, because b is a plain variable. Only if this happens-before edge existed would you be guaranteed that a load seeing b == 1 also implies the load of a seeing 1.

Instead of making a volatile, you need to make b volatile.

    int a = 0;
    volatile int b = 0;

    void thread1() {
        a = 1;
        b = 1;
    }

    void thread2() {
        if (b == 1) assert a == 1;
    }

So if thread2 sees b == 1, then the write of b = 1 is ordered before this read in the happens-before order (volatile variable rule). And since a = 1 and b = 1 are ordered in the happens-before order (program order rule), and the read of b and the read of a are ordered in the happens-before order (program order rule again), then due to the transitive nature of the happens-before relation, there is a happens-before edge between the write of a = 1 and the read of a, which therefore needs to see the value 1.
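A runnable sketch of this chain (the class name and iteration count are mine): with b volatile, the outcome "read b == 1 but read a == 0" is forbidden by the JMM, so the violation counter below must stay at zero on any conforming JVM.

```java
// Runnable check of the happens-before chain: with volatile b,
// observing b == 1 guarantees observing a == 1.
public class HappensBeforeCheck {
    static int a;
    static volatile int b;

    public static void main(String[] args) throws InterruptedException {
        int violations = 0;
        for (int i = 0; i < 10_000; i++) {
            a = 0;
            b = 0;
            int[] seen = new int[2];  // seen[0] = read of b, seen[1] = read of a
            Thread t1 = new Thread(() -> { a = 1; b = 1; });
            Thread t2 = new Thread(() -> { seen[0] = b; seen[1] = a; });
            t1.start(); t2.start();
            t1.join();  t2.join();
            if (seen[0] == 1 && seen[1] != 1) violations++;
        }
        System.out.println("violations=" + violations);  // must print violations=0
    }
}
```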

You are referring to a possible implementation of the JMM using fences. Although that provides some insight into what happens under the hood, thinking in terms of fences is equally damaging, because they are not a suitable mental model. See the following counterexample:

https://shipilev.net/blog/2016/close-encounters-of-jmm-kind/#myth-barriers-are-sane

pveentjer

Yes, the assert can fail.

    volatile int a = 0;
    boolean b = false;

    void foo1() { a = 10; b = true; }
    void foo2() { if (b) { assert a == 10; } }

The JMM guarantees that writes to volatile fields happen-before reads from them. In your example, whatever thread A did before a = 10 will happen-before whatever thread B does after reading a (while executing assert a == 10). Since b = true executes after a = 10 in thread A (within a single thread, happens-before always holds in program order), there is no ordering guarantee for the write to b. However, consider this:

    int a = 0;
    volatile boolean b = false;

    void foo1() { a = 10; b = true; }
    void foo2() { if (b) { assert a == 10; } }

In this example, the situation is:

a = 10 ---> b = true---|
                       |
                       | (happens-before due to volatile's semantics)
                       |
                       |---> if(b) ---> assert a == 10

                

Since you have a total order, the assert is guaranteed to pass.

Prashant Pandey
  • *writes to volatile fields happen-before reads from them* - It's the operations before/after writes/reads that are ordered wrt. each other, not the write to the volatile field itself that has any guaranteed ordering. Your phrasing would imply that every volatile write to that field in the whole program will always complete before any volatile read. But we know that's not how it works; a read can see the old value, in which case there's no synchronization connection created between stuff before the write in one thread with stuff after the read in the other thread. – Peter Cordes Oct 14 '21 at 21:36
  • (The rest of your answer is correct, it's just a phrasing problem with that sentence, where you didn't say what I'm sure you mean.) – Peter Cordes Oct 14 '21 at 21:36
  • Thanks for your answer, and I have added some question about volatile in my post – pythonHua Oct 15 '21 at 13:45
  • Hi Peter, thanks for reviewing my answer :). What I meant is that a write to a volatile variable creates a happens-before relationship to any subsequent reads from it. In the diagram, I have tried to illustrate that this creates a total ordering among instructions in the two threads. – Prashant Pandey Oct 23 '21 at 19:25

Answer to your addition.

Many tutorials talk about the visibility of a volatile field, saying things like "a volatile field becomes visible to all readers (other threads in particular) after a write operation completes on it". I doubt this: how could a completed write to a field be invisible to other threads (or CPUs)?

The compiler might mess up the code.

e.g.

    boolean stop;

    void run() {
        while (!stop) println();
    }

first optimization

    void run() {
        boolean r1 = stop;
        while (!r1) println();
    }

second optimization

    void run() {
        boolean r1 = stop;
        if (r1) return;
        while (true) println();
    }

So now it is obvious that this loop will never stop, because the new value of stop will effectively never be seen. For stores you can do something similar that could postpone them indefinitely.

As I understand it, a completed write means the field has been successfully written back to the cache, and according to MESI, all other threads should then have an invalid cache line if they had cached this field.

Correct. This is normally called 'globally visible' or 'globally performed'.

One exception (since I am not very familiar with the hardware, this is just a conjecture) is that maybe the result is written back to a register instead of the cache; I do not know whether there is some protocol to keep consistency in this situation, or whether volatile prevents Java from keeping the value in a register.

All modern processors are load/store architectures (even X86, after uop conversion), meaning that there are explicit load and store instructions that transfer data between registers and memory, and regular instructions like add/sub can only work with registers. So a register needs to be used anyway. The key part is that the compiler should respect the loads/stores of the source code and limit its optimizations.

Suppose the compiler did not reorder anything. What we see in thread2 is due to the store buffer, and I do not think a write operation sitting in the store buffer counts as a completed write. Because of the store buffer and invalidate queue, the write to variable A looks invisible, but in fact the write operation has simply not finished when thread2 reads A.

On X86 the stores in the store buffer are consistent with program order and will commit to the cache in program order. But there are architectures where stores from the store buffer can commit to the cache out of order, e.g. due to:

  • write coalescing

  • allowing stores to commit to the cache as soon as the cache line is returned in the right state, no matter whether an earlier store is still waiting.

  • sharing the store buffer with a subset of the CPUs.

Store buffers can be a source of reordering, but out-of-order and speculative execution can be sources as well.

Apart from stores, reordering of loads can also lead to observing stores out of order. On X86 loads can't be reordered with other loads, but on ARM that is allowed. And of course the JIT can mess things up as well.

Even if we make field B volatile, while the write to B sits in the store buffer behind memory barriers, thread 2 can still read B as 0 and finish.

It is important to realize that the JMM is based on sequential consistency: even though it is a relaxed memory model (separating plain loads and stores from synchronization actions like volatile load/store and lock/unlock), if a program has no data races, it will only produce sequentially consistent executions. For sequential consistency, the real-time order doesn't need to be respected. So it is perfectly fine for a load/store to be skewed in time, as long as:

  1. the memory order is a total order over all loads/stores

  2. the memory order is consistent with the program order

  3. a load sees the most recent write before it in the memory order.

To me, volatile looks like it is not about the visibility of the field it declares, but more like an edge that makes sure all writes that happen before the volatile write in Thread A are visible to all operations after the volatile read in Thread B (where the volatile read happens after the volatile write in Thread A has completed).

You are on the right path.

Example.

    int a = 0;
    volatile int b = 0;

    thread1() {
        1: a = 1;
        2: b = 1;
    }

    thread2() {
        3: r1 = b;
        4: r2 = a;
    }

In this case there is a happens-before edge between 1-2 (program order rule). If r1 == 1, then there is a happens-before edge between 2-3 (volatile variable rule) and a happens-before edge between 3-4 (program order rule).

Because the happens-before relation is transitive, there is a happens-before edge between 1-4. So r2 must be 1.

volatile takes care of the following:

  • Visibility: it makes sure the load/store doesn't get optimized out.

  • Atomicity: a load/store is atomic, so it should never be seen partially.

  • And most importantly, it makes sure that the order between 1-2 and between 3-4 is preserved.
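Tying these points back to the while (!stop) example above: declaring stop volatile forbids the hoisting optimization, so the loop must re-read the field and is guaranteed to terminate. A minimal runnable sketch (the class name and sleep duration are mine):

```java
public class StopFlagDemo {
    // volatile forbids the JIT from hoisting the read of stop out of the loop
    static volatile boolean stop;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            long spins = 0;
            while (!stop) spins++;  // re-reads stop on every iteration
            System.out.println("stopped");
        });
        worker.start();
        Thread.sleep(10);  // let the worker spin for a bit
        stop = true;       // the worker is guaranteed to eventually see this
        worker.join();     // terminates because the loop sees stop == true
        System.out.println("done");
    }
}
```

Without volatile, the second optimization shown earlier would be legal and this program could hang forever.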

By the way, since I am not a native speaker, I have seen many tutorials in my mother tongue (and some English tutorials) say that volatile instructs JVM threads to read the value of a volatile variable from main memory and not cache it locally, and I do not think that is true.

You are completely right. This is a very common misconception. Caches are the source of truth, since they are always coherent. If every write needed to go to main memory, programs would become extremely slow. Main memory is just a spill bucket for whatever doesn't fit in the cache and can be completely incoherent with it. Plain and volatile loads/stores are served from the cache. It is possible to bypass the cache in special situations like MMIO or when using e.g. non-temporal SIMD instructions, but that isn't relevant for these examples.

Anyway, thanks for your answers. Since I'm not a native speaker, I hope I have expressed myself clearly.

Most people here are not native speakers (I'm certainly not). Your English is good enough, and you show a lot of promise.

pveentjer
  • If an edit to the question introduces such a separate question that posting a separate answer seems like the best way to handle it, that's a sign that actually that edit to the question isn't a clarification but instead adding a 2nd separate question. So the actual way to handle this is to tell the querent to post a new SO question, and post answers there. If they're related, they can link to each other, but SO questions should still be about one thing. (Since you already posted here, you'll want to transplant your answer when the OP posts separately) – Peter Cordes Oct 15 '21 at 21:11
  • Yes, I know the JIT will do some strange optimizations, including the compiler reordering you described above. But as in your examples, what happens after the optimization is that the program no longer reads the field stop. To me, that does not mean the write to field stop in Thread B is invisible to Thread A; rather, after JIT optimization Thread A never reads field stop again, which makes the write merely look "invisible". So volatile here tells the compiler that the field is shared between threads and that it must not do this kind of optimization on it. – pythonHua Oct 16 '21 at 14:52
  • I know a register needs to be used anyway. What I mean is that in some situations a write or read operation may happen only against data in a register. For example, a CPU may read an old value directly from a register instead of loading the new value from the cache into the register and reading it again; other threads' completed writes only maintain cache coherence, and registers are thread-private. That is a situation where a write to a field is invisible between CPUs, so volatile should force reads/writes to go through the cache instead of staying in a register (while still using registers). – pythonHua Oct 16 '21 at 15:01