I hope to provide some insight into how a lock and a volatile variable fit into the happens-before relation.
Imagine the following program with plain loads/stores:

```
int a=0
int b=0

CPU1:
a=1 (1)
b=1 (2)

CPU2:
while(b==0); (3)
print(a) (4)
```
When b==1, is a guaranteed to be 1? No, because there is no happens-before edge between the write and the read of a; they are in a data race. If you are lucky, a==1 is seen, but there is no guarantee.
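As a minimal sketch (class and thread names are my own), the racy program could be written in Java like this:

```java
// Hypothetical sketch of the racy program above: plain fields, no synchronization.
class DataRace {
    static int a = 0;
    static int b = 0; // plain field: reads/writes create no happens-before edges

    public static void main(String[] args) throws InterruptedException {
        Thread cpu1 = new Thread(() -> {
            a = 1; // (1)
            b = 1; // (2)
        });
        Thread cpu2 = new Thread(() -> {
            while (b == 0) { } // (3) may even spin forever: the JIT may hoist the plain load of b
            System.out.println(a); // (4) may legally print 0
        });
        cpu1.start(); cpu2.start();
        cpu1.join(); cpu2.join();
    }
}
```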
Let's make b volatile and see what happens:

```
int a=0
volatile int b=0

CPU1:
a=1 (1)
b=1 (2)

CPU2:
while(b==0); (3)
print(a) (4)
```
If the volatile read of b (3) sees 1, then there is a happens-before edge between the write (2) and the read (3). This is because a volatile write synchronizes-with all subsequent volatile reads of that variable (subsequent in the synchronization order), and hence they are ordered in the synchronizes-with order and therefore in the happens-before order.
We also have a happens-before edge between (1) and (2) due to program order, and a happens-before edge between (3) and (4) for the same reason. Because happens-before is transitive, there is a happens-before edge between (1) and (4), and hence there is no data race. So when b==1 is observed, a==1 is guaranteed as well.
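A hedged Java sketch of the volatile variant (names are again my own); the only change is the volatile modifier on b:

```java
class VolatileVisibility {
    static int a = 0;
    static volatile int b = 0; // the write (2) synchronizes-with the read (3)

    public static void main(String[] args) throws InterruptedException {
        Thread cpu1 = new Thread(() -> {
            a = 1; // (1)
            b = 1; // (2) volatile write
        });
        Thread cpu2 = new Thread(() -> {
            while (b == 0) { } // (3) volatile read: cannot be hoisted out of the loop
            System.out.println(a); // (4) guaranteed to print 1 via (1)->(2)->(3)->(4)
        });
        cpu1.start(); cpu2.start();
        cpu1.join(); cpu2.join();
    }
}
```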
Check the following explanation of how this translates to loads/stores on the CPU in combination with fences.
And now for the monitor:
```
int a=0
monitor = new Object()
int b=0

CPU1:
a=1 (1)
lock(monitor) (2)
b=1 (3)
unlock(monitor) (4)

CPU2:
done=false
while(!done){
    lock(monitor) (5)
    if(b==1) done=true; (6)
    unlock(monitor) (7)
}
print(a) (8)
```
When the unlock (4) synchronizes-with the lock (5), there is a happens-before edge. Synchronizes-with in this case means that the two actions are ordered in the synchronizes-with order and hence are part of the happens-before order.
And this means that there is a happens-before edge between (1) and (8), due to a chain of program-order edges plus a single monitor-lock edge between the unlock (4) and the lock (5) of the iteration that observes b==1.
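The same program expressed with a Java intrinsic lock, as a sketch under the same naming assumptions:

```java
class MonitorVisibility {
    static final Object monitor = new Object();
    static int a = 0;
    static int b = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread cpu1 = new Thread(() -> {
            a = 1;                   // (1)
            synchronized (monitor) { // (2) lock
                b = 1;               // (3)
            }                        // (4) unlock
        });
        Thread cpu2 = new Thread(() -> {
            boolean done = false;
            while (!done) {
                synchronized (monitor) {     // (5) lock: synchronizes-with the unlock (4)
                    if (b == 1) done = true; // (6)
                }                            // (7) unlock
            }
            System.out.println(a); // (8) prints 1: (1) happens-before (8)
        });
        cpu1.start(); cpu2.start();
        cpu1.join(); cpu2.join();
    }
}
```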
It is up to the JVM to ensure that the happens-before model is implemented correctly.
Some specific notes:
You will frequently hear that volatile causes a load or store to go through main memory. This is a fallacy; it isn't how modern CPUs work. Caches are always coherent; they typically make use of some MESI-based cache-coherence protocol. So what typically happens is that when a write is made to a cache line, the cache line is first invalidated on all other CPUs, and once it has been invalidated, the CPU can apply the write to the cache line. CPUs typically use a store buffer to prevent stalling while waiting on that cache line, and this is also one of the causes of loads/stores being reordered (so we get a violation of sequential consistency).
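To make the store-buffer effect concrete, here is the classic store-buffering litmus test as a Java sketch (my own naming); with plain fields the all-zero outcome is allowed, and making x and y volatile forbids it:

```java
class StoreBuffering {
    static int x, y;   // plain fields; declare them volatile to forbid the outcome below
    static int r1, r2;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Each store can still be sitting in its CPU's store buffer when the
        // other CPU performs its load, so r1 == 0 && r2 == 0 is observable.
        System.out.println("r1=" + r1 + ", r2=" + r2);
    }
}
```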
The JMM is an axiomatic model, not an operational model. The focus of an axiomatic model is to tell which executions are allowed and which are forbidden. In the JMM this is done using the happens-before order (and the causality requirements). On purpose, it isn't expressed in terms of caches, store buffers, out-of-order execution, speculative execution, etc. It can be comforting to think in terms of fences, but it is a flawed model.
A lock can be implemented using volatiles only, e.g. Dekker's algorithm or Peterson's algorithm. Not that anyone would do that on modern CPUs, but in theory it is allowed, and when implemented using volatiles, these algorithms provide the required happens-before edges. The point is that for the JMM it isn't relevant how a lock is implemented. Contended locks are typically implemented using some CAS operation; if threads need to be suspended or notified, interaction with the OS scheduler is needed as well.
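For illustration, here is a sketch of Peterson's algorithm for two threads built from volatiles only (class and method names are hypothetical):

```java
// Mutual exclusion for exactly two threads, indexed 0 and 1.
class PetersonLock {
    private volatile boolean flag0, flag1; // flagN: thread N wants to enter
    private volatile int turn;             // which thread yields when both want in

    public void lock(int self) { // self must be 0 or 1
        if (self == 0) { flag0 = true; turn = 1; }
        else           { flag1 = true; turn = 0; }
        // Spin while the other thread wants in and it is the other thread's turn.
        while ((self == 0 ? flag1 : flag0) && turn != self) { }
    }

    public void unlock(int self) {
        // The volatile write below synchronizes-with the next lock()'s volatile read,
        // giving the same happens-before edge a monitor unlock/lock pair would.
        if (self == 0) flag0 = false; else flag1 = false;
    }
}
```

Two separate volatile booleans are used instead of a boolean[] because array elements are not volatile even when the array reference is.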
Please have a look at the following orders:
- synchronization order
- synchronizes-with order
- program order
- happens-before (order)
They are key to understanding the JMM.