I hope to provide some insight into how a lock and a volatile variable fit into the happens-before relation.
Imagine the following program with plain loads/stores:

```
int a=0
int b=0

CPU1:
a=1 (1)
b=1 (2)

CPU2:
while(b==0); (3)
print(a) (4)
```
When b==1, is a guaranteed to be 1? No, because there is no happens-before edge between the write and the read of a; they are in a data race. If you are lucky, a==1 is seen, but there is no guarantee.
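As a minimal sketch (class and thread names are my own), the racy program could be written in Java like this:

```java
// Hypothetical sketch of the racy program above: plain fields, no synchronization.
class DataRace {
    static int a = 0;
    static int b = 0; // plain field: reads/writes create no happens-before edges

    public static void main(String[] args) throws InterruptedException {
        Thread cpu1 = new Thread(() -> {
            a = 1; // (1)
            b = 1; // (2)
        });
        Thread cpu2 = new Thread(() -> {
            while (b == 0) { } // (3) may even spin forever: the JIT may hoist the plain load of b
            System.out.println(a); // (4) may legally print 0
        });
        cpu1.start(); cpu2.start();
        cpu1.join(); cpu2.join();
    }
}
```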
Let's make b volatile and see what happens:

```
int a=0
volatile int b=0

CPU1:
a=1 (1)
b=1 (2)

CPU2:
while(b==0); (3)
print(a) (4)
```
If the volatile read of b (3) sees 1, then there is a happens-before edge between the write (2) and the read (3). This is because a volatile write synchronizes-with all subsequent volatile reads of that variable (subsequent in the synchronization order), and hence they are ordered in the synchronizes-with order and therefore in the happens-before order.
We also have a happens-before edge between (1) and (2) due to program order, and a happens-before edge between (3) and (4) for the same reason. Because happens-before is transitive, there is a happens-before edge between (1) and (4), and hence there is no data race. So when b==1 is observed, a==1 is guaranteed as well.
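A hedged Java sketch of the volatile variant (names are again my own); the only change is the volatile modifier on b:

```java
class VolatileVisibility {
    static int a = 0;
    static volatile int b = 0; // the write (2) synchronizes-with the read (3)

    public static void main(String[] args) throws InterruptedException {
        Thread cpu1 = new Thread(() -> {
            a = 1; // (1)
            b = 1; // (2) volatile write
        });
        Thread cpu2 = new Thread(() -> {
            while (b == 0) { } // (3) volatile read: cannot be hoisted out of the loop
            System.out.println(a); // (4) guaranteed to print 1 via (1)->(2)->(3)->(4)
        });
        cpu1.start(); cpu2.start();
        cpu1.join(); cpu2.join();
    }
}
```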
Check the following explanation of how this translates to loads/stores on the CPU in combination with fences.
And now for the monitor:
```
int a=0
monitor = new Object()
int b=0

CPU1:
a=1 (1)
lock(monitor) (2)
b=1 (3)
unlock(monitor) (4)

CPU2:
done=false
while(!done){
    lock(monitor) (5)
    if(b==1) done=true; (6)
    unlock(monitor) (7)
}
print(a) (8)
```
When the unlock (4) synchronizes-with the lock (5), there is a happens-before edge. Synchronizes-with in this case means that the two actions are ordered in the synchronizes-with order and hence are part of the happens-before order.
And this means that there is a happens-before edge between (1) and (8), due to a chain of program-order edges plus a single monitor-lock edge between the unlock (4) and the lock (5) of the iteration that observes b==1.
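The same program expressed with a Java intrinsic lock, as a sketch under the same naming assumptions:

```java
class MonitorVisibility {
    static final Object monitor = new Object();
    static int a = 0;
    static int b = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread cpu1 = new Thread(() -> {
            a = 1;                   // (1)
            synchronized (monitor) { // (2) lock
                b = 1;               // (3)
            }                        // (4) unlock
        });
        Thread cpu2 = new Thread(() -> {
            boolean done = false;
            while (!done) {
                synchronized (monitor) {     // (5) lock: synchronizes-with the unlock (4)
                    if (b == 1) done = true; // (6)
                }                            // (7) unlock
            }
            System.out.println(a); // (8) prints 1: (1) happens-before (8)
        });
        cpu1.start(); cpu2.start();
        cpu1.join(); cpu2.join();
    }
}
```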
It is up to the JVM to ensure that the happens-before model is implemented correctly.
Some specific notes:
You will frequently hear that volatile causes a load or store to go through main memory. This is a fallacy; it isn't how modern CPUs work. Caches are always coherent; they typically make use of some MESI-based cache-coherence protocol. So what typically happens is that when a write is made to a cache line, the cache line is first invalidated on all other CPUs, and once it has been invalidated, the CPU can apply the write to the cache line. CPUs typically use a store buffer to prevent stalling while waiting on that cache line, and this is also one of the causes of loads/stores being reordered (so we get a violation of sequential consistency).
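To make the store-buffer effect concrete, here is the classic store-buffering litmus test as a Java sketch (my own naming); with plain fields the all-zero outcome is allowed, and making x and y volatile forbids it:

```java
class StoreBuffering {
    static int x, y;   // plain fields; declare them volatile to forbid the outcome below
    static int r1, r2;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Each store can still be sitting in its CPU's store buffer when the
        // other CPU performs its load, so r1 == 0 && r2 == 0 is observable.
        System.out.println("r1=" + r1 + ", r2=" + r2);
    }
}
```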
The JMM is an axiomatic model, not an operational model. The focus of an axiomatic model is to tell which executions are allowed and which are forbidden. In the JMM this is done using the happens-before order (and the causality requirements). On purpose, it isn't expressed in terms of caches, store buffers, out-of-order execution, speculative execution, etc. It can be comforting to think in terms of fences, but it is a flawed model.
A lock can be implemented using volatiles only, e.g. Dekker's algorithm or Peterson's algorithm. Not that anyone would do that on modern CPUs, but in theory it is allowed, and when implemented using volatiles, these algorithms provide the required happens-before edges. The point is that for the JMM it isn't relevant how a lock is implemented. Contended locks are typically implemented using some CAS operation; if threads need to be suspended or notified, interaction with the OS scheduler is needed as well.
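For illustration, here is a sketch of Peterson's algorithm for two threads built from volatiles only (class and method names are hypothetical):

```java
// Mutual exclusion for exactly two threads, indexed 0 and 1.
class PetersonLock {
    private volatile boolean flag0, flag1; // flagN: thread N wants to enter
    private volatile int turn;             // which thread yields when both want in

    public void lock(int self) { // self must be 0 or 1
        if (self == 0) { flag0 = true; turn = 1; }
        else           { flag1 = true; turn = 0; }
        // Spin while the other thread wants in and it is the other thread's turn.
        while ((self == 0 ? flag1 : flag0) && turn != self) { }
    }

    public void unlock(int self) {
        // The volatile write below synchronizes-with the next lock()'s volatile read,
        // giving the same happens-before edge a monitor unlock/lock pair would.
        if (self == 0) flag0 = false; else flag1 = false;
    }
}
```

Two separate volatile booleans are used instead of a boolean[] because array elements are not volatile even when the array reference is.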
Please have a look at the following orders:
- synchronization order
- synchronizes-with order
- program order
- happens-before (order)
They are key to understanding the JMM.