In the above examples, we write to a volatile variable, which causes an mfence, which in turn flushes all pending store buffers/load buffers to main cache...
This is correct.
invalidating other cache lines.
This is not correct, or at least it is misleading. It is not the write memory barrier that invalidates the other cache lines; it is the read memory barrier executing on the other processors that invalidates each processor's cache lines. Memory synchronization is a cooperative action between the thread writing and the other threads reading from volatile variables.
The Java memory model actually guarantees visibility only for a read of the same volatile variable that was written. The reality, however, is that all pending cache lines are flushed when a write memory barrier is crossed, and all cache lines are invalidated when a read memory barrier is crossed, regardless of which variable is being accessed.
However, non-volatile fields could be optimized, for example by being stored in registers. So how can we be sure that, after a write to a volatile variable, ALL state changes prior to it will be visible? What if we change 1000 things?
According to this documentation (and others), memory barriers also cause the compiler to generate code that flushes registers. To quote:
... while with barrier() the compiler must discard the value of all memory locations that it has currently cached in any machine registers.
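The cooperative write/read-barrier behavior discussed above is what makes the standard "publication via a volatile flag" pattern work in Java. Here is a minimal sketch; the field and class names are illustrative, not from the original discussion. The plain writes to `a` and `b` happen before the volatile write to `ready`, and the reader's volatile read of `ready` establishes the happens-before edge that makes those plain writes visible.

```java
// Sketch: publishing non-volatile state via a volatile flag.
// Class and field names are hypothetical, chosen for illustration.
class VolatilePublication {
    static int a, b;                   // plain (non-volatile) fields
    static volatile boolean ready;     // the volatile flag

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            a = 1;                     // plain writes before the volatile write...
            b = 2;
            ready = true;              // volatile write: a write barrier is crossed here
        });
        Thread reader = new Thread(() -> {
            while (!ready) { }         // volatile read: a read barrier is crossed here
            // Happens-before now guarantees the plain writes are visible.
            System.out.println(a + b); // prints 3
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}
```

Note that the guarantee flows through the volatile variable itself: the reader must read `ready` (and observe `true`) for the earlier plain writes to be guaranteed visible, which is exactly the cooperative behavior described above.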