How is a StoreStore barrier mapped to instructions under x86?

Question

StoreStore Barriers The sequence: Store1; StoreStore; Store2 ensures that Store1's data are visible to other processors (i.e., flushed to memory) before the data associated with Store2 and all subsequent store instructions. In general, StoreStore barriers are needed on processors that do not otherwise guarantee strict ordering of flushes from write buffers and/or caches to other processors or main memory.

And, from the cookbook we know that, for a synchronized block, the Java compiler will insert some barriers to prevent possible reordering:

MonitorEnter
[LoadLoad] <==inserted barrier
[LoadStore]<==inserted barrier

...
[LoadStore]<==inserted barrier
[StoreStore]<==inserted barrier
MonitorExit

However, x86 does not allow reordering of Read-Read, Read-Write, or Write-Write, so all of the above barriers are mapped to no-ops. That is equivalent to saying that no barriers are inserted between MonitorEnter and MonitorExit on x86 processors. My confusion is this: if we map StoreStore to a no-op under x86, how is visibility guaranteed? In more detail: x86 DOES employ a store buffer, so to make the writes performed in the critical section visible to other processor(s), we need to flush the store buffer, hence a write barrier is needed. From the perspective of visibility, should a StoreStore be mapped to sfence/mfence/LOCK#? But the cookbook says it should be mapped to a no-op from the perspective of reordering prevention. Or is the key point that the visibility guarantee is provided by MonitorEnter and MonitorExit themselves? If that is the case, I think they might use so-called read and write barriers to guarantee visibility, right?
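For concreteness, here is a minimal sketch of the kind of code the question is about. The class and method names (`MonitorCounter`, `run`) are made up for illustration. It only relies on the Java Memory Model rule that a monitor exit happens-before a later monitor enter on the same lock, and says nothing about which machine instructions a particular JVM emits.

```java
public class MonitorCounter {
    private static final Object lock = new Object();
    private static int counter; // plain (non-volatile) field, only touched under the lock

    // Two threads each increment the counter perThread times inside a synchronized block.
    static int run(int perThread) {
        counter = 0;
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                synchronized (lock) {   // MonitorEnter
                    counter++;          // loads and stores inside the critical section
                }                       // MonitorExit
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (lock) {
            return counter;
        }
    }

    public static void main(String[] args) {
        System.out.println(run(10_000));
    }
}
```

Every increment made by one thread is seen by the other the next time it acquires the lock, even though `counter` is a plain field.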

x86's own memory model already guarantees that all writes have StoreStore semantics, so you don't have to explicitly provide a barrier. This is pretty common (on AArch64 you don't need explicit barriers for the CAS implementation since the instructions used provide all the guarantees you need), although x86 generally provides the strongest guarantees of modern ISAs, so you rarely need explicit barriers at all. — Voo, Feb 10 '16 at 08:45
Thank you @Voo, maybe my question can be summarized as: under x86, if we map StoreStore to a no-op (since x86 does NOT allow reordering of Write-Write) for the purpose of reordering prevention, then for the purpose of visibility guarantee we might map StoreStore to sfence/mfence/LOCK-prefixed instructions. This seems to be weird, right? — user2351818, Feb 10 '16 at 08:58
Stores on x86 give you the visibility guarantees that you need too. About flushing the store buffer: true that's necessary (well or something similar that had the same effect - implementation detail and all), but that's what StoreLoad barriers do - those are needed on x86. They're also the most expensive ones for exactly that reason. — Voo, Feb 10 '16 at 08:58

score 2 · Accepted Answer · edited May 23 '17 at 12:34

Although we can say that a memory barrier is capable of both of the following:

guaranteeing visibility (by flushing the store buffer and/or applying the invalidate queue)
preventing reordering (inhibiting reordering between loads and/or stores that precede or follow the barrier)

this does NOT mean that a memory barrier must always do both together. The Unsafe.putOrderedXX methods provided by the Sun JDK are one such example:

SomeClass temp = new SomeClass(); //S1
unsafe.putOrderedObject(this, valueOffset, null);
Object target = temp; //S2

unsafe.putOrderedObject here serves as a StoreStore barrier, hence it prevents reordering of S1 and S2, but it does NOT guarantee that the result of S1 will be visible to other processors/threads (since there is no such need).
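As a hedged illustration of an ordered store using only public APIs, the sketch below uses `AtomicInteger.lazySet`, which is documented from Java 9 on as having the memory effects of `VarHandle.setRelease`, in place of `sun.misc.Unsafe`. The class name `OrderedPublish`, the `payload` field, and the `runOnce` helper are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OrderedPublish {
    static int payload;                                   // plain field, written before the flag
    static final AtomicInteger ready = new AtomicInteger(0);

    // Publishes payload with an ordered store and returns what the consumer observed.
    static int runOnce() {
        payload = 0;
        ready.set(0);
        final int[] seen = new int[1];
        Thread consumer = new Thread(() -> {
            while (ready.get() == 0) {                    // volatile read of the flag
                Thread.yield();
            }
            seen[0] = payload;                            // ordered after seeing ready == 1
        });
        consumer.start();

        payload = 42;                                     // S1: plain store
        ready.lazySet(1);                                 // ordered store: S1 may not be reordered after it

        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

The ordered store does not make the writer wait for anything. It only constrains the order in which the two stores can become visible, so a consumer that observes the flag also observes the payload.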

More info on Unsafe and volatile:

That is to say, under x86, for the purpose of reordering prevention, a StoreStore can be mapped to a no-op; for the purpose of visibility guarantee, a StoreStore should be mapped to instructions such as sfence/mfence/LOCK#.

Another example is the final keyword. To enforce the semantics of final (an observed variable's value cannot be changed), the compiler should insert a StoreStore barrier between the writes to final fields and the return from the constructor. The reason is to ensure that the writes to final fields become visible to other processor(s) before the reference to the constructed object is written to a reference variable. This is really a requirement on ordering rather than visibility, so the results of writing the final fields need not be flushed (from the store buffer or cache) before the constructor returns. Therefore, under x86, the JVM does NOT insert any barrier for final fields.
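A minimal sketch of the final-field case, following the description above. The names `FinalPublish`, `Holder`, and `readerSawConsistentState` are hypothetical, and the comment marking the barrier position is conceptual rather than a statement about what any particular JVM emits. The reference is deliberately published through a plain field, so the reader may or may not observe it; the check is only that whenever it does, the final field is already initialized.

```java
public class FinalPublish {
    static final class Holder {
        final int value;

        Holder(int v) {
            this.value = v;
            // Conceptually, the barrier described above sits here, between the
            // write to the final field and the return from the constructor.
        }
    }

    static Holder shared; // deliberately plain: racy publication of the reference

    // Returns true if the reader never saw a Holder whose final field was not yet 7.
    static boolean readerSawConsistentState() {
        shared = null;
        Thread writer = new Thread(() -> shared = new Holder(7));
        writer.start();

        boolean ok = true;
        for (int i = 0; i < 1_000_000; i++) {   // bounded polling, so the method always terminates
            Holder h = shared;
            if (h != null) {
                ok = (h.value == 7);            // final-field semantics: must be initialized
                break;
            }
        }

        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println(readerSawConsistentState());
    }
}
```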

On x86, there's an implicit StoreStore barrier between every store instruction. The store buffer always flushes itself on its own as fast as it can. You only need `mfence` or a `lock`ed instruction as a barrier if you want to make the current core *wait* until earlier stores are globally visible, before doing any later loads, i.e. a StoreLoad barrier. If you need a store to be visible *before any later reads*, you need a StoreLoad barrier, not just StoreStore. This is true on any normal ISA, not just x86: a store will pretty quickly become globally visible even if you don't wait for it. — Peter Cordes, Feb 14 '19 at 05:53
To put it another way **systems with coherent caches don't need explicit flushes**, and all normal systems that can run Java (x86, ARM, MIPS, PowerPC, etc.) are cache-coherent. Barriers only make the current core wait until a store is visible before later stores, or later stores and loads, are allowed to become visible. — Peter Cordes, Feb 14 '19 at 05:55
And BTW, `unsafe.putOrderedObject` isn't just a StoreStore barrier, it *is* a store. http://robsjava.blogspot.com/2013/06/a-faster-volatile.html shows how to use it as a release-store. (Or maybe not release, if it's really only storestore and not loadstore). But anyway, it actually writes somewhere, so you should use it with a `valueOffset` that points to a class member variable you actually want to write. — Peter Cordes, Feb 14 '19 at 06:09
Also, final fields generate LoadStore, not StoreStore barriers in Java https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/mechanical-sympathy/AYWEM491N4U/GG089IeAAAAJ — Andrey Lomakin, Jul 17 '19 at 15:19
