As we know from the C11 memory_order documentation: http://en.cppreference.com/w/c/atomic/memory_order
And likewise from the C++11 std::memory_order documentation: http://en.cppreference.com/w/cpp/atomic/memory_order
On strongly-ordered systems (x86, SPARC, IBM mainframe), release-acquire ordering is automatic. No additional CPU instructions are issued for this synchronization mode; only certain compiler optimizations are affected (e.g. the compiler is prohibited from moving non-atomic stores past the atomic store-release or performing non-atomic loads earlier than the atomic load-acquire).
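To make the quoted pattern concrete, here is a minimal sketch of a release-acquire handoff in C++11. The function name `handoff_sum` and the variables `payload` and `ready` are hypothetical, chosen only for illustration. Whether the compiler turns the plain `float` accesses into SSE instructions depends on the target and optimization settings, so this shows the language-level pattern rather than any particular instruction sequence.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one thread publishes a small array with a
// store-release, another thread consumes it after a load-acquire.
float handoff_sum() {
    float payload[4] = {0.0f, 0.0f, 0.0f, 0.0f};  // non-atomic data
    std::atomic<bool> ready{false};               // synchronization flag
    float sum = 0.0f;

    std::thread producer([&] {
        for (int i = 0; i < 4; ++i) {
            payload[i] = static_cast<float>(i + 1);  // plain stores
        }
        // The stores above may not be reordered after this store-release.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // The loads below may not be reordered before this load-acquire.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        for (int i = 0; i < 4; ++i) {
            sum += payload[i];  // plain loads
        }
    });

    producer.join();
    consumer.join();
    assert(ready.load(std::memory_order_relaxed));  // sanity check
    return sum;
}
```

Calling `handoff_sum()` should return 10 (1 + 2 + 3 + 4), because the consumer only reads `payload` after it has observed `ready == true`.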
But is this also true for x86 SSE instructions (except for non-temporal [NT] ones, where we must always use L/S/MFENCE)?
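The non-temporal exception mentioned here can be sketched as follows. This assumes an x86 target with SSE2 and uses the real intrinsics `_mm_stream_si32` and `_mm_sfence`; the function name `nt_handoff` and the macro `NT_SKETCH_HAVE_SSE2` are hypothetical, and on other targets the sketch falls back to a plain store. It covers only the NT case and does not settle the question about ordinary SSE loads and stores.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

#if defined(__SSE2__) || defined(_M_X64)
#include <xmmintrin.h>  // _mm_sfence
#include <emmintrin.h>  // _mm_stream_si32
#define NT_SKETCH_HAVE_SSE2 1  // hypothetical macro, local to this sketch
#endif

// Hypothetical helper: publish one int written with a non-temporal store.
int nt_handoff() {
    int payload = 0;
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
#ifdef NT_SKETCH_HAVE_SSE2
        _mm_stream_si32(&payload, 42);  // non-temporal (weakly ordered) store
        _mm_sfence();                   // order the NT store before the flag store
#else
        payload = 42;                   // fallback: ordinary store
#endif
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // ordinary load after the acquire
    });

    producer.join();
    consumer.join();
    return seen;
}
```

The explicit `_mm_sfence()` is there because a store-release on x86 normally compiles to an ordinary store, which by itself does not order an earlier non-temporal store.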
Here it is said that "sse instructions ... is no requirement on backwards compatibility and memory order is undefined". Is it true that strict ordering was kept only for compatibility with older x86 processors, where it was needed, while the newer instructions, namely SSE (other than [NT]), do not get release-acquire ordering automatically?