
Let's assume we have a SyncQueue class with the following implementation:

#include <memory>
#include <mutex>
#include <queue>

class SyncQueue {
    std::mutex mtx;
    std::queue<std::shared_ptr<ComplexType>> m_q;
public:
    void push(const std::shared_ptr<ComplexType>& ptr) {
        std::lock_guard<std::mutex> lck(mtx);
        m_q.push(ptr);
    }
    std::shared_ptr<ComplexType> pop() {
        std::lock_guard<std::mutex> lck(mtx);
        // Note: assumes the queue is non-empty; front() on an empty queue is undefined.
        std::shared_ptr<ComplexType> rv(m_q.front());
        m_q.pop();
        return rv;
    }
};

Then we have this code that uses it:

SyncQueue q;

// Thread 1, Producer:
std::shared_ptr<ComplexType> ct(new ComplexType);
ct->foo = 3;
q.push(ct);

// Thread 2, Consumer (assume it runs after the Producer, as clarified in the comments):
std::shared_ptr<ComplexType> ct(q.pop());
std::cout << ct->foo << std::endl;

Am I guaranteed to get 3 when ct->foo is printed? mtx provides happens-before semantics for the pointer itself, but I'm not sure it says anything about the memory of the ComplexType object it points to. If the result is guaranteed, does that mean that every mutex lock (std::lock_guard<std::mutex> lck(mtx);) forces a full cache invalidation of any modified memory locations, up to the point where the memory hierarchies of the independent cores merge?

neverlastn
  • Uh, what if thread 2 acquires the mutex before thread 1? – Brian Bi Apr 13 '16 at 19:53
  • Any data that is written to memory causes that cache-line to be invalidated, but I don't know what would make you say "forces full cache-invalidation" – kmdreko Apr 13 '16 at 20:14
  • @Brian - Yes, correct - assume that the sequence is as shown above. – neverlastn Apr 13 '16 at 20:17
  • This answer to an older question suggests that yes, mutex functions issue memory barrier instructions if required by the hardware: http://stackoverflow.com/a/24143387/1401351 – Peter Apr 13 '16 at 20:25
  • @Peter - that's great. As a clarification, does that mean that any writes, by any thread or process, in that core's L1 and L2 will be marked as invalid across all other cores' caches? – neverlastn Apr 13 '16 at 20:29
  • `ct->foo = 3` happens-before `std::cout << ct->foo`, and therefore the latter is guaranteed to observe the side effect of the former. The assignment is sequenced-before `q.push(ct)`, which happens-before `q.pop()`, which is sequenced-before the read access to `ct->foo`. The "happens-before" relation is transitively closed over "happens-before" and "sequenced-before" edges. Caches and cores are irrelevant implementation details. – Igor Tandetnik Apr 13 '16 at 21:12
  • @IgorTandetnik I certainly agree, and I agree that "caches and cores are irrelevant implementation details" as far as the semantics go, but I relate them to my question because they affect performance on multicore implementations, i.e. on every recent CPU. – neverlastn Apr 13 '16 at 21:17

1 Answer


std::mutex conforms to the Mutex requirements (http://en.cppreference.com/w/cpp/concept/Mutex):

Prior m.unlock() operations on the same mutex synchronize-with this lock operation (equivalent to release-acquire std::memory_order)

Release-acquire is explained here (http://en.cppreference.com/w/cpp/atomic/memory_order):

Release-Acquire ordering

If an atomic store in thread A is tagged memory_order_release and an atomic load in thread B from the same variable is tagged memory_order_acquire, all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B, that is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory.

The synchronization is established only between the threads releasing and acquiring the same atomic variable. Other threads can see different order of memory accesses than either or both of the synchronized threads.

The code example in that section is very similar to yours. So it is guaranteed that all the writes in thread 1 happen-before the mutex unlock in push(), and the matching lock in pop() makes them visible to thread 2.

Of course, this assumes that "ct->foo = 3" has no special tricky meaning under which the actual assignment happens in another thread :)
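
For illustration, here is a minimal sketch of the same guarantee expressed directly with release-acquire atomics instead of a mutex, in the spirit of the cppreference example (the producer/consumer functions and the main driver are mine, added only to make the demo self-contained):

#include <atomic>
#include <cassert>
#include <thread>

struct ComplexType { int foo = 0; };

std::atomic<ComplexType*> ptr{nullptr};

void producer() {
    ComplexType* ct = new ComplexType;
    ct->foo = 3;                               // plain, non-atomic write
    ptr.store(ct, std::memory_order_release);  // release store publishes it
}

void consumer() {
    ComplexType* ct;
    while (!(ct = ptr.load(std::memory_order_acquire)))
        ;                                      // acquire load synchronizes-with the release store
    assert(ct->foo == 3);                      // guaranteed to observe the producer's write
    delete ct;
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}

The mutex version gives you the same edge: the unlock at the end of push() plays the role of the release store, and the lock at the start of pop() plays the role of the acquire load.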

Regarding cache invalidation, from cppreference:

On strongly-ordered systems (x86, SPARC TSO, IBM mainframe), release-acquire ordering is automatic for the majority of operations. No additional CPU instructions are issued for this synchronization mode, only certain compiler optimizations are affected (e.g. the compiler is prohibited from moving non-atomic stores past the atomic store-release or perform non-atomic loads earlier than the atomic load-acquire). On weakly-ordered systems (ARM, Itanium, PowerPC), special CPU load or memory fence instructions have to be used.

So it really depends on the architecture.
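
As a sketch of what those special instructions correspond to at the source level, the same pairing can be written with explicit std::atomic_thread_fence calls; on x86 the fences cost no extra CPU instructions, while on weakly-ordered CPUs the compiler emits barrier instructions (again, the names and the driver here are illustrative, not from the question):

#include <atomic>
#include <cassert>
#include <thread>

int data = 0;
std::atomic<bool> ready{false};

void producer() {
    data = 3;                                            // plain write
    std::atomic_thread_fence(std::memory_order_release); // fence instead of a tagged store
    ready.store(true, std::memory_order_relaxed);        // publish the flag
}

void consumer() {
    while (!ready.load(std::memory_order_relaxed))
        ;                                                // spin until the flag is seen
    std::atomic_thread_fence(std::memory_order_acquire); // pairs with the release fence
    assert(data == 3);                                   // the write to data is now visible
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}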

Vadim Key