Data races with MESI optimization

Question

I don't really understand what exactly is causing the problem in this example:

Here is a snippet from my book:

Based on the discussion of the MESI protocol in the preceding section, it would seem that the problem of data sharing between L1 caches in a multicore machine has been solved in a watertight way. How, then, can the memory ordering bugs we’ve hinted at actually happen? There’s a one-word answer to that question: Optimization. On most hardware, the MESI protocol is highly optimized to minimize latency. This means that some operations aren’t actually performed immediately when messages are received over the ICB. Instead, they are deferred to save time. As with compiler optimizations and CPU out-of-order execution optimizations, MESI optimizations are carefully crafted so as to be undetectable by a single thread. But, as you might expect, concurrent programs once again get the raw end of this deal. For example, our producer (running on Core 1) writes 42 into g_data and then immediately writes 1 into g_ready. Under certain circumstances, optimizations in the MESI protocol can cause the new value of g_ready to become visible to other cores within the cache coherency domain before the updated value of g_data becomes visible. This can happen, for example, if Core 1 already has g_ready’s cache line in its local L1 cache, but does not have g_data’s line yet. This means that the consumer (on Core 2) can potentially see a value of 1 for g_ready before it sees a value of 42 in g_data, resulting in a data race bug.

Here is the code:

int32_t g_data = 0;
int32_t g_ready = 0;
void ProducerThread() // running on Core 1
{
  g_data = 42;
  // assume no instruction reordering across this line
  g_ready = 1;
}

void ConsumerThread() // running on Core 2
{
  while (!g_ready)
    PAUSE();
  // assume no instruction reordering across this line
  ASSERT(g_data == 42);
}

How can g_data be computed but not present in the cache?

This can happen, for example, if Core 1 already has g_ready’s cache line in its local L1 cache, but does not have g_data’s line yet.

If g_data is not in the cache, then why does the previous sentence end with "yet"? Would the CPU load the cache line containing g_data after the value has been computed?
If we read this sentence:

This means that some operations aren’t actually performed immediately when messages are received over the ICB. Instead, they are deferred to save time.

Then what operation is deferred in our example with producer and consumer threads?

So basically, I don't understand how, under the MESI protocol, some operations can become visible to other cores in the wrong order despite being computed in the right order by a specific core.

PS: This example is from the book "Game Engine Architecture, Third Edition" by Jason Gregory; it's on page 309. Here is the book

Are those C operations supposed to represent asm loads/stores? If so, make your variables `volatile` so a C compiler would have to actually do that. (Including no compile-time reordering of volatile accesses.) — Peter Cordes, Jun 05 '22 at 02:06
You didn't provide enough context to be sure what point your book was making in the text those quotes are part of, but probably: (1) [The store buffer](https://stackoverflow.com/questions/64141366/can-a-speculatively-executed-cpu-branch-contain-opcodes-that-access-ram) holds stores before they commit to L1d cache. And before that, variable values get computed in registers. (2) Every address is part of some cache **line**, whether a cache currently has a valid copy of it or not. — Peter Cordes, Jun 05 '22 at 02:11
A load will result in the line getting cached in L1d of this CPU when data arrives. The question is whether the copy of the line you read is from before or after some other store commits to L1d. (3) they might be talking about invalidation queues? — Peter Cordes, Jun 05 '22 at 02:13
MESI itself doesn't reorder anything; it's the local store buffer inside each core, and hit-under-miss load reordering, that causes memory-ordering effects. (AFAIK, those local effects can explain any reordering without needing to think about invalidation queues, but possibly that matters on more exotic machines like [POWER that can store-forward between logical cores of a physical core](https://stackoverflow.com/questions/27807118/will-two-atomic-writes-to-different-locations-in-different-threads-always-be-see/50679223#50679223), so they're not multi-copy atomic.) — Peter Cordes, Jun 05 '22 at 02:16
@PeterCordes, the snippet from the book I copied was incomplete, so I updated the question and added the book title. I understand that my questions are not easy to follow without the complete excerpt. I think it's better if I let you read the edit first and then make some clarifications based on your comments, so I'll write a larger response tomorrow. — a a, Jun 05 '22 at 18:15
Ok, yeah, local StoreStore reordering due to out-of-order commit from the store buffer into L1d cache is the normal cause of that reordering on weakly-ordered ISAs that allow that. MESI itself doesn't introduce reordering between cores, although of course MESI is the reason the cache-miss store can't commit and become globally visible earlier, not until this core gets exclusive ownership. See https://preshing.com/20120710/memory-barriers-are-like-source-control-operations - in that analogy, the coherent state maintained by MESI is like the "server", and store buffer + load reordering are local — Peter Cordes, Jun 05 '22 at 21:07
Anyway, keep in mind it's not about when a core *computes* something, the key time for a store is when it commits to L1d cache, after going through the store buffer. After that point, the store is globally visible to all cores. See the link in my 2nd comment. (Of course, cores can reorder loads wrt. each other as well, if the ISA allows; with program-order loads and stores like x86, a store buffer introduces only StoreLoad reordering. https://preshing.com/20120930/weak-vs-strong-memory-models/ and [Globally Invisible load instructions](https://stackoverflow.com/q/50609934)) — Peter Cordes, Jun 05 '22 at 21:13
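
The mechanism described in the comments (stores wait in a per-core store buffer, and a store only becomes globally visible when it commits to L1d, which requires exclusive ownership of its cache line) can be sketched as a toy model. This is only an illustration under stated assumptions, not how any real CPU is implemented: `ToyCore`, `toy_store`, `toy_drain`, and `toy_consumer_can_see_stale_data` are hypothetical names invented here, and the `visible` array stands in for the coherent state that MESI maintains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only (hypothetical names throughout). "visible" stands in for the
 * globally visible, coherent state; "buf" is Core 1's private store buffer. */
enum { ADDR_DATA = 0, ADDR_READY = 1, NUM_ADDRS = 2, BUF_CAP = 4 };

typedef struct {
    int     addr;
    int32_t value;
    bool    pending;   /* true until the store commits to "L1d" */
} StoreEntry;

typedef struct {
    int32_t    visible[NUM_ADDRS]; /* what other cores can observe */
    bool       owned[NUM_ADDRS];   /* does Core 1 own the line exclusively? */
    StoreEntry buf[BUF_CAP];       /* stores executed but not yet committed */
    int        count;
} ToyCore;

/* Executing a store only places it in the store buffer. */
static void toy_store(ToyCore *c, int addr, int32_t value)
{
    assert(c->count < BUF_CAP);
    c->buf[c->count++] = (StoreEntry){ addr, value, true };
}

/* Commit pending stores whose line is owned. With in_order set, stop at the
 * first store that cannot commit (models in-order commit, as on x86);
 * otherwise skip it and let younger stores commit first (models a
 * weakly-ordered ISA that permits StoreStore reordering). */
static int toy_drain(ToyCore *c, bool in_order)
{
    int committed = 0;
    for (int i = 0; i < c->count; ++i) {
        StoreEntry *e = &c->buf[i];
        if (!e->pending)
            continue;
        if (!c->owned[e->addr]) {
            if (in_order)
                break;
            continue;
        }
        c->visible[e->addr] = e->value;
        e->pending = false;
        ++committed;
    }
    return committed;
}

/* Replays the producer from the question and reports whether another core
 * could observe g_ready == 1 while g_data is still not 42. */
static bool toy_consumer_can_see_stale_data(bool in_order)
{
    ToyCore c = {0};
    c.owned[ADDR_READY] = true;   /* g_ready's line is already in L1d */
    c.owned[ADDR_DATA]  = false;  /* g_data's line: ownership still in flight */

    toy_store(&c, ADDR_DATA, 42); /* g_data = 42;  */
    toy_store(&c, ADDR_READY, 1); /* g_ready = 1;  */

    toy_drain(&c, in_order);
    bool stale = (c.visible[ADDR_READY] == 1 && c.visible[ADDR_DATA] != 42);

    c.owned[ADDR_DATA] = true;    /* ownership of g_data's line arrives */
    toy_drain(&c, in_order);      /* now the remaining stores commit */
    return stale;
}
```

In this model, `toy_consumer_can_see_stale_data(false)` returns `true`: the store to `g_ready` commits while the store to `g_data` is still waiting for its line, which is the window in which the consumer's assertion could fail. With `in_order` set to `true` it returns `false`, because the younger store is held back behind the older one. In both cases the "deferred operation" is the commit of the cache-miss store, not the computation of the value.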

