I'm trying to wrap my head around the issue of memory barriers right now. I've been reading and watching videos about the subject, and I want to make sure I understand it correctly, as well as ask a question or two.
I'll start by making sure I understand the problem accurately. Let's take the following classic example as the basis for the discussion: suppose we have two threads running on two different cores.
This is pseudo-code!
We start with int f = 0; int x = 0; and then run these threads:
# Thread 1
while(f == 0);
print(x)
# Thread 2
x = 42;
f = 1;
Of course, the desired result of this program is that thread 1 will print 42.
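To make the example concrete, here is a rough C++ version of the same program. This is my own sketch, not taken from any of the sources; I'm using std::atomic with memory_order_relaxed so the program is well-defined but no ordering is imposed, which (as far as I understand) models the "anything may be reordered" situation:

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> f{0};
std::atomic<int> x{0};

void thread1() {
    // spin until f becomes 1; relaxed = no ordering guarantee
    while (f.load(std::memory_order_relaxed) == 0) { }
    // may print 0 or 42 - this is exactly the problem under discussion
    std::printf("%d\n", x.load(std::memory_order_relaxed));
}

void thread2() {
    x.store(42, std::memory_order_relaxed);
    f.store(1, std::memory_order_relaxed);
}

int main() {
    std::thread t1(thread1);
    std::thread t2(thread2);
    t1.join();
    t2.join();
    return 0;
}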
NOTE: I'm leaving "compile-time reordering" out of this discussion; I only want to focus on what happens at runtime, so ignore any optimizations the compiler might do.
Ok, so from what I understand, the problem here is what is called "memory reordering": the CPU is free to reorder memory operations as long as the result looks the same from the point of view of the thread executing them. In this case, within thread 2, the store f = 1 may be executed before the store x = 42. If that happens, thread 1 may print 0, which is not what the programmer wants.
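If I understand correctly, this is exactly the situation a store barrier on the writing side is meant to prevent. Here is a minimal sketch on top of the C++ version above (my assumption of where the barrier would go, not something stated in the sources): a release fence between the two stores, or equivalently a release store to f, should forbid the store to x from becoming visible after the store to f:

// Thread 2, with a barrier between the two stores (sketch)
x.store(42, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_release); // keeps the write to x ordered before the write to f (as seen by an acquiring reader)
f.store(1, std::memory_order_relaxed);

// or, equivalently for this purpose:
// f.store(1, std::memory_order_release);

As far as I understand, this only helps if the reading side has a matching acquire, which brings me to thread 1.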
At this point, Wikipedia points out another possible scenario:
Similarly, thread #1's load operations may be executed out-of-order and it is possible for x to be read before f is checked
Since we're talking right now about "out-of-order execution", let's set aside the cores' caches for a moment and analyze what happens here. Start with thread 2 - its compiled instructions will look (in pseudo-assembly) something like:
1 put 42 into register1
2 write register1 to memory location of x
3 put 1 into register2
4 write register2 to memory location of f
Ok, so I understand that 3-4 may be executed before 1-2. But I don't understand the equivalent in thread 1. Let's say the instructions of thread 1 look something like:
1 load f into register1
2 if register1 is 0 - jump to 1
3 load x into register2
4 print register2
What exactly may be out of order here? Can 3 be executed before 1-2?
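If the answer is yes - the load of x may effectively happen before the load of f - then my understanding is that the matching fix on the reading side is an acquire barrier between the two loads. Again a sketch on top of the C++ version above, not something taken from the sources:

// Thread 1, with a barrier between the two loads (sketch)
while (f.load(std::memory_order_relaxed) == 0) { }
std::atomic_thread_fence(std::memory_order_acquire); // keeps the load of x ordered after the load of f
std::printf("%d\n", x.load(std::memory_order_relaxed));

// or, equivalently for this purpose:
// while (f.load(std::memory_order_acquire) == 0) { }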
Let's go on. Up until now we've talked about out-of-order execution, which brings me to my primary confusion:
In this great post, the author describes the problem as follows: each core has its own cache, and the core performs its memory operations against that cache, not against main memory. The movement of data from the core-specific caches to main memory (or a shared cache) happens at unpredictable times and in an unpredictable order. So in our example, even if thread 2 executes its instructions in order, the write of x = 42 will happen before the write of f = 1, but only in core 2's cache. The propagation of these values to shared memory may happen in the opposite order, and hence the problem.
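From what I've read so far, the same release/acquire pairing is supposed to deal with both descriptions at once - the core reordering its own operations and the writes leaving the cache in a different order - which is exactly what confuses me. For reference, this is the version I would write if I wanted the print to be guaranteed to show 42 (again my own sketch, same includes as the first snippet):

std::atomic<int> f{0};
int x = 0; // plain int is fine once f carries the release/acquire ordering

void thread2_fixed() {
    x = 42;
    f.store(1, std::memory_order_release); // "publish" x
}

void thread1_fixed() {
    while (f.load(std::memory_order_acquire) == 0) { }
    std::printf("%d\n", x); // guaranteed to print 42
}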
So here is what I don't understand: when we talk about "memory reordering", are we talking about out-of-order execution, or about the movement of data across caches?