Consider the following program: two threads repeatedly run the same function, which increments a shared counter variable. No lock protects the variable, so this is an attempt at lock-free programming. The threads are also set up to run on different cores/CPUs. The number of iterations is sufficiently large (e.g. N = 100,000).
The operations themselves are listed below as pseudocode. As expected, there will be varying delays between the instructions, depending on what else the CPUs are doing. The interleaving below is just one possible execution.
CPU 0 | CPU 1
------------------------------------------
LOAD count |
INC count | LOAD count
| INC count
| STORE count
STORE count |
Let's not target only the x86 architecture, where the memory model is pretty strong. In fact, let's consider a "memory-ordering-hostile" architecture (as per C.6.1 of McKenney's book).
The main problem with this code is that the end result will almost certainly be wrong. Because of the race condition, it will happen often enough that one CPU computes the new counter value at the same time as the other, both starting from the same count value. Each CPU then writes an incremented value of count back to the corresponding cache line, but it is the same value. This doesn't contradict the MESI cache coherence protocol, as each CPU gets the cache line exclusively and the writes happen in sequence; the only unfortunate thing is that the same counter value is written twice.
What I'm interested in, however, is the impact of placing memory barriers. Setting aside the problem in the preceding paragraph, will the fact that memory barriers are missing (or poorly placed) make its own "negative" contribution to how this program behaves?
Thinking intuitively about store buffers: the values in there probably can't be "skipped" or "lost", so they'll eventually have to be written to the cache line. If so, write barriers won't have an impact. Would the invalidate queues and the read barriers have no impact as well? Is my assumption correct? Am I missing something?