I am currently trying to learn the C++11 threading API, and I am finding that the various resources don't provide an essential piece of information: how the CPU cache is handled. Modern CPUs have a cache for each core (meaning different threads may be working with different caches). This means it is possible for one thread to write a value to memory and for another thread not to see it, even if it does see other changes the first thread made.
Of course, any good threading API provides some way to solve this. In the C++ threading API, however, it is not clear how this works. I know that a std::mutex, for example, protects memory somehow, but it isn't clear exactly what it does: does it flush the entire CPU cache, does it flush just the objects accessed inside the critical section from the current thread's cache, or something else entirely?
Also, apparently, read-only access does not require a mutex. But if thread 1, and only thread 1, is continually writing to memory to modify an object, won't other threads potentially see an outdated version of that object, making some sort of cache flush necessary?
Do the atomic types simply bypass the cache and read the value from main memory with a single CPU instruction? Do they make any guarantees about other locations in memory that are accessed around them?
How does memory access in C++11's threading API work, in the context of CPU caches?
Some questions, such as this one, talk about memory fences and a memory model, but no source I have found explains these in the context of CPU caches, which is what this question asks about.