
Based on my understanding, if I have a global variable and two or more threads accessing it, each running on a different CPU core, then each core will cache its own copy of the global variable, and whenever a thread accesses the variable it is this cached copy that is accessed, not the global variable in memory.

Now say I have two threads created with CreateThread(), each running on a different CPU core, and one thread sets a global variable's value while the other thread reads it.
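For illustration, here is a minimal sketch of the setup I have in mind (the variable and function names are just placeholders, not my actual code):

```c
#include <windows.h>
#include <stdio.h>

volatile LONG g_sharedValue = 0;        /* the global variable in question */

DWORD WINAPI WriterThread(LPVOID param)
{
    (void)param;
    g_sharedValue = 42;                 /* store performed on one core */
    return 0;
}

DWORD WINAPI ReaderThread(LPVOID param)
{
    (void)param;
    LONG v = g_sharedValue;             /* load performed on another core;
                                           may see 0 or 42 depending on timing */
    printf("read %ld\n", v);
    return 0;
}

int main(void)
{
    HANDLE h[2];
    h[0] = CreateThread(NULL, 0, WriterThread, NULL, 0, NULL);
    h[1] = CreateThread(NULL, 0, ReaderThread, NULL, 0, NULL);
    WaitForMultipleObjects(2, h, TRUE, INFINITE);
    CloseHandle(h[0]);
    CloseHandle(h[1]);
    return 0;
}
```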

Is there an Assembly instruction that forces the cached copy of the global variable to be flushed to memory after setting its value, or an Assembly instruction that updates the cached copy of the other CPU core that the other thread is running on?

Steve
  • *then each CPU core will cache a copy of the global variable and whenever a thread tries to access the global variable, it is the cached copy that will be accessed, and not the global variable in memory.* - this is false: all CPUs see the same memory. There are of course synchronization issues (when one core modifies memory, when does another core see the change?), but your initial assumption is simply wrong – RbMm May 11 '18 at 17:30
  • The cache provides a coherent memory view. It behaves as-if there is only a single copy in memory. – LWimsey May 11 '18 at 17:30
  • Read, for example, [Lockless Programming Considerations](https://msdn.microsoft.com/en-us/library/windows/desktop/ee418650(v=vs.85).aspx), [Memory Barriers](http://preshing.com/20120710/memory-barriers-are-like-source-control-operations/) (and the related posts on that blog), and [memory_order](http://en.cppreference.com/w/cpp/atomic/memory_order) – RbMm May 11 '18 at 17:35
  • Look up "memory barrier". – Raymond Chen May 11 '18 at 17:40
  • You may also be looking for [`MemoryBarrier`](https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/content/wdm/nf-wdm-kememorybarrier) – RbMm May 11 '18 at 17:40
  • @LWimsey, If the caches behaved the way you say, then they would serve no purpose. The reality is, the caches only provide a coherent memory view under certain well-defined circumstances involving the _memory barrier_ instructions that others have mentioned (above). Those instructions are expensive, because they limit the effectiveness of the cache. We design our programs so that their threads don't _need_ a coherent view most of the time, and use the memory barrier instructions as seldom as possible. – Solomon Slow May 11 '18 at 18:35
  • @jameslarge: non-coherent caches that require manual flushing can be useful for read-mostly situations, and/or for *many* cores. e.g. in a GPU. See this comment on [What is the point of MESI on Intel 64 and IA-32](https://stackoverflow.com/q/49843709#comment86752890_49848016) – Peter Cordes May 11 '18 at 19:39
  • @Steve: x86 has coherent caches: see [What is the point of MESI on Intel 64 and IA-32](https://stackoverflow.com/q/49843709) and [Does a memory barrier ensure that the cache coherence has been completed?](https://stackoverflow.com/q/42746793) for more details on what you need to know about ordering. (Those aren't really duplicates, but they do answer this question). You can use `mfence` to wait until a store has become globally visible before doing further loads/stores, for example. See http://preshing.com/20120515/memory-reordering-caught-in-the-act/. – Peter Cordes May 11 '18 at 19:42
  • @jameslarge `X86` uses `MESI` (or a close cousin) for cache coherency. This guarantees two things: 1) for a given memory location, only a single core at a time can write (while the other cores have neither write nor read access), though multiple cores can hold it read-only at the same time; 2) any write will be immediately visible to all cores. This does not depend on memory barriers. – LWimsey May 11 '18 at 21:00
  • @Peter Cordes Just to make sure I got this right, if I am on the x86 architecture, and I have a thread that sets a global variable's value, and another thread tries to read this value, it will read the last value that has been set by the first thread, and I don't have to use any memory barriers to make this happen. But the instructions can still be executed out of order, and to prevent that, I have to use a memory barrier. Am I correct? – Steve May 11 '18 at 22:56
  • Yes, but often you don't need any barriers. x86's strong memory model means that plain `mov [shared], eax` is a release-store. You only need `mfence` for sequential consistency with stores + loads. (And BTW, even weakly ordered architectures like ARM have coherent caches, so you don't need barriers to make a store visible promptly.) – Peter Cordes May 11 '18 at 23:15
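Putting the comments together, here is a minimal sketch, in the same Win32/C style as the question and with placeholder names, of the data-then-flag pattern described above. It relies on x86's strong ordering plus the fact that both variables are `volatile` (so the compiler keeps the accesses in program order); portable code would use C11/C++11 atomics with `memory_order_release`/`memory_order_acquire` instead. On x86 the two plain stores already have release ordering, so a reader that observes `g_ready == 1` also observes `g_data == 42`; a full fence such as `MemoryBarrier()` (`mfence` or a locked instruction) is only needed when a store must become globally visible before a *later load* on the same thread.

```c
#include <windows.h>
#include <stdio.h>

volatile LONG g_data  = 0;   /* the payload the writer produces */
volatile LONG g_ready = 0;   /* the flag the reader spins on    */

DWORD WINAPI WriterThread(LPVOID param)
{
    (void)param;
    g_data  = 42;        /* plain mov store */
    g_ready = 1;         /* also a plain mov; x86 does not reorder a store
                            with earlier stores, so this acts as a release */
    /* MemoryBarrier();    full fence; only needed if the writer next
                           performed a load that must not be reordered
                           before these stores */
    return 0;
}

DWORD WINAPI ReaderThread(LPVOID param)
{
    (void)param;
    while (g_ready == 0)     /* spin until the writer's flag is visible */
        YieldProcessor();    /* pause hint for spin-wait loops          */
    /* x86 does not reorder a load with earlier loads, so seeing
       g_ready == 1 guarantees we also see g_data == 42 */
    printf("g_data = %ld\n", g_data);
    return 0;
}

int main(void)
{
    HANDLE h[2];
    h[0] = CreateThread(NULL, 0, ReaderThread, NULL, 0, NULL);
    h[1] = CreateThread(NULL, 0, WriterThread, NULL, 0, NULL);
    WaitForMultipleObjects(2, h, TRUE, INFINITE);
    CloseHandle(h[0]);
    CloseHandle(h[1]);
    return 0;
}
```

On a weakly ordered architecture such as ARM the caches are still coherent, but the same source would need real release/acquire atomics or explicit barriers, because the hardware may reorder the two stores.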

0 Answers