
Let's say we have two processes and a region of shared memory between them, and that both processes hold a pointer to the start of that shared memory. Setup: process 1 writes to some offset from the start of the memory, and process 2 loops continuously, copying the data from that same offset. Also assume there are no locks or atomics to enforce memory synchronization.

Process 1:

char *sh_mem;   /* assume this points to the mapped shared memory */
size_t offset;  /* assume this holds the agreed-upon offset */
char data[4];   /* the payload to publish */
memcpy(sh_mem + offset, data, 4);

Process 2:

char *sh_mem;   /* assume this points to the same mapped shared memory */
size_t offset;  /* assume this holds the same offset */
char read[4];   /* destination buffer for the copy */
while (true)
    memcpy(read, sh_mem + offset, 4);

Assume the same offset is used in both processes. So my doubt is: after what delay can we guarantee, in some sense, that process 2 is reading the value updated by process 1?

Also, is there some sort of mechanism which ensures that process 2 will eventually see that the cache block it is reading is dirty, and fetch the updated block? It makes sense to think that there would be.

Ofek Shilon
W1nTer003
  • this is outside of the memory model that the C++ standard describes. So this depends on the OS and mainly the processor architecture. Do you have a specific combination of OS and architecture in mind? – PeterT Feb 17 '23 at 07:49
  • The change in memory should be considered immediate. But you need some kind of synchronization between the processes, you can't "poll" the memory since access to it is not atomic. – Some programmer dude Feb 17 '23 at 07:49
  • I do not have experience with shared memory, but in similar situations, I would consider a "hand-shake" variable (process 1 writes the data if the variable is "0", then sets the variable to "1"; process 2 reads the data if the variable is "1", then sets the variable to "0"). – nielsen Feb 17 '23 at 08:26
  • I know that we can use some atomic variable to get a guarantee that the data has been set. But I just got curious as to how the underlying machinery works. – W1nTer003 Feb 17 '23 at 09:40
  • @W1nTer003> the TLDR: there is no such guarantee whatsoever. Not only do CPUs have caches, which they will not refresh unless explicitly told to, but the compiler itself is allowed to assume that, in the absence of synchronisation, reading twice from the same place will get the same value, so it is likely to optimize away subsequent reads. – spectras Feb 17 '23 at 11:18

1 Answer


If you are attempting to simply "wait" for a while before reading, don't: you have absolutely no guarantee that both programs will keep running. One program may hang for an hour, while the other keeps running.

But, to properly answer the question: when you write to memory, that write is instantly visible to everything else running on the same CPU thread. Other threads may not see the updated memory yet and may have a cached version on the CPU, but this is only the case if you don't use synchronization. If you do use synchronization, the CPU will make sure other threads see the updated data when they attempt to read that memory (or will update their caches immediately).

I will say, however, that it doesn't really make sense to talk about a time guarantee for memory to "propagate". Memory doesn't really propagate (caching does). It makes more sense to discuss the frequency of, or delay between, queries to the memory.

You also don't know whether those two processes are running on the same CPU thread, in which case cache propagation is irrelevant (CPU threads each have their own cache, as well as a larger shared cache).

If you really want to dig deep into this topic, information is hard to find, but CPU instructions for synchronization DO exist.

ZeroZ30o
  • *Other threads may not see the updated memory yet, and may have a cached version on the CPU,* - No, [all real CPUs use coherent caches](https://stackoverflow.com/questions/4557979/when-to-use-volatile-with-multi-threading/58535118#58535118). A store can't commit to L1d cache in one core until it's received an acknowledgement that other cores have invalidated their copies, so it has Exclusive ownership (MESI). See also https://software.rajivprab.com/2018/04/29/myths-programmers-believe-about-cpu-caches/ and https://preshing.com/20120710/memory-barriers-are-like-source-control-operations/ – Peter Cordes Feb 17 '23 at 21:16
  • If another thread continues to use an old value indefinitely, it's because the compiler kept the value in a register, not because CPU cache is stale. Registers are thread-private. – Peter Cordes Feb 17 '23 at 21:17
  • Anyway, it does make sense to talk about inter-core latency or inter-thread latency, like for two threads ping-ponging a cache line back and forth. e.g. with CAS (cmpxchg) to only write after the other thread has done a write. Or like [Different inter-core latency measured on two identical Skylake Xeon Gold 6154 systems](https://stackoverflow.com/q/57670764) which stores a timestamp from one thread and spins until another thread sees it, then checks `rdtsc` on the reader. There'd be some measurement overhead there. – Peter Cordes Feb 17 '23 at 21:26
  • https://www.anandtech.com/show/15578/cloud-clash-amazon-graviton2-arm-against-intel-and-amd/2 has benchmarks on a Graviton2 showing about 35 to 55 nanosecond inter-core latency when using ARMv8.1 CAS instructions (instead of LDAEX / STLEX retry loops which are much slower with high contention.) Benchmarks like that tell you the *earliest* you might see a result under good conditions; if the OS process scheduler puts one thread to sleep then it might not even run the store for a long time. But if it does run the store, it will be visible soon. – Peter Cordes Feb 17 '23 at 21:27
  • Of course you need an `_Atomic` var to get any guaranteed inter-thread visibility, instead of data-race undefined behaviour like this code using `memcpy`. So yeah, agreed with your overall point that just sleeping or delaying based on time is not a good strategy. – Peter Cordes Feb 17 '23 at 21:30
  • @PeterCordes Oh alright, cheers, guess the info I had was wrong. – ZeroZ30o Feb 18 '23 at 17:22
  • I know I can use atomics or locks to get that guarantee. But I just got curious as to how things happen underneath, and could not find helpful information. The dilemma comes from the fact that we know the changes will eventually (after a very long gap) be reflected, and that is indeed what happens. But is there some kind of memory-flushing clock inside? Otherwise, who makes that happen is my doubt! – W1nTer003 Feb 20 '23 at 10:14