
So I’m emulating a small microprocessor in C that has an internal flash storage represented as an array of chars. The emulation is entirely single-threaded and operates on this flash storage as well as on some register variables.

What I want to do is have a second “read” thread that periodically (~every 10ms, about the monitor refresh rate) polls the data from that array and displays it in some sort of window. (The flash storage is only 32KiB in size, so it can be displayed as a 512x512 black-and-white image.)

The thing is that this should add only minimal performance overhead to the main emulation thread (optimally, the emulator wouldn’t have to care about the second thread at all). A read-write mutex is out of the question since it would tank the performance of my emulator. Best case, as I said, the emulator stays completely oblivious to the existence of the read thread.

I don’t care about memory order either, since it doesn’t matter if some changes become visible to the read thread earlier than others, as long as they all become visible at some point.

Is this possible at all in C/C++, or at least through some sort of memory_barrier() function that I could call in my emulation thread about every 1000th clock cycle to ensure visibility to my read thread?

Is it enough to just declare the flash memory volatile? And would that affect performance in any significant way?

I don’t want to stall the complete emulation thread just to copy over the flash array to some different place.

Pseudo code:

int main(void) {
    char flash[32 * 1024];

    // load the ROM image into the emulated flash
    flash_binary(flash, "program.bin");

    // periodically displays the content of flash
    pthread_t *read_thread = create_read_thread(flash);

    // does something with flash, highly performance critical
    emulate_cpu(flash);

    kill_read_thread(read_thread);
    return 0;
}
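For concreteness, here is a minimal sketch of what the read-thread side (the `create_read_thread` / `kill_read_thread` placeholders above) could look like with POSIX threads. All the other names (`read_ctx_t`, `read_loop`, the commented-out `display_snapshot`) are invented for the sketch. The `memcpy` of the live flash is formally a data race; the point is that a torn snapshot is tolerable here because it is only displayed, never acted upon.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define FLASH_SIZE (32 * 1024)

typedef struct {
    pthread_t tid;
    const char *flash;          /* the emulator's flash, read without locking */
    atomic_bool running;        /* only this shutdown flag is synchronized */
    char snapshot[FLASH_SIZE];  /* private copy handed to the display code */
} read_ctx_t;

static void *read_loop(void *arg)
{
    read_ctx_t *ctx = arg;
    struct timespec period = { 0, 10 * 1000 * 1000 };  /* ~10 ms */

    while (atomic_load_explicit(&ctx->running, memory_order_relaxed)) {
        /* Racy by the letter of the standard, but a torn snapshot is
         * acceptable: it is only displayed, never acted upon. */
        memcpy(ctx->snapshot, ctx->flash, FLASH_SIZE);
        /* display_snapshot(ctx->snapshot);  -- hypothetical 512x512 renderer */
        nanosleep(&period, NULL);
    }
    return NULL;
}

read_ctx_t *create_read_thread(const char *flash)
{
    read_ctx_t *ctx = malloc(sizeof *ctx);
    ctx->flash = flash;
    atomic_init(&ctx->running, true);
    pthread_create(&ctx->tid, NULL, read_loop, ctx);
    return ctx;
}

void kill_read_thread(read_ctx_t *ctx)
{
    atomic_store_explicit(&ctx->running, false, memory_order_relaxed);
    pthread_join(ctx->tid, NULL);   /* reader exits within one period */
    free(ctx);
}
```

The emulation thread never touches any of this; only the shutdown flag is a proper atomic, so the hot loop pays nothing.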
  • *I don’t care about memory order either since it doesn’t matter if some changes are visible earlier than others to the read thread as long as they are visible at some point at all.* Do you really care if you catch an update "in progress"? Apparently not, given what you've posted, so you don't seem to need any locking or synchronization at all since your display is just a snapshot for display purposes. "I want to see it eventually" should be satisfied with just reading what it reads. Try it and post if it works. – Andrew Henle Dec 11 '21 at 15:45
  • Yeah sounds right, would I need volatile to ensure the compiler doesn’t optimize the memory access away? I will try it as soon as I can and post an update.. – Ferdinand S Dec 11 '21 at 15:50
  • *would I need volatile to ensure the compiler doesn’t optimize the memory access away?* Probably. This does seem to fit the definition of a `volatile` access. – Andrew Henle Dec 11 '21 at 15:53
  • @AndrewHenle: Unfortunately, even with `volatile`, it's still a data race and therefore UB by the C and C++ standards. – Nate Eldredge Dec 11 '21 at 17:19
  • 1
    @FerdinandS: You have tagged this both [tag:c] and [tag:c++] but they are different languages with different rules (that sometimes overlap but not always). Can you narrow it down to one of them? – Nate Eldredge Dec 11 '21 at 17:20
  • 1
    AFAIK the only standard-compliant solution is to make it an array of `atomic_char` (or `std::atomic` and do all accesses with atomic loads and stores. If you use `relaxed` memory ordering then many machines can use ordinary load and store instructions with no extra overhead. It will still inhibit many possible optimizations, so there can be a performance impact, but probably not as much as `volatile` in that regard. – Nate Eldredge Dec 11 '21 at 17:27
  • 1
    That said, on most real-life machines it will generally be fine without atomic or `volatile` in the emulator thread at all. The main issue is that the compiler might optimize out writes to memory and keep values in registers instead. Something like an `atomic_signal_fence` should force those to be flushed out, or `asm volatile ("" : : : "memory");` in gcc/clang. How portable do you need to be? – Nate Eldredge Dec 11 '21 at 17:34
  • 1
    10ms is _way_ fast. Quite likely it's even faster than the refresh rate of the monitor on which the data are displayed. Why not 100ms? or 200ms? or even 1000ms? – Solomon Slow Dec 11 '21 at 18:08
  • @NateEldredge the project is pure c at the moment, but I tagged cpp to reach more people and because it should be the same. Concerning UB - in this case I am well aware and don’t (really) care that there is a data race which by definition is of course UB. The real question is if it will work regardless. I read up a bit more on atomic variables and it seems as though an access to some atomic variable will ensure visibility of any data that was previously written. I guess I’ll try both solutions (with no sync at all and with an atomic memory sync variable) and benchmark them. – Ferdinand S Dec 12 '21 at 11:40
  • @SolomonSlow 10ms is not really a super crazy fast value. If you do the math it would take 3.2MiB/s to copy my 32KiB array over to some different place 100 times per second. A modern cpu will laugh about that and since I’m planning on doing it entirely in a different thread I don’t care too much about the performance. – Ferdinand S Dec 12 '21 at 11:42
  • I don't mean fast relative to how much data the CPU can move. I mean fast relative to the ability of somebody watching the screen to perceive the changing data. – Solomon Slow Dec 12 '21 at 15:19
  • 2
    "the project is pure c at the moment, but I tagged cpp to reach more people and because it should be the same." That tends to annoy people on this site. Please see https://stackoverflow.com/tags/c/info and the section about "Using c and c++ together". There are often more differences than you think. And even on a simple level, it is annoying when writing an answer to keep wondering whether to talk about `atomic_char` or `std::atomic`. – Nate Eldredge Dec 12 '21 at 16:27
  • @NateEldredge oh right sorry about that. I haven’t asked many questions here before. I’ll keep that in mind for the future. – Ferdinand S Dec 12 '21 at 19:11
  • @NateEldredge: In current compilers, `atomic_load_explicit(&c, memory_order_relaxed)` is very similar to a `volatile` access; compilers don't optimize atomics. [Why don't compilers merge redundant std::atomic writes?](https://stackoverflow.com/q/45960387). I'd expect an `atomic_char arr[]` array carefully using mo_relaxed for all references to compile to pretty much the same asm as `volatile char arr[]`. But yes, it would gimp all accesses to it, preventing compile-time coalescing of accesses to multiple adjacent bytes, and stuff like that. – Peter Cordes Dec 13 '21 at 19:40
  • @NateEldredge: Yes, if you care about strict ISO C, the only option is `atomic_char` or locking. But if not, mainstream C implementations effectively define the behaviour of `volatile` to some extent (especially ones that still care about compiling code using hand-rolled atomic from before C11, like the Linux kernel); merely running on cache-coherent hardware makes `volatile` accesses visible between threads (because compilers can't optimize them away either). [When to use volatile with multi threading?](https://stackoverflow.com/a/58535118) – Peter Cordes Dec 13 '21 at 19:52
  • @NateEldredge: So anyway, I'd agree with your suggestion to just use plain non-volatile non-atomic, and some kind of barrier like `atomic_signal_fence()` occasionally in the main thread. [Who's afraid of a big bad optimizing compiler?](https://lwn.net/Articles/793253/) describes some of the dangers of not using `volatile` for access to maybe-changing data (e.g. in the Linux kernel where you can't use C11 stdatomic like a normal person), but in this case it's very likely fine. – Peter Cordes Dec 13 '21 at 19:54
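The no-sync approach the comments converge on (a plain, non-volatile array plus an occasional compiler-only barrier in the emulation loop) could be sketched like this. `emulate_one_cycle` is a toy stand-in for the real emulator step, invented for the sketch; the scheme relies on cache-coherent hardware rather than on the C standard and is formally still a data race:

```c
#include <stdatomic.h>

#define FLASH_SIZE (32 * 1024)

/* Toy stand-in for one emulated instruction: the real emulator body goes
 * here. This one just scribbles a counter into flash so the effect is
 * observable. */
static void emulate_one_cycle(char *flash, unsigned long cycle)
{
    flash[cycle % FLASH_SIZE] = (char)cycle;
}

/* Plain array, no volatile, no locks. Every 1000 cycles a compiler-only
 * barrier forces any flash bytes the compiler is caching in registers
 * back out to memory; on cache-coherent hardware the read thread then
 * observes them eventually. atomic_signal_fence emits no CPU fence
 * instruction, so the hot loop pays essentially nothing. */
void emulate_cpu(char *flash, unsigned long max_cycles)
{
    for (unsigned long c = 0; c < max_cycles; c++) {
        emulate_one_cycle(flash, c);
        if (c % 1000 == 999) {
            atomic_signal_fence(memory_order_seq_cst);
            /* GCC/Clang alternative: asm volatile("" ::: "memory"); */
        }
    }
}
```

The strictly conforming alternative from the comments, an array of `atomic_char` accessed with `memory_order_relaxed`, compiles to ordinary loads and stores on common hardware but still inhibits coalescing of adjacent byte accesses, which is why the fence variant is attractive for the hot loop.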

0 Answers