0

Let's say thread 1 (running on Core 0) updates a global variable, and the updated value is cached in Core 0's L1 cache (not yet flushed to main memory). Then thread 2 starts running on Core 3 and tries to read the global variable. Since Core 3 doesn't have the cached value, it reads from main memory, so thread 2 reads an outdated value.

I know that in C you can use volatile to force the compiler not to read the value from CPU registers, which means a volatile variable gets its value from cache or main memory. In my scenario above, even if I declare the global variable volatile, the latest value will still be cached in L1 and main memory will still hold the old value, which thread 2 will read. So how can we fix this issue? Or maybe my understanding is wrong, and volatile makes the variable get updated in main memory directly, so that every read or write of a volatile variable goes straight to main memory?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • I'm no expert, but I think the premise of your question is wrong. There is some real hardware magic that happens at low levels behind the scenes to ensure your scenario doesn't happen. If Core3 needs a value from main memory that has an updated value in Core0 cache, the CPU knows this and the value will be fetched from Core0 cache. This may involve (slow) flushing cache lines to main memory, maybe some kind of direct cache to cache transfer? Ideally I'd say you don't want different cores operating on the same data, and hopefully your scheduler knows that. – yano Feb 10 '21 at 06:38
  • 1
    I also think your understanding of `volatile` is flawed. All `volatile` does is tell the compiler not to make any optimizations regarding how the value might change. For example, if you're polling a value in a register in a tight loop, and that value could change at any time unbeknownst to the optimizer, you should make it `volatile`, forcing a read every time thru the loop. Otherwise, the optimizer will say "I'm reading this value, and it never changes in nearby code, I'm only going to read it once". The value is read once, your loop gets optimized out, and change in the register is missed. – yano Feb 10 '21 at 06:48
  • https://www.geeksforgeeks.org/understanding-volatile-qualifier-in-c/ – yano Feb 10 '21 at 06:48
  • And why do you need this? The memory controller decides when to get the value from cache or from RAM. But why do you think that the second thread will not use the same cache when accessing the same memory address? – i486 Feb 10 '21 at 07:03
  • What kind of C programs are you coding (application code, or embedded software)? On which processor? With which C compiler? Operating system kernels? – Basile Starynkevitch Feb 10 '21 at 07:20
  • Use atomic variables if you don't want to use mutexes to control access to the variable. https://en.cppreference.com/w/c/atomic – Shawn Feb 10 '21 at 08:41
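
The atomic-variable suggestion in the last comment can be sketched roughly as follows. This is only a minimal illustration, assuming a C11 compiler with `<stdatomic.h>` and POSIX threads; the names `shared_value`, `writer` and `read_latest_value` are made up for the example and do not come from the question.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared variable, standing in for the "global variable". */
static atomic_int shared_value;

/* Writer thread: publishes a new value with an atomic store. */
static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&shared_value, 42, memory_order_release);
    return NULL;
}

/*
 * Starts the writer, then polls until the new value becomes visible.
 * Returns the value observed, or -1 if the thread could not be created.
 */
int read_latest_value(void)
{
    pthread_t tid;
    int seen;

    atomic_store(&shared_value, 0);
    if (pthread_create(&tid, NULL, writer, NULL) != 0)
        return -1;

    /* An atomic load must be performed on every iteration; the compiler
     * may not hoist it out of the loop the way it could for a plain int. */
    while ((seen = atomic_load_explicit(&shared_value,
                                        memory_order_acquire)) == 0)
        ; /* busy-wait, acceptable only for a demo */

    pthread_join(tid, NULL);
    return seen;
}
```

Calling `read_latest_value()` from `main` and building with something like `gcc -std=c11 -pthread` should return the value written by the other thread, with no explicit cache flush anywhere in the code.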

2 Answers

1

To some extent, the comments noting that the premise of your question is flawed are a reasonable answer. In general, this happens rarely if at all, and is usually indistinguishable from a race condition.

However, yes, it can happen. See, for example, memory barriers, which are a great illustration of how such a condition (albeit due to out-of-order execution, etc.) can occur.
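
As a rough illustration of what a memory barrier looks like in portable C, here is a minimal sketch using C11 `atomic_thread_fence`. It assumes a C11 compiler and POSIX threads; `payload`, `ready`, `producer` and `consume_payload` are hypothetical names chosen for the example.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain (non-atomic) data being handed over */
static atomic_bool ready;  /* flag that announces the data is written */

static void *producer(void *arg)
{
    (void)arg;
    payload = 123;
    /* Release fence: the write to payload may not be reordered after
     * the flag store that follows. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
    return NULL;
}

/* Returns the payload seen by the consumer, or -1 on thread-creation failure. */
int consume_payload(void)
{
    pthread_t tid;
    int seen;

    payload = 0;
    atomic_store(&ready, false);
    if (pthread_create(&tid, NULL, producer, NULL) != 0)
        return -1;

    while (!atomic_load_explicit(&ready, memory_order_relaxed))
        ; /* busy-wait, acceptable only for a demo */

    /* Acquire fence: the read of payload may not be reordered before
     * the flag load above. */
    atomic_thread_fence(memory_order_acquire);
    seen = payload;

    pthread_join(tid, NULL);
    return seen;
}
```

Without the two fences (or equivalent release/acquire operations on `ready`), the compiler or CPU would be allowed to reorder the accesses, and the consumer could see the flag set while still reading the old `payload`.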

That being said, what you're looking for to make sure the specific occurrence you've noted cannot happen is called a "cache flush". This can be genuinely important on ARM/ARM64 processors where separate data and instruction caches exist, but it's also a good habit to get into for data that is passed between threads this way. You can also check out the `__builtin___clear_cache` C compiler builtin, which performs a similar task. Hopefully one of these will help you get to the bottom of your problem.

However, most likely you're not running into a caching issue, and a race condition is far more likely to be arising. If memory barriers/cache flushes don't fix your issue, audit your code very carefully for raciness.

Roguebantha
  • 814
  • 4
  • 19
  • Thanks for your answer. Just to confirm: the problem I described in my post can happen, although it is very unlikely. Is that correct? But someone in this post said the hardware will prevent this issue; is that correct? And if I run the program thousands of times, surely it will cause a problem at least once, so is the only way to stop it to call `cacheflush`? –  Feb 10 '21 at 07:42
  • 1
    Yes, it can happen, although it's very unlikely, and you may have a difficult time demonstrating that cache desyncs are occurring. Hardware does try to prevent or mitigate this issue, but that depends largely on the ISA or architecture, and it should not necessarily be taken as a guarantee. cacheflush, the gcc builtin, and memory barriers are probably the safest ways to prevent cache desync from being the issue, yes, although it's not guaranteed to be the issue you're experiencing. – Roguebantha Feb 10 '21 at 20:30
  • @amjad: You don't need to manually flush cache lines to make stores visible to loads in other cores. Cache is coherent. (Except instruction caches in some non-x86 ISAs, but that's a major distraction from data sharing). Related: https://software.rajivprab.com/2018/04/29/myths-programmers-believe-about-cpu-caches/, and [When to use volatile with multi threading?](https://stackoverflow.com/a/58535118) explains that coherence is why `volatile` does work as a hacky `memory_order_relaxed` on normal systems. – Peter Cordes May 12 '21 at 12:57
0

How to make sure C multithreading program read the latest value from main memory?

You probably want to use some thread library like POSIX threads. Read a Pthreads tutorial, see pthreads(7), and use pthread_create(3), pthread_mutex_init, pthread_mutex_lock, pthread condition variables, etc.
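
A minimal sketch of the mutex approach, assuming POSIX threads are available; the names `counter`, `worker` and `run_two_workers` are invented for the illustration. Locking and unlocking a mutex also orders memory accesses, so a thread that takes the lock sees the writes made by the previous holder.

```c
#include <pthread.h>

#define INCREMENTS 100000

/* Shared state protected by the mutex (hypothetical example data). */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;                 /* read-modify-write under the lock */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two workers and returns the final count, or -1 on failure. */
long run_two_workers(void)
{
    pthread_t a, b;

    counter = 0;
    if (pthread_create(&a, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;  /* safe to read: both workers have been joined */
}
```

Built with `gcc -Wall -Wextra -g -pthread`, `run_two_workers()` should return 200000 on every run; removing the lock calls typically makes the result vary, which is the kind of race condition worth ruling out first.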

Read also the documentation of GNU libc and of your C compiler (e.g. GCC, to be used as gcc -Wall -Wextra -g) and of your debugger (e.g. GDB).

Be prepared to fight against Heisenbugs.

In general, you cannot prove statically that your C program doesn't have race conditions. See Rice's theorem. You could use tools like Frama-C or the Clang static analyzer, or write your own GCC plugin, or improve or extend Bismon described in this draft report.

You could be interested in CompCert.

You cannot be sure that your program reads the "latest" value from memory.

(unless you add some assembly code)

Read about cache coherence protocols.

Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547