
The cache is controlled by the cache hardware, transparently to the processor. So if we use volatile variables in a C program, how is it guaranteed that my program reads data each time from the actual memory address specified, and not from the cache?

My understanding is that:

  1. The volatile keyword tells the compiler that accesses to the variable shouldn't be optimized away, and should be performed exactly as written in the code.

  2. The cache is controlled by the cache hardware transparently; hence when the processor issues an address, it doesn't know whether the data comes from the cache or from memory.

So, if I have a requirement to read a memory address every time it is needed, how can I make sure that the value comes from the required address and not from the cache?

Somehow, these two concepts don't fit together well. Please clarify how this is done.

(Assume we have a write-back policy in the cache, if that's needed for analyzing the problem.)

Thank you, Microkernel :)


7 Answers


Firmware developer here. This is a standard problem in embedded programming, and one that trips up many (even very experienced) developers.

My assumption is that you are attempting to access a hardware register, and that register value can change over time (be it interrupt status, timer, GPIO indications, etc.).

The volatile keyword is only part of the solution, and in many cases may not be necessary. This causes the variable to be re-read from memory each time it is used (as opposed to being optimized out by the compiler or stored in a processor register across multiple uses), but whether the "memory" being read is an actual hardware register versus a cached location is unknown to your code and unaffected by the volatile keyword. If your function only reads the register once then you can probably leave off volatile, but as a general rule I will suggest that most hardware registers should be defined as volatile.

The bigger issue is caching and cache coherency. The easiest approach here is to make sure your register is in uncached address space. That means every time you access the register you are guaranteed to read/write the actual hardware register and not cache memory. A more complex but potentially better performing approach is to use cached address space and have your code manually force cache updates for specific situations like this. For both approaches, how this is accomplished is architecture-dependent and beyond the scope of the question. It could involve MTRRs (for x86), MMU, page table modifications, etc.

Hope that helps. If I've missed something, let me know and I'll expand my answer.

Andrew Cottrell
  • The purpose of `volatile`, when using a good compiler, should be to ensure that the generated code lets the processor know about everything that needs to be written before a certain point, and doesn't ask the processor to read information until afterward. A programmer may also need to use intrinsics or other means to force hardware cache flushes, but forcing a hardware cache flush would be useless if a compiler was register-caching things in ways the hardware knew nothing about. – supercat Feb 18 '17 at 23:33

From your question, it seems there is a misconception on your part: the volatile keyword is not related to the cache in the way you describe.

When the keyword volatile is specified for a variable, it tells the compiler not to do certain optimizations, since this variable can be changed from other parts of the program unexpectedly.

What this means is that the compiler should not reuse a value already loaded into a register, but should access memory again, as the value in the register is not guaranteed to be the same as the value stored in memory.

The rest, concerning the cache memory, is not directly the programmer's concern.

I mean that keeping any CPU cache synchronized with the RAM is an entirely different subject.

Cratylus
  • So, if I take a case where a variable is updated by some other thread, or by a driver reading from an input device, what is the guarantee that I am reading the correct value and not something cached? How do you avoid such a scenario in code? – Microkernel Oct 24 '11 at 08:14
  • If you use `volatile` it is guaranteed that you will always read the latest update that was done in memory from another thread. But I get the feeling your concern is more at the OS level, i.e. cache vs. memory synchronization – Cratylus Oct 24 '11 at 08:28
  • @Cratylus If you use threads, "latest", "past"... aren't clearly defined between threads running on diff cores. – curiousguy Nov 13 '19 at 03:24

My suggestion is to mark the page as non-cached by the virtual memory manager.
In Windows, this is done through setting PAGE_NOCACHE when calling VirtualProtect.

For a somewhat different purpose, the SSE 2 instructions have the _mm_stream_xyz instructions to prevent cache pollution, although I don't think they apply to your case here.

In either case, there is no portable way of doing what you want in C; you have to use OS functionality.

user541686
  • So, it depends on the platform? Hence, the cache is not controlled by the cache hardware? (If the hardware managed the cache completely, it wouldn't check for the PAGE_NOCACHE flag, right?) – Microkernel Oct 24 '11 at 07:02
  • @Microkernel: It **is** managed by the hardware. But the operating system tells the hardware what to do (after all, the hardware has no idea how the OS wants to manage memory), and you're requesting the OS to do what you want. *And all of this information is stored in -- guess where? -- memory itself.* It's a passive process, though -- the OS only intervenes if something goes haywire (e.g. page fault). Other than that, the hardware simply continues doing what the OS asked it to do, without OS intervention. – user541686 Oct 24 '11 at 07:03
  • Hmm, OK... It seems my understanding is wrong somewhere; I always believed that the CPU cache is transparent to everyone other than the cache hardware! Any references I should read to get my concepts right? Thanks a lot for the clarification :) – Microkernel Oct 24 '11 at 07:11
  • @Microkernel: Sure! :) Basically, the operating system stores all of its memory management information inside "page tables" in memory, and tells the CPU where to look for the information. The CPU then manages everything, and asks the operating system for "help" whenever it can't decide what to do. You can read about paging [here](http://wiki.osdev.org/Paging) and about caching [here](http://wiki.osdev.org/CPU_Caches); let me know if you still have any questions. (This is why they say the operating system sits between the hardware and software -- it really does!) – user541686 Oct 24 '11 at 07:14

Wikipedia has a pretty good article about MTRR (Memory Type Range Registers) which apply to the x86 family of CPUs.

To summarize it: starting with the Pentium Pro, Intel (and AMD, which copied the feature) had these MTRRs, which could set uncached, write-through, write-combining, write-protect or write-back attributes on ranges of memory.

Starting with the Pentium III (though, as far as I know, only really useful with the 64-bit processors), the CPUs honor the MTRRs, but these can be overridden by the Page Attribute Tables, which let the CPU set a memory type for each page of memory.

A major use of the MTRRs that I know of is graphics RAM, which is much more efficient to mark as write-combining. This lets the CPU store up the writes, and it relaxes all of the memory write ordering rules to allow very high-speed burst writes to a graphics card.

But for your purposes, you would want either an MTRR or a PAT setting of uncached or write-through.

Zan Lynx

Using the `_Uncached` keyword may help in an embedded OS, like MQX:

#define MEM_READ(addr)       (*((volatile _Uncached unsigned int *)(addr)))
#define MEM_WRITE(addr,data) (*((volatile _Uncached unsigned int *)(addr)) = data)
James Zhu
  • The code button is there for a reason. Please don't abuse formatting. – A--C Dec 17 '12 at 22:19
  • Which compiler supports the `_Uncached` keyword? Googling for "_Uncached" gives your answer as the first result. – Manuel Jacob May 24 '16 at 20:16
  • @ManuelJacob GCC now has `__attribute__((uncached))` for ARC architecture. See https://gcc.gnu.org/onlinedocs/gcc-13.2.0/gcc/ARC-Type-Attributes.html (I know I'm necroposting). – Anonymous Guy Aug 24 '23 at 10:02

As you say, the cache is transparent to the programmer. The system guarantees that you always see the value that was last written if you access an object through its address. The "only" thing you may incur if an obsolete value is in your cache is a runtime penalty.

Jens Gustedt
  • Only if the machine only has one CPU. – JeremyP Oct 24 '11 at 07:53
  • @JeremyP, I think the question here was asked beyond the scope of concurrent access to shared memory. If you have that in addition, yes, everything gets much more complicated. You'd then have to apply the appropriate tools to ensure data consistency. But then, this is a more general problem, viewing it through the angle of caches is probably not the right view either. – Jens Gustedt Oct 24 '11 at 07:58
  • I do not think it was beyond the scope of concurrent access to the memory. The premise of the question is that there *is* concurrent access to memory, otherwise, as you point out, the cache is transparent. – JeremyP Oct 24 '11 at 08:05
  • The machine need not have more than one CPU. Memory-mapped device control registers can have the same effect (for hard MCUs, the designer may take care to not cache that address space, for softcores on FPGAs/PLDs, not necessarily ). See page 4 of https://www.altera.com/ja_JP/pdfs/literature/hb/nios2/n2sw_nii52007.pdf – Dmitri Oct 08 '15 at 15:37
  • @JeremyP "_Only if the machine only has one CPU_" That isn't always false but is extremely misleading. It should read: only if the machine doesn't have multiple processing units that are not intended for thread supports. **If the CPU are designed to support threads, then it's guaranteed.** – curiousguy Nov 13 '19 at 03:10

volatile makes sure that data is read every time it is needed, without bothering with any cache between the CPU and memory. But if you need to read actual data from memory and not cached data, you have two options:

  • Make a board where said data is not cached. This may already be the case if you address some I/O device,
  • Use specific CPU instructions that bypass the cache. This is used when you need to scrub memory to trigger possible SEU (single-event upset) errors.

The details of the second option depend on the OS and/or the CPU.

mouviciel
  • I have to disagree with this post. The `volatile` keyword just prevents the C compiler from doing certain optimizations on variables. It does ***not*** do anything with the cache. Some compilers might give you the ability to bastardise the meaning of this keyword (the ARC compiler is one), but for most compilers this is not the case. – Jimbo Aug 02 '13 at 21:34