44

I know that writing to a volatile variable invalidates its cached copies on all the CPUs, but I want to know whether reads of a volatile variable are as fast as normal reads.

Can volatile variables ever be placed in the CPU cache, or are they always fetched from main memory?

oxbow_lakes
pdeva

5 Answers

21

You should really check out this article: http://brooker.co.za/blog/2012/09/10/volatile.html. It argues that volatile reads can be a lot slower than non-volatile reads, even on x86.

  • Test 1 is a parallel read and write to a non-volatile variable. There is no visibility mechanism, so the results of the reads are potentially stale.
  • Test 2 is a parallel read and write to a volatile variable. This does not address the OP's question specifically, but it is worth noting that a contended volatile can be very slow.
  • Test 3 is a read of a volatile in a tight loop. Volatile semantics mean the value can change with each loop iteration, so the JVM cannot optimize the read by hoisting it out of the loop. In Test 1, the value was likely read and stored once, so no actual "read" occurs on each iteration.
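The hoisting effect described in Test 3 can be sketched with a stop-flag class (a hypothetical `Spinner`, not one of the original benchmarks): if `running` were a plain field, the JIT would be free to hoist the read out of the loop and spin forever; declaring it volatile forces a fresh read on every iteration.

```java
public class Spinner {
    // volatile forces every iteration of the loop below to re-read
    // the field; a plain boolean read could legally be hoisted out
    // of the loop and cached in a register, spinning forever.
    private volatile boolean running = true;

    public void stop() {
        running = false; // volatile write: becomes visible to the reader
    }

    public void awaitStop() {
        while (running) {
            // busy-wait: each pass performs a real volatile read
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Spinner s = new Spinner();
        Thread t = new Thread(s::awaitStop);
        t.start();
        Thread.sleep(100);
        s.stop();
        t.join(1000); // returns promptly because the read is volatile
        System.out.println("worker stopped: " + !t.isAlive());
    }
}
```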

Marc Brooker's tests

Credit to Marc Brooker for running these tests.

Tim Bender
Aleksander Blomskøld
13

The answer is somewhat architecture dependent. On x86, there is no additional overhead associated with the volatile read itself, though there are implications for other optimizations.

See the JMM cookbook from Doug Lea (http://gee.cs.oswego.edu/dl/jmm/cookbook.html), in particular the architecture table near the bottom.

To clarify: there is no additional overhead associated with the read itself. Memory barriers are used to ensure proper ordering. JSR-133 classifies four barriers: LoadLoad, LoadStore, StoreLoad, and StoreStore. Depending on the architecture, some of these barriers are a no-op, meaning no action is taken; others require a fence. There is no implicit cost associated with the load itself, though one may be incurred if a fence is in place. On x86, only the StoreLoad barrier results in a fence.
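The barrier pairing can be made concrete with the classic safe-publication idiom (class and field names here are illustrative): the volatile write to `ready` keeps the preceding plain store ordered before it, and a reader that observes `ready == true` is guaranteed by happens-before to see `data == 42`.

```java
public class Publication {
    int data;                  // plain field
    volatile boolean ready;    // volatile guard

    void writer() {
        data = 42;    // plain store
        ready = true; // volatile store: a StoreStore barrier keeps
                      // the data write ordered before it (JSR-133)
    }

    int reader() {
        while (!ready) { } // volatile load: acquire side of the pairing
        return data;       // guaranteed to observe 42 (happens-before)
    }

    public static void main(String[] args) throws Exception {
        Publication p = new Publication();
        Thread w = new Thread(p::writer);
        w.start();
        System.out.println(p.reader()); // prints 42
        w.join();
    }
}
```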

As pointed out in the blog post, marking the variable volatile means certain assumptions about it can no longer be made, so some compiler optimizations are not applied to it.

Volatile is not something that should be used glibly, but neither should it be feared. There are plenty of cases where a volatile will suffice in place of more heavy-handed locking.

Tim Bender
  • 2
    How can this be? What about multi-core processors? (Without reading the entire link you posted) – ripper234 Jul 07 '09 at 04:49
  • 7
Nothing is ever a "no-op" unless it can be eliminated completely from your program. Volatile reads are cheap, but not free, and they inhibit optimization in a way that plain reads do not (e.g. a volatile read cannot be hoisted out of a loop or served from a register). The "no-op" here refers to the lack of a corresponding fence instruction, but the read itself has semantics, and those have a cost. – Nitsan Wakart Jul 08 '13 at 20:55
  • 3
Correct. However, those are optimization semantics that lie outside of those imposed by the volatile explicitly. The main answer to the question is "It is architecture dependent". For now the commentary is true. A lot of people fear volatile, and that is somewhat silly, though I also wouldn't recommend glibly using volatile on everything. – Tim Bender Jul 08 '13 at 21:41
2

It is architecture dependent. What volatile does is tell the compiler not to optimise the variable away: it forces most operations to treat the variable's state as unknown. Because it is volatile, it could be changed by another thread or some other hardware operation, so reads must re-read the variable, and compound operations become read-modify-write sequences.
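One caveat worth adding to the read-modify-write point: in Java, volatile guarantees visibility but not atomicity, so `counter++` on a volatile field is still a racy read-modify-write. A minimal sketch (class and field names are illustrative) contrasting it with `AtomicInteger` from `java.util.concurrent.atomic`:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class Counters {
    static volatile int racy = 0;  // visible across threads, but ++ is not atomic
    static final AtomicInteger safe = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                racy++;                 // read-modify-write: updates can be lost
                safe.incrementAndGet(); // atomic hardware operation
            }
        };
        Thread a = new Thread(task), b = new Thread(task);
        a.start(); b.start();
        a.join(); b.join();
        // safe is always 200000; racy is often less under contention
        System.out.println("racy=" + racy + " safe=" + safe.get());
    }
}
```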

This kind of variable is used for device drivers and also for synchronisation with in-memory mutexes/semaphores.

sybreon
-1

Volatile reads cannot be as quick, especially on multi-core CPUs (but even on single-core ones). The executing core has to fetch from the actual memory address to make sure it gets the current value; the variable indeed cannot be cached.

As opposed to one other answer here, volatile variables are not used just for device drivers! They are sometimes essential for writing high performance multi-threaded code!

ripper234
  • 6
*Volatile reads cannot be as quick, especially on multi-core CPUs*: this is totally not true; on most hardware volatile reads are just normal loads. Loads are expensive when they miss the cache, but volatile reads do not go to main memory if there is a value in the cache, and even volatile writes (which are expensive) may update the local CPU cache lines. Multi-CPU/socket systems depend on the cache coherence protocol to ensure valid values, but that doesn't make volatile reads anywhere more expensive. – bestsss Nov 21 '11 at 14:16
  • @bestsss - well, I admit ignorance to details, but I was always taught that normal reads "always" outperform volatile reads. I won't bet 1 gazillion dollars on the answer. – ripper234 Nov 21 '11 at 14:24
Nope, some CPUs may need a load-load barrier, yet it's still cheap. As a rule of thumb, consider volatile reads just normal loads. If any CPU needs to go to main memory (aka a cache miss), it's a horrifying design flaw. – bestsss Nov 21 '11 at 15:14
  • 1
    This answer is entirely incorrect and assumes that actual CPUs behave the way theoretical ones are imagined to. Yes, we imagine that the actual core has to fetch from the actual memory address to make sure it gets the current value, but real world CPUs don't actually do that. They have *much* better ways to achieve the same result, for example forcing writes from other CPUs to invalidate their cache lines rather than not caching. – David Schwartz Mar 14 '17 at 20:06
-2

volatile implies that the compiler cannot optimize the variable by keeping its value in a CPU register; it must be accessed from memory. It may, however, be placed in a CPU cache; the cache will guarantee consistency with any other CPUs/cores in the system. If the memory is mapped to IO, then things are a little more complicated: if designed as such, the hardware will prevent that address space from being cached, and all accesses to that memory will go to the hardware. If there isn't such a design, the hardware designers may require extra CPU instructions to ensure that the read/write goes through the caches, etc.

Typically, the 'volatile' keyword is only used for device drivers in operating systems.

drudru
  • 6
    That may be what volatile means in C, but it isn't what it means in Java. In Java, volatility is about whether one thread performing a read will "see" changes made by another thread. It's more than simply whether the value can be in a CPU register. The volatile keyword also prevents what kinds of reordering the JVM can do on code that uses the variable. – NamshubWriter Jul 07 '09 at 04:39
  • Here's a Dr Dobb's article that goes into a little more detail between the differences: http://www.ddj.com/hpc-high-performance-computing/212701484 – PH. Jul 07 '09 at 04:55
The reader didn't say Java and specifically mentioned the 2nd-level cache, so I assumed the most common scenario... C. – drudru Jul 07 '09 at 20:47
I don't know why this answer is downvoted, but the volatile keyword in Java provides the same functionality. Basically the variable is not cached in registers; it is written to and read from memory, so the changes are visible to every thread. – Çelebi Murat Nov 16 '17 at 09:56