15

I'm trying to understand Java's volatile keyword with respect to writing to a volatile atomic variable in a multithreaded program with CPU caches.

I've read several tutorials and the Java Language Specification, particularly section 17.4.5 on "happens-before ordering". My understanding is that when a thread writes a new value to a volatile variable, the updated value must be visible to other threads reading that variable. To me, these semantics can be implemented in one of two ways:

  1. Threads can cache a volatile variable in the CPU cache, but a write to the variable in the cache must be flushed immediately to main memory. In other words, the cache is write-through.

  2. Threads can never cache a volatile variable and must read and write such variables in main memory.

Approach 1 is mentioned in this tutorial (http://tutorials.jenkov.com), which says:

By declaring the counter variable volatile all writes to the counter variable will be written back to main memory immediately.

Approach 2 is mentioned in the Stack Overflow question "Volatile variable in Java" and also in this tutorial, which says:

The value of this variable will never be cached thread-locally: all reads and writes will go straight to "main memory"

Which one is the correct approach used in Java?
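To make my question concrete, here is a minimal sketch of the kind of program I have in mind (the class and field names are just for illustration):

public class VolatileVisibility {

    private volatile boolean running = true; //the volatile variable in question

    public void reader() {
        //Must this read always go to main memory, or may it be served
        //from a (write-through) CPU cache?
        while (running) {
            //do work
        }
    }

    public void writer() {
        //The JLS guarantees this write becomes visible to reader(),
        //but does it bypass the cache entirely or merely flush it?
        running = false;
    }
}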

Related Stackoverflow questions which do not answer my question:

Volatile variable in Java

Does Java volatile read flush writes, and does volatile write update reads

Java volatile and cache coherence

stackoverflowuser2010
  • Neither. It forces a memory fence at the point of assignment, meaning the write to the volatile variable *and all previous writes* are made visible to other threads, if those threads read the same volatile variable first. No effect is guaranteed if no reads of the same volatile occur. – markspace Apr 26 '16 at 02:42
  • The SO answer you quote in "approach 2" appears to be correct: "Volatile variable in Java" looks right (dinner's on and I only gave it a quick read, however). The tutorial you link to in "approach 2" appears to be crap. – markspace Apr 26 '16 at 02:49

2 Answers

11

The guarantees are only what you see in the language specification. In theory, writing a volatile variable might force a cache flush to main memory, or it might not, perhaps with a subsequent read forcing the cache flush or somehow causing transfer of the data between caches without a cache flush. This vagueness is deliberate, as it permits potential future optimizations that might not be possible if the mechanics of volatile variables were spelled out in more detail.

In practice, with current hardware, it probably means that, absent a coherent cache, writing a volatile variable forces a cache flush to main memory. With a coherent cache, of course, such a flush isn't needed.
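As a rough illustration of why the mechanism is left unspecified (this is just a sketch, not anything the specification mandates): with a plain field, the JIT may hoist the read out of the loop below and the loop could spin forever; declaring the field volatile rules that out, and the specification doesn't care whether the hardware achieves it with a flush to main memory or with cache-coherence traffic between cores.

public class StopFlag {
    //Make this field non-volatile and runLoop() may never observe the
    //update; make it volatile and it must see it, by whatever means the
    //hardware provides.
    private volatile boolean stopped;

    public void runLoop() {
        while (!stopped) {
            //work
        }
    }

    public void stop() {
        stopped = true;
    }
}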

Warren Dew
  • Thanks. So this answer is wrong, right? http://stackoverflow.com/a/6259755/4561314 "Declaring a volatile Java variable means: The value of this variable will never be cached thread-locally: all reads and writes will go straight to 'main memory'." I think your answer is saying that volatile variables can be cached. – stackoverflowuser2010 Apr 26 '16 at 03:52
  • I think that answer has some problems with it, yes. – Warren Dew Apr 26 '16 at 04:11
  • The CPU caches everything in thread-local memory. Volatile simply ensures that the field is firstly, written back to shared memory, and secondly, that other threads wait for this (modern CPUs, aka cache coherent CPUs, already ensure that everything is written back to shared memory, but they don't automatically make other threads wait for this to happen). – KookieMonster Apr 26 '16 at 04:26
  • Hmm, I have problems with the "written back to main memory" part of that, @KookieMonster. I think Warren is saying that in a cache coherent architecture, volatiles may not literally be written to main memory (and I agree). – markspace Apr 26 '16 at 04:28
  • The Java specification is very clear that any changes to volatile fields will become visible to all threads, which will always see the most recent change to that field. This is simply not possible without writing back to shared memory AND using memory barriers to synchronize the threads. Warren is simply pointing out that cache coherent architectures already ensure that data is written back for all fields (they just don't ensure the synchronization); https://docs.oracle.com/javase/tutorial/essential/concurrency/atomic.html – KookieMonster Apr 26 '16 at 04:41
  • Actually, I had the same question, so I asked Intel. [According to them](https://software.intel.com/en-us/comment/1835928), "Coherence traffic is carried over the QPI links [which] uses a modest fraction of the QPI bandwidth." So no, main memory isn't actually always involved. [QPI is described on Wikipedia, fyi.](https://en.wikipedia.org/wiki/Intel_QuickPath_Interconnect) – markspace Apr 26 '16 at 05:31
  • QPI is an interconnect, used to connect chips together, and to main memory. You can't even transfer data over QPI before writing back from thread-local memory [into shared memory, L2 and higher. In terms of coherence, QPI is used to directly transfer between LLC when you have multiple chips on one mobo, rather than writing back to RAM DIMMS]. Just read your own citation. – KookieMonster Apr 26 '16 at 07:28
  • @KookieMonster You see that means that you are wrong, right? Main memory need not be involved and modern CPUs would be *much* slower if it was since main memory is usually about an order of magnitude slower than the interconnects between cores. – David Schwartz Jan 09 '19 at 00:50
  • QPI is an external interconnect. It's designed for multi-socket configurations, where you have to move data between, not just cores, but also entire processors. With QPI, coherence traffic can be sent directly from cache to cache (between L3 caches in modern Intel CPUs). Without QPI, the only shared memory for multi-socket configurations would be main memory. As you say: this would be very painful, and interconnects like QPI, NVLink, InfinityFabric, etc, were created especially to address this problem. – KookieMonster Jan 14 '19 at 02:09
  • Also just to make sure we're on the same page; main and shared memory are very different concepts in computer science (main memory being the highest level of physical memory, shared memory being an abstraction for 'any memory shared between programs/cores/etc' [not necessitating the involvement of main memory in any relevant capacity]), I feel like this distinction may have been overlooked by some of the previous responses, leading to some of the confusion in this thread. – KookieMonster Jan 14 '19 at 02:39
  • Caches are not 'flushed'. The main thing that happens in case of a volatile write is that the CPU stops executing loads till the store buffer has been drained. If the cacheline containing the write isn't in exclusive or modified state a request for ownership is done which will invalidate the cacheline on any other CPU and once it is granted, the store from the store buffer can commit to the L1D. There is no flushing of any caches. – pveentjer May 19 '20 at 12:23
7

Within Java, it's most accurate to say that a thread reading a volatile field will see the most recent write to that field, along with any writes which preceded that volatile write.

Within the Java abstraction, this is functionally equivalent to the volatile fields being read/written from shared memory (but this isn't strictly accurate at a lower level).


At a much lower level than is relevant to Java, in modern hardware any and all reads/writes to any and all memory addresses occur in registers and the L1 cache first. That said, Java is designed to hide this kind of low-level behavior from the programmer, so this is only conceptually relevant to the discussion.

When we use the volatile keyword on a field in Java, this simply tells the compiler to insert something known as a memory barrier on the reads/writes to this field. A memory barrier effectively ensures two things:

  1. Any threads reading this address will use the most up-to-date value (the barrier makes them wait until the most recent write makes it back to shared memory, and no reading threads can continue until this updated value makes it to their L1 cache).

  2. No reads/writes to ANY fields can cross over the barrier (i.e., they are always written back before the other thread can continue, and neither the compiler nor out-of-order execution can move them to a point after the barrier).

To give a simple Java example:

//fields shared between the two threads
int counter = 0;         //normal int field
volatile boolean flag;   //volatile field

//on one thread
counter += 1;
flag = true;

//on another thread
if (flag) foo(counter); //will see the incremented value

Essentially, when setting flag to true, we create a memory barrier. When Thread #2 tries to read this field, it runs into our barrier and waits for the new value to arrive. At the same time, the CPU ensures that counter += 1 is written back before that new value arrives. As a result, if flag == true then counter will have been incremented.
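For completeness, the same idea as a runnable sketch (the thread setup here is only illustrative, not part of the snippet above):

public class FlagExample {

    static int counter = 0;        //normal int field
    static volatile boolean flag;  //volatile field

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            counter += 1; //happens-before the volatile write below
            flag = true;  //volatile write publishes the counter update
        });

        Thread reader = new Thread(() -> {
            while (!flag) { } //spin until the volatile write becomes visible
            System.out.println(counter); //guaranteed to print 1
        });

        reader.start();
        writer.start();
    }
}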


So to sum up:

  1. All threads see the most up-to-date values of volatile fields (which can be loosely described as "reads/writes go through shared memory").

  2. Reads/writes to volatile fields establish happens-before relationships with previous reads/writes to any fields on one thread.
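Point 2 is what makes the common "publish an object through a volatile reference" idiom work; a rough sketch (the class and field names are only illustrative):

class Config {
    String url;  //plain, non-volatile fields
    int retries;
}

class Holder {
    private volatile Config config; //volatile reference used for publication

    void publish() {
        Config c = new Config();
        c.url = "https://example.com"; //plain writes...
        c.retries = 3;
        config = c; //...made visible to other threads by this volatile write
    }

    void use() {
        Config c = config; //volatile read
        if (c != null) {
            //a thread that sees the non-null reference is also guaranteed
            //to see url and retries as written in publish()
            System.out.println(c.url + " " + c.retries);
        }
    }
}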

KookieMonster