
Depending on the source, I've heard different things. Some say that volatile updates the shared cache, but most others have said that it writes to and reads from RAM (main memory). Clearly, if you are writing to and reading from RAM, that will trash performance badly. Given that modern processors have shared multi-core caches, does writing a variable only update the local cache, without updating the shared cache? Does volatile correct this issue at the cache level, or, as most say, does it rely on main memory?

If variables are not updated at the shared-cache level, is there a technical reason why? It would seem that updating and reading from the shared cache would perform better than updating and reading from main memory.

I'm trying to better understand multithreaded code and its performance implications. I haven't run into any actual issue.

  • In order to understand the volatile keyword, see how it is used in the C language. – Farrukh Nabiyev Apr 17 '23 at 13:27
  • For what `volatile` guarantees have a look at [C# documentation](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/volatile) and [Java Memory Model](https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html). Notice that you assume certain implementation details which are stronger than either of these languages actually guarantee. An interesting point of view on that [Java code runs on Java Virtual Machine, a certain abstraction on underlying hardware](https://stackoverflow.com/a/68435498/7034621). Your assumptions work for physical hardware. – orhtej2 Apr 17 '23 at 13:44
  • @FarrukhNabiyev volatile in C is completely different than volatile in Java. Volatile in C signals to the compiler that it should emit the load/store as is and not try to optimize it. Java volatile is a lot stronger than that. – pveentjer Apr 18 '23 at 02:48
  • Caches on modern processors are always coherent. Main memory is just a spill bucket for whatever doesn't fit in the cache, and main memory doesn't need to be up to date with the cache at all. That would be extremely inefficient. E.g. if a single core would be updating an uncontended volatile variable, the cache line containing that variable will always be in the right state and doesn't need to leave the core. So updates can be done extremely fast without writing to main memory. Volatile doesn't lead to any flushing of caches to main memory; this is a very common misconception. – pveentjer Apr 18 '23 at 02:54

3 Answers


Modern CPUs have fully coherent cache lines.

volatile is primarily a language feature that affects what assembly the compiler emits. Pretty much all that it does is force the write to go to a memory address, as opposed to staying in a register; it doesn't decide whether or where that memory address is cached. The actual write may be to a global or local cache line, or to main memory.

It may also introduce a write barrier to prevent re-ordering of instructions, and it forces the write to be atomic.

It does not do anything involving the cache lines themselves. The CPU core manages its local cache lines, and ensures that they are consistent with other cores' local cache lines, the global cache, and main memory at all times. Consistent here does not mean that they are all the same; it means that at any point, a core will know whether a particular location is valid to be read or written, and therefore where the source of truth for a particular memory address is.
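
As a hedged illustration of the register point (a minimal sketch; the class and field names here are made up, not from the question): with a plain field, the JIT is free to hoist the read of a flag out of a loop into a register, so the loop may never see another thread's update; marking the field volatile forces a real load from its memory address on every iteration.

// Minimal sketch; illustrative names. With `running` declared as a plain
// boolean instead, the JIT may hoist the read out of the loop and this
// program can spin forever.
class SpinLoop {
  static volatile boolean running = true;

  public static void main(String[] args) throws InterruptedException {
    Thread worker = new Thread(() -> {
      while (running) {
        // busy-wait; each check of `running` is a real load, not a cached register
      }
      System.out.println("worker observed the update and stopped");
    });
    worker.start();

    Thread.sleep(100);
    running = false; // the worker is now guaranteed to eventually observe this
    worker.join();
  }
}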

Charlieface
  • Volatile is a lot more than just a fence at the hardware level. It also ensures that (1) the compiler will not optimize out a load or store (visibility), (2) the compiler/CPU will guarantee certain ordering, and (3) that the load/store is atomic; so no read or write tearing. So even if you would run on a hypothetical processor that is sequentially consistent, volatile is still needed to keep the compiler under control. – pveentjer Apr 18 '23 at 02:31
  • The write will always be done to a local cache line because before the write can be performed, the cache line needs to put in the right place (M/E in case of MESI) in the local cache of that core. Memory doesn't need to be consistent with the cache at all; that would be extremely slow. The cache is the source of truth. The cache coherence protocol will ensure coherence between cores but not with main memory. – pveentjer Apr 18 '23 at 02:35
  • It is guaranteed that a write to a cache line will not be lost. So if there is a dirty cache line and the cache line needs to be evicted from the cache, it will end up in main memory. One interesting edge case is the MESI protocol; imagine that cache line is in modified (M) state in 1 core and a different core wants to read it that cache line, then the cache line will be written to main memory because MESI doesn't support dirty sharing. MOESI solves that problem. – pveentjer Apr 18 '23 at 03:02
  • 1. I mentioned that on the first line. 2. I mentioned that on the 2nd line. True it does also ensure atomicity. I mentioned at the end what I meant by consistency in general terms: that the CPU always knows which location to use (sometimes its own local, sometimes main memory, sometimes forcing another core's cache flush). Yes a local cache line cannot be used by another CPU (it's local after all). I wasn't intending it to be a giant writeup on MESI, you are welcome to write your own answer. – Charlieface Apr 18 '23 at 09:01
  • It seems you have edited your answer. There are still bits of information that are incorrect or at least questionable. "All that volatile does is force the write to go to memory as opposed to registers. ". I'm not sure how to interpret this, but if you are referring to main memory, then this is incorrect. Volatile doesn't force any 'flushing' of cache lines to main memory. What can happen on the X86 is that volatile inserts a memoryfence which stalls the execution of loads till the store buffer is drained to the coherent cache to ensure that older stores do not get reordered with newer loads. – pveentjer Apr 18 '23 at 11:21
  • Sometimes this is incorrectly called 'flushing the store buffer', because the store buffer is already being drained to the coherent cache as fast as possible. – pveentjer Apr 18 '23 at 11:26
  • No I didn't mean that, I just meant a memory address. Reworded it now to make it clearer. – Charlieface Apr 18 '23 at 11:27
  • It is the compiler that optimizes the code so that reads or writes are not immediately visible. For example, if there is a for loop and the same plain field of an object is read on each iteration, the compiler could hoist that read out of the loop and this is what needs to be controlled if that field was volatile. The CPU is not magically going to transform the code to convert 'register access' to 'memory accesses'. – pveentjer Apr 18 '23 at 11:33
  • Yes I know that. I'm not writing a whole article in a [so] answer. You on the other hand, seem to want to do so in comments. – Charlieface Apr 18 '23 at 11:48

This just isn't how java-the-language works. java-the-language has an official spec and that spec isn't written with a specific processor in mind.

The controlling document is the Java Memory Model.

The JMM works in terms of guarantees. In general, the JMM works as follows:

  • If you do certain things, then the JVM guarantees these reliabilities.
  • Otherwise... anything can happen. The spec simply does not explain or make any promises whatsoever.

And these things are intentionally kept very abstract. A JVM implementation therefore just needs to ensure that the first bullet is guaranteed, and can do whatever it wants otherwise. Hence, something like volatile simply guarantees that it is no longer possible to observe the update from one thread and not from another; how a JVM ensures this guarantee is up to the JVM.

This is why java can be efficient.

Here's a simple example:

The JMM guarantees the following:

There is an enumerated list of 'Happens-Before' causes, each entry of which is highly specced. Any 2 lines (technically, any 2 bytecode instructions) either have an HB relationship or not, and you can know for sure, because it's specced. Let's say we have bytecode instructions BC1 and BC2, and BC1 has a Happens-Before relationship with BC2.

A JVM guarantees that, when BC2 executes, it is impossible to observe state as it was before BC1 finished executing.

And that is all that a JVM guarantees.

For example, someThread.start() is specced to be HB relative to the first line in the run() method of that thread. Hence, given:

class Test {
  static long v = 10;

  void test() {
    v = -1;                  // this write happens-before the start() call below
    new Thread(() -> {
      System.out.println(v); // guaranteed to print -1
    }).start();
  }
}

The above code must print -1. If it prints 10, that JVM is buggy; if you file a bug about it, it would be accepted. That's because v = -1; happens-before .start(), and thus being able to observe state as it was before v = -1; is a thing the JVM spec guarantees cannot happen.

In contrast:

class Test {
  static long v = 10;

  void test() {
    new Thread(() -> {
      try {
        Thread.sleep(1000);  // sleep throws the checked InterruptedException
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      System.out.println(v); // may print 10 or -1; both are legal
    }).start();
    v = -1;                  // no happens-before relationship with the read above
  }
}

Here the JVM spec says that a JVM that prints '10' is legal. A JVM that prints '-1' is also legal. A JVM that flips a coin to determine this, is legal. A JVM that decides to print -1 every day and in every unit test today even if you run it 10000 times, and then juuuust as you give a demo to that important customer it starts returning 10... that'd be legal.

In fact, the JVM spec even says that getting a split read is legal (where the upper 32 bits are all 1-bits due to -1 being all 1 bits, and the lower 32 bits still hold the 10, resulting in some large negative number). This is an interesting case: pretty much no JVM you could possibly run today can be coerced to actually produce that split read, because who still has fully 32-bit hardware, and even then, often you simply can't catch the other thread 'in between'.
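
For concreteness, here's a small sketch of what that torn value would look like (pure arithmetic; as said above, no JVM you run today will actually hand you this read):

// Illustrative only: computes the value a split read would produce,
// combining the upper 32 bits of the new value with the lower 32 bits
// of the old value.
class TornRead {
  public static void main(String[] args) {
    long newVal = -1L; // all 64 bits set
    long oldVal = 10L;
    long torn = (newVal & 0xFFFFFFFF00000000L) | (oldVal & 0x00000000FFFFFFFFL);
    System.out.println(torn); // -4294967286, "some large negative number"
  }
}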

Nevertheless, if you write code that would break if a split read condition occurs, that code is buggy. If a tree falls in the forest... - if you program a bug that cannot possibly be triggered anymore, is it a bug? I dunno, ask a philosopher :)

File bugs all day about this and it'll just be denied: The JVM is working fine, because it adheres to the spec.

Even if you expand the sleep to '14 days' and just let the JVM run that long, it could print '10', even though that seems nuts (it got set to -1 14 days ago!), but there are JVMs out there that really will do that.

volatile is a bit more convoluted in exactly which guarantees you get, but it also fits in this Happens-Before framework (essentially, any 2 bytecode instructions that interact with the same volatile field, at least one of which is a write, establish HB, though it can be a little tricky to know which one happens-before the other).
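
As a hedged sketch of how that HB edge is typically used (illustrative names; 'publish data, then set a volatile flag' is a common idiom, not the only way to use it):

class Publication {
  static int data = 0;                   // plain field
  static volatile boolean ready = false; // volatile field

  static void writer() {   // runs on thread A
    data = 42;             // (1) plain write
    ready = true;          // (2) volatile write
  }

  static void reader() {   // runs on thread B
    if (ready) {           // (3) volatile read
      // If (3) observed the value written at (2), then (2) is
      // happens-before (3), and therefore (1) is visible here:
      System.out.println(data); // guaranteed to print 42
    }
    // If (3) observed false, no HB edge was established and thread B
    // gets no guarantee about `data` at all.
  }
}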

HOW does the JVM implement this? That's up to the JVM. If there is an efficient way to provide that guarantee, it'll do the efficient thing.

This is exactly why the JVM is specced so abstractly. Imagine the spec literally spelled out: volatile ensures that any write to the field will be immediately flushed out to main RAM.

Then if there was a much faster way to ensure inter-thread consistency by e.g. only flushing pages from one core directly to another (imagine a processor exists that could do such a thing), or there's a cache level shared by all cores and you can flush only to there - then a JVM could not use those functionalities because some well-meaning moron decided to get overly specific about some explicit CPU in the Java Language Specification document.

For a breakdown of how most JVMs currently implement the requirement to guarantee certain behaviours when volatile is used, see @Charlieface's answer. But realize that that answer is not actually what the java lang spec guarantees in any way; the JLS doesn't even mention the word 'cache line'.

rzwitserloot
  • Interesting, explanations about volatile should also be abstract if that is the case. It would seem that saying it goes to main memory might not necessarily be correct; it could happen, but not in all cases. – omar santaella Apr 17 '23 at 13:55
  • Indeed, that is an oversimplified explanation, with the keyword there being _over_ - that is going too far in trying to explain what `volatile` does in a succint fashion, and in so doing, is effectively now just straight up incorrect. – rzwitserloot Apr 17 '23 at 14:10
  • I've read that when using synchronized using volatile may not be necessary. That makes me wonder if synchronized also makes variables within the synchronized code block volatile under the hood. – omar santaella Apr 17 '23 at 14:12
  • _That makes me wonder if synchronized also makes variables within the synchronized code block volatile under the hood._ - that is __nonsensical__. `volatile` means 'some abstract guarantee about observability', it has no link whatsoever to CPU design. Thus, your musing makes no sense. – rzwitserloot Apr 17 '23 at 14:13
  • That's the central point here: There's "What the JVM specs tell me I can rely on", and that is what you as java coder should worry about. There really isn't "... and this is how it runs on a CPU" because that is useless: Could be no longer true next JVM release, likely isn't true on a different CPU architecture, and java runs on many, many platforms. – rzwitserloot Apr 17 '23 at 14:14
  • There is an obvious HB with `synchronized`: The exit of a `sync(x){}` block is HB relative to the entry of a `sync(x){}` block (where `x` is pointing at the same object) that runs later. It's just those 2 threads with HB - __every__ field write to __every__ field in the former sync block is now guaranteed visible from the thread that enters it later. Contrast to `volatile` which is 'global' (for all threads). The two ideas aren't related spec-wise. Whether the JVM implements them using the same CPU lock or flush mechanisms.. who knows? who cares? – rzwitserloot Apr 17 '23 at 14:16
  • If I read your comment correctly it indicated that a separate thread changing the value of a variable could go unnoticed by another thread. e.g. the 10 and 20 values. And volatile fixes this issue, enforcing threads to have the latest updated value. Clearly synchronized code blocks must do something similar under the hood to enforce variable values are updated correctly for all threads just like volatile, or we'd need to use volatile in addition to synchronized. – omar santaella Apr 17 '23 at 14:17
  • And with 'guaranteed visible' I really mean: "it is not possible to observe anything else". A JVM is free not to sync field Z if it realizes that the second sync block doesn't read field Z. As long as the code in a happens-after can't observe state as it was _before_ its HB, the JVM has fulfilled its spec. And JVMs really are this smart at times. See how utterly pointless it is to try to define these things in terms of cache lines and RAM flushes? – rzwitserloot Apr 17 '23 at 14:17

The Java Language Specification (JLS) says almost nothing about "main memory" or "cache." It describes the behavior of volatile variables strictly in terms of how they restrict the ordering of events in different threads. The JLS description is kind of dry and academic (search for "volatile" wherever it appears in chapter 17). But mostly, what it tells you is this: whatever thread A does before it writes to some volatile field f must become visible to another thread B by the time thread B reads the same field f and gets the value that A wrote.
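
A minimal sketch of that sentence in code (names are illustrative): everything thread A does before writing the volatile field f, including constructing an object, must be visible to thread B once B's read of f returns the value A wrote.

class Holder {
  int x;
  Holder(int x) { this.x = x; }
}

class SafePublication {
  static volatile Holder f; // the volatile field "f" from the description above

  static void threadA() {
    f = new Holder(42); // construct first, then volatile-write the reference
  }

  static void threadB() {
    Holder h = f;   // volatile read
    if (h != null) {
      // B read the value A wrote, so everything A did before that write,
      // including initializing h.x, is visible to B:
      System.out.println(h.x); // guaranteed to print 42
    }
  }
}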

Solomon Slow
  • It is more than just ordering. Visibility and atomicity are also key guarantees. And be careful with 'time' because it could lead to the wrong impression. The JMM is expressed in terms of sequential consistency (SC) and doesn't rely on physical time. Under SC, it is fine if reads and writes are skewed as long as the program order isn't violated. Linearizability = SC + preserving physical time. – pveentjer Apr 18 '23 at 02:39