
According to section 14.5.4 of the C# language spec (ECMA 334, 6th Edition), volatile fields are all about preventing reordering:

14.5.4 Volatile fields

When a field_declaration includes a volatile modifier, the fields introduced by that declaration are volatile fields. For non-volatile fields, optimization techniques that reorder instructions can lead to unexpected and unpredictable results in multi-threaded programs that access fields without synchronization such as that provided by the lock_statement (§12.13). These optimizations can be performed by the compiler, by the run-time system, or by hardware. For volatile fields, such reordering optimizations are restricted:

• A read of a volatile field is called a volatile read. A volatile read has “acquire semantics”; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.

• A write of a volatile field is called a volatile write. A volatile write has “release semantics”; that is, it is guaranteed to happen after any memory references prior to the write instruction in the instruction sequence.

This is in contrast with the Java memory model, which also provides visibility guarantees (link):

A field may be declared volatile, in which case the Java Memory Model ensures that all threads see a consistent value for the variable

In the same section, the C# spec also contains the snippet below, which shows how a volatile write is used to ensure that the main thread correctly prints result = 143:

using System;
using System.Threading;

class Test
{
    public static int result;
    public static volatile bool finished;
    
    static void Thread2()
    {
        result = 143;
        finished = true;
    }

    static void Main()
    {
        finished = false;
        new Thread(new ThreadStart(Thread2)).Start();
        for (;;)
        {
            if (finished)
            {
                Console.WriteLine($"result = {result}");
                return;
            }
        }
    }
}

However, there's no mention of what guarantees the (eventual?) visibility of the write to finished. Is this just implicit, or is it covered elsewhere in the spec?

  • The snippet above is trivial to the point of being meaningless. There is a better example on MSDN: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/volatile. However, that page gives no further information explaining how it actually works. I always thought it was just a matter of managing compiler/runtime/hardware optimization settings for particular fields, but the excerpt from the spec suggests that there is a lot more going on. – dmedine Jun 28 '23 at 03:39
  • You might want to take a look at a [relevant GitHub issue](https://github.com/dotnet/runtime/issues/67330 "Confusion regarding Volatile.Read guarantees, and example in the volatile docs"). – Theodor Zoulias Jun 28 '23 at 05:28
  • Does this answer your question? [When should the volatile keyword be used in C#?](https://stackoverflow.com/questions/72275/when-should-the-volatile-keyword-be-used-in-c) – shingo Jun 28 '23 at 05:46
  • @TheodorZoulias That GitHub issue is great! Thank you for sharing and for opening it! So is the answer simply that the visibility guarantee comes not from the language but from the CPU having a coherent cache? In other words, the language just needs to ensure the ordering and that it doesn't optimize away writes to such variables. – Malt Jun 29 '23 at 02:05
  • Malt to be honest after quite a lot of reading this stuff still makes my head dizzy. Not being able to learn by experimentation, because my hardware's memory model is stronger than the C# specification, doesn't make things any easier. – Theodor Zoulias Jun 29 '23 at 02:30
  • @TheodorZoulias Understandable, this stuff is genuinely hard. Still, consider writing an answer based on the insight from that issue. Or, if you prefer, I can write one. – Malt Jun 29 '23 at 02:35
  • I don't think that I am qualified to answer the question because it includes a comparison with Java's memory model, for which I have no idea. :-) – Theodor Zoulias Jun 29 '23 at 02:38

1 Answer


We need to make a distinction between writes and assignments.

Visibility of writes is a property of CPUs, not languages. Specifically, it's a question of cache coherence, which has two requirements:

Write Propagation - Changes to the data in any cache must be propagated to other copies (of that cache line) in the peer caches.

Transaction Serialization - Reads/Writes to a single memory location must be seen by all processors in the same order.

If a CPU's caches are coherent (which they should be), writes will eventually be seen by all threads in some fixed order.

However, at the language level we are (loosely speaking) talking about assignments, not writes. So, on CPUs with coherent caches, the question becomes whether C# performs a memory write on every volatile assignment, or whether such assignments can be optimized away by the compiler.
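To make the distinction concrete, here is a minimal sketch (my own, not from the spec) of the kind of transformation a compiler or JIT could legally apply to a plain field but not to a volatile one:

```csharp
using System;
using System.Threading;

class HoistingSketch
{
    static bool plainFlag;             // plain field: a JIT may cache the value in a register
    static volatile bool volatileFlag; // volatile field: every read/write must be a real memory access

    static void Main()
    {
        // For 'plainFlag', a conforming JIT could hoist the read out of the
        // loop, effectively rewriting 'while (!plainFlag) { }' as
        // 'if (!plainFlag) while (true) { }', spinning forever.
        // For 'volatileFlag' that transformation is forbidden, so the loop
        // below is guaranteed to (eventually) observe the worker's write.
        new Thread(() => volatileFlag = true).Start();
        while (!volatileFlag) { }
        Console.WriteLine("volatile write observed");
    }
}
```

Whether a given runtime actually performs such hoisting is an implementation detail; the point is that nothing in the spec forbids it for non-volatile fields.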

Although such optimizations aren't explicitly prohibited in ECMA 334, the spec does contain guarantees that would break if volatile assignments could be compiled into something other than a memory write:

From section 14.5.4 ("Volatile Fields"):

These restrictions ensure that all threads will observe volatile writes performed by any other thread in the order in which they were performed.

From section 7.10 ("Execution Order"):

Execution of a C# program proceeds such that the side effects of each executing thread are preserved at critical execution points... The critical execution points at which the order of these side effects shall be preserved are references to volatile fields...
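For completeness, the same acquire/release semantics are also exposed as an API through the Volatile class in System.Threading. The sketch below (mine, not part of the spec) mirrors the spec's example using Volatile.Read/Volatile.Write instead of the volatile modifier:

```csharp
using System;
using System.Threading;

class VolatileApiSketch
{
    static int result;
    static bool finished; // not declared volatile; accessed via Volatile.* instead

    static void Main()
    {
        var worker = new Thread(() =>
        {
            result = 143;
            Volatile.Write(ref finished, true); // release: the write to 'result' stays before it
        });
        worker.Start();
        while (!Volatile.Read(ref finished)) { } // acquire: the read of 'result' stays after it
        Console.WriteLine($"result = {result}"); // prints: result = 143
    }
}
```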

As for Java, my educated guess is that the JLS authors felt the need to spell out the visibility guarantee (which, importantly, doesn't say when the write will become visible) because the spec is written not against actual hardware but against "The Java Virtual Machine", which is "an abstract computing machine" (quoting section 1.2). This property of the JVM therefore had to be stated explicitly.

Big thanks to Theodor Zoulias who shared an excellent GitHub issue about this exact question and to Peter Cordes who chimed in both on GitHub and here (see the comments as well as this StackOverflow answer).

  • Visibility of asm stores is a CPU property, but visibility of assignments in high-level languages *is* a language property. (Which HW cache coherency makes easy to implement.) For example, the ISO C++ standard is explicit that assignments to `atomic<>` objects must be visible to other threads in "finite" time, and that "Implementations should make atomic stores visible to atomic loads within a reasonable amount of time." (See my comment on the github issue for links and the full quotes.) – Peter Cordes Jun 29 '23 at 16:02
  • C# is the odd language, with a threading model that AFAIK isn't as formally specified, since it's designed around real hardware with a bias towards x86 (Interlocked.Exchange and so on, and a `volatile` that gives you essentially x86's memory-ordering semantics of acquire/release). In the bad old days of C++ before C++11, all we had was C++ `volatile` which just guarantees that a load or store will happen in the asm, but no ordering wrt. surrounding non-atomic ops (except on older MSVC or with `/volatile:ms`), so we needed inline asm to get memory-ordering. – Peter Cordes Jun 29 '23 at 16:03
  • @PeterCordes So if I understand correctly, it is a language property in the sense that the language needs to ensure that assignments actually translate to memory writes, which in turn become visible to other threads on a CPU with coherent caches. – Malt Jun 29 '23 at 19:49
  • That's how C# standardizes it, I think, in terms of volatile assignments compiling to stores or loads, with some ordering constraints. Other languages (like C++) choose to define a formal memory model with rules that require eventual visibility. They don't care exactly how implementations go about making that happen, but every "normal" C++ implementation does so the same way as C#, by compiling to asm for a machine with coherent caches (for the cores that `std::thread` runs across; systems like ARM microcontroller + DSP exist with shared non-coherent caches, but they'd run separate programs) – Peter Cordes Jun 29 '23 at 20:09
  • It's also theoretically possible for a C++ implementation to do manual flushes after every atomic store to ensure visibility, if running on a machine without coherent cache. But the happens-before rule that every earlier store in one thread has to be visible to every later load in another thread would I think make that very expensive to implement; C++ source doesn't explicitly tell the compiler which stores were actually important to make visible, so it would have to flush everything before every std::atomic load or store. ([See this Q&A](https://stackoverflow.com/a/58535118)). – Peter Cordes Jun 29 '23 at 20:14
  • The C++ memory model is designed for systems with coherent cache, but doesn't explicitly require it. The closest it comes is the read-read / read-write / write-read / write-write coherency rules, and a note mentioning that this is what happens naturally on hardware with coherent cache. http://eel.is/c++draft/intro.races#19 – Peter Cordes Jun 29 '23 at 20:14
  • @PeterCordes I edited the question to incorporate some of your feedback. I'd appreciate it if you could check my understanding. – Malt Jun 30 '23 at 02:05
  • Good edit. I'm not sure "assignments" is the best term for high-level abstract-machine writes, but it seems good enough. There are RMWs like `++` operators, and ops like Interlocked.Exchange. But yeah, if we want to make this distinction, it seems good. Just be aware that it's not wrong to say "writes" when talking about high-level language stuff; I just picked "assignments" to emphasize the distinction I wanted to point out. I don't think this is standard technical terminology in C# or C++ or compiler creation, but should be understood by anyone reading your answer since you introduce it. – Peter Cordes Jun 30 '23 at 02:18
  • *If a CPU's caches are coherent (which they should be), writes will eventually be seen by all threads in some fixed order.* - That also depends on coherent cache being the *only* way for data to get between CPU cores. It's not in POWER CPUs, making IRIW reordering possible where two readers disagree about the order of two independent writes: [Will two atomic writes to different locations in different threads always be seen in the same order by other threads?](//stackoverflow.com/q/27807118) . In theory, I think an interconnect could allow some reordering, but they're designed not to – Peter Cordes Jun 30 '23 at 02:22
  • @PeterCordes Thanks. So there are no dedicated assembly instructions on x86 for flushing the cache/doing a write-through? Which real-world CPU architectures have those? – Malt Jun 30 '23 at 13:26
Much later, way after multi-core CPUs were a thing, `clflush` and `clwb` were added for other reasons. `clwb` was added for persistent (non-volatile) memory, like Optane DC PM or other NV-DIMMs. `clflush` was somewhat earlier, but might have also been added because of Intel's then-upcoming plans for memory-mapped persistent storage. ARM also has cache-control instructions. I think it's useful after JITing some machine code into a buffer; you might need an explicit flush as well as barriers to wait for that flush, and block instruction fetch until after flushing to the point of unification. – Peter Cordes Jun 30 '23 at 14:12