
I'm trying to see how the fence is applied.

I have this code (which blocks indefinitely):

static void Main()
{
    bool complete = false;
    var t = new Thread(() => {
        bool toggle = false;
        while(!complete) toggle = !toggle;
    });
    t.Start();
    Thread.Sleep(1000);
    complete = true;
    t.Join(); // Blocks indefinitely
}

Writing volatile bool _complete; solves the issue.
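
For reference, here is a sketch of the fixed program. Note that volatile can only be applied to fields, not locals, so the flag moves out of Main into a static field:

using System.Threading;

class Program
{
    static volatile bool _complete;   // every read/write of this field now carries a half fence

    static void Main()
    {
        var t = new Thread(() => {
            bool toggle = false;
            while (!_complete) toggle = !toggle;   // the flag is re-read on every iteration
        });
        t.Start();
        Thread.Sleep(1000);
        _complete = true;
        t.Join();   // now returns shortly after the write
    }
}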

Acquire fence:

An acquire-fence prevents other reads/writes from being moved before the fence;

But if I illustrate it using an arrow (think of the arrowhead as pushing everything away),

the code can look like this:

var t = new Thread(() => {
    bool toggle = false;
    while (!complete)
        ↓↓↓↓↓↓↓     // instructions can't move up before this fence
    {
        toggle = !toggle;
    }
});

I don't understand how the illustrated drawing represents a solution to this issue.

I do know that while(!complete) now reads the real value, but how does the location of complete = true; relate to the fence?

Royi Namir

2 Answers


Making complete volatile does two things:

  • It prevents the C# compiler or the jitter from making optimizations that would cache the value of complete.

  • It introduces a fence that tells the processor that caching optimizations of other reads and writes that involve either pre-fetching reads or delaying writes need to be de-optimized to ensure consistency.

Let's consider the first. The jitter is perfectly within its rights to see that the body of the loop:

    while(!complete) toggle = !toggle;

does not modify complete and therefore whatever value complete has at the beginning of the loop is the value that it is going to have forever. So the jitter is allowed to generate code as though you'd written

    if (!complete) while(true) toggle = !toggle;

or, more likely:

    bool local = complete; 
    while(local) toggle = !toggle;

Making complete volatile prevents both optimizations.

But what you are looking for is the second effect of volatile. Suppose your two threads are running on different processors. Each has its own processor cache, which is a copy of main memory. Let's suppose that both processors have made a copy of main memory in which complete is false. When one processor's cache sets complete to true, if complete is not volatile then the "toggling" processor is not required to notice that fact; it has its own cache in which complete is still false and it would be expensive to go back to main memory every time.

Marking complete as volatile eliminates this optimization. How it eliminates it is an implementation detail of the processor. Perhaps on every volatile write the write gets written to main memory and every other processor discards their cache. Or perhaps there is some other strategy. How the processors choose to make it happen is up to the manufacturer.

The point is that any time you make a field volatile and then read or write it, you are massively disrupting the ability of the compiler, the jitter and the processor to optimize your code. Try to not use volatile fields in the first place; use higher-level constructs, and don't share memory between threads.
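
For example, one such higher-level construct is a CancellationToken; here is a minimal sketch of the same program using it (this example is illustrative, not from the original answer):

using System.Threading;

class Program
{
    static void Main()
    {
        var cts = new CancellationTokenSource();
        var t = new Thread(() => {
            bool toggle = false;
            // IsCancellationRequested is safe to poll from another thread;
            // no hand-rolled volatile flag is needed.
            while (!cts.Token.IsCancellationRequested) toggle = !toggle;
        });
        t.Start();
        Thread.Sleep(1000);
        cts.Cancel();   // signals the worker
        t.Join();       // returns promptly
    }
}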

I'm trying to visualize the sentence: "An acquire-fence prevents other reads/writes from being moved before the fence..." What instruction should not be before that fence?

Thinking about instructions is probably counterproductive. Rather than thinking about a bunch of instructions just concentrate on the sequence of reads and writes. Everything else is irrelevant.

Suppose you have a block of memory, and part of it is copied to two caches. For performance reasons, you read and write mostly to the caches. Every now and then you re-synchronize the caches with main memory. What effect does this have on a sequence of reads and writes?

Suppose we want this to happen to a single integer variable:

  1. Processor Alpha writes 0 to main memory.
  2. Processor Bravo reads 0 from main memory.
  3. Processor Bravo writes 1 to main memory.
  4. Processor Alpha reads 1 from main memory.

Suppose what really happens is this:

  • Processor Alpha writes 0 to the cache, and synchronizes to main memory.
  • Processor Bravo synchronizes cache from main memory and reads 0.
  • Processor Bravo writes 1 to cache and synchronizes the cache to main memory.
  • Processor Alpha reads 0 -- a stale value -- from its cache.

How is what really happened in any way different from this?

  1. Processor Alpha writes 0 to main memory.
  2. Processor Bravo reads 0 from main memory.
  3. Processor Alpha reads 0 from main memory.
  4. Processor Bravo writes 1 to main memory.

It isn't different. Caching turns "write read write read" into "write read read write". It moves one of the reads backwards in time, and, in this case equivalently, moves one of the writes forwards in time.

This example just involves two reads and two writes to one location, but you can imagine a scenario where there are many reads and many writes to many locations. The processor has wide latitude to move reads backwards in time and move writes forwards in time. The precise rules for which moves are legal and which are not differ from processor to processor.

A fence is a barrier that prevents reads from moving backwards or writes from moving forwards past it. So if we had:

  1. Processor Alpha writes 0 to main memory.
  2. Processor Bravo reads 0 from main memory.
  3. Processor Bravo writes 1 to main memory. FENCE HERE.
  4. Processor Alpha reads 1 from main memory.

No matter what caching strategy a processor uses, it is now not allowed to move read 4 to any point before the fence. Similarly it is not allowed to move write 3 ahead in time to any point after the fence. How a processor implements a fence is up to it.
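
In C#, the explicit way to request such a barrier is Thread.MemoryBarrier(), which generates a full fence. A minimal sketch of where the fences would sit in the scenario above; in real code each thread issues its own barrier, and the field name value is illustrative:

using System.Threading;

static class FenceSketch
{
    static int value;

    // "Processor Bravo", step 3 of the scenario.
    static void Bravo()
    {
        value = 1;               // the write of 1
        Thread.MemoryBarrier();  // full fence: the write above cannot move later than this point
    }

    // "Processor Alpha", step 4 of the scenario.
    static int Alpha()
    {
        Thread.MemoryBarrier();  // full fence: the read below cannot move earlier than this point
        return value;            // the read of 1
    }
}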

Eric Lippert
  • Isn't it the same optimization that is controlled by the optimize flag? (Thank you very much for answering.) – Royi Namir Mar 01 '13 at 17:58
  • The C# compiler and the jitter get less aggressive with optimization off, but the *processor* knows nothing about that. Fences are about disabling processor optimizations. It is *the chip itself* that is making the dangerous optimization that needs to be turned off. – Eric Lippert Mar 01 '13 at 18:11
  • But I'm trying to _visualize_ the sentence: "_An acquire-fence prevents other reads/writes from being moved before the fence..._" **What** instruction should _not_ be before that fence? (I did understand the CPU "toggling" part, but again, I'm trying to understand the visualization.) – Royi Namir Mar 01 '13 at 18:34
  • I'm speechless. Thank you _so_ much. – Royi Namir Mar 01 '13 at 19:02
  • @RoyiNamir: You're welcome. This is a very, very brief introduction to memory fences. They get much more complicated than this. This article by Vance Morrison is a much more in-depth look at these issues and how they relate to C#. http://msdn.microsoft.com/en-us/magazine/cc163715.aspx – Eric Lippert Mar 01 '13 at 19:22
  • Eric, I think there's a problem, and I'll tell you why. We know that an **acquire-fence** is a memory barrier in which other reads and writes are not allowed to move **before** the fence, and a **release-fence** is a memory barrier in which other reads and writes are not allowed to move **after** the fence. So the fence in your last sample will not be where you said it is. According to http://stackoverflow.com/a/10637264/859154 your last sample will look like `1...2... ↑write ↓Read`, and those 2 CAN be swapped (http://stackoverflow.com/q/10631629/859154), which is one of volatile's problems. (continuing) – Royi Namir Mar 01 '13 at 21:03
  • You also said _"But what you are looking for is the second effect of volatile"_... But since I showed you that `3` and `4` can be swapped even if they are volatile (because it is the rare case of a read after a write [http://stackoverflow.com/q/10631629/859154]), I think my issue is the first effect of volatile, like you stated: _It prevents the C# compiler or the jitter from making optimizations..._ And so I ran the code with the `optimize flag=OFF` (_without_ volatile) and it **didn't** hang, so I think the fences are not the issue here but JIT optimization. Am I wrong? – Royi Namir Mar 01 '13 at 21:06
  • "It didn't hang" tells you nothing. First off, if you used x86 hardware *all memory writes are volatile*. x86 presents a strong memory model. If you want to demonstrate memory fence problems then you need to obtain some *weak memory model hardware*, like an ARM chip. And even on weak memory model chips, demonstrating problems requires just the right sequence of hardware timing, cache misses, and so on. There are some fence problems that are so rare you'd never see them in a decade of trying. – Eric Lippert Mar 01 '13 at 21:43
  • (Sorry, I want to be sure)... Does adding the volatile keyword solve **my** problem because of the jitter optimization prevention **or because** of the fences? (Because, like I've said, if I run it _without_ the optimize flag the program ends [_because there's no code optimization like in your explanation_], but if I run it _with_ the optimize flag the program never ends.) – Royi Namir Mar 02 '13 at 13:47
  • @RoyiNamir: Every layer in the application is free to optimize (compiler, jitter, or hardware). When you use `volatile` or any memory barrier generator you are telling *all* layers to constrain instruction-moving optimizations. So to answer your question...YES, the `volatile` keyword tells the jitter to prevent the "lifting" optimization. The acquire-fence on the reading thread *is* the compiler's notification to stop optimizations! – Brian Gideon Mar 03 '13 at 03:33

Like most of my answers pertaining to memory barriers I will use an arrow notation where ↓ represents an acquire-fence (volatile read) and ↑ represents a release-fence (volatile write). Remember, no other read or write can move past an arrow head (though they can move past the tail).

Let us first analyze the writing thread. I will assume that complete is declared as volatile [1]. Thread.Start, Thread.Sleep, and Thread.Join will generate full fences, and that is why I have up and down arrows on either side of each of those calls.

↑                   // full fence from Thread.Start
t.Start();
↓                   // full fence from Thread.Start
↑                   // full fence from Thread.Sleep
Thread.Sleep(1000);
↓                   // full fence from Thread.Sleep
↑                   // release fence from volatile write to complete
complete = true;
↑                   // full fence from Thread.Join
t.Join();
↓                   // full fence from Thread.Join

One important thing to notice here is that it is the Thread.Join call that is preventing the write to complete from floating any further down. The effect here is that the write gets committed to main memory immediately. It is not the volatility of complete itself that is causing it to get flushed to main memory. It is the Thread.Join call and the memory barrier it generates that is causing that behavior.

Now we will analyze the reading thread. This is a bit trickier to visualize because of the while loop, but let us start with this.

bool toggle = false;
register1 = complete;
↓                           // half fence from volatile read
while (!register1)
{
  bool register2 = toggle;
  register2 = !register2;
  toggle = register2;
  register1 = complete;
  ↓                         // half fence from volatile read
}

Maybe we can visualize it better if we unwind the loop. For brevity I will only show the first 4 iterations.

if (register1) return;
register2 = toggle;
register2 = !register2;
toggle = register2;
register1 = complete;
↓
if (register1) return;
register2 = toggle;
register2 = !register2;
toggle = register2;
register1 = complete;
↓
if (register1) return;
register2 = toggle;
register2 = !register2;
toggle = register2;
register1 = complete;
↓
if (register1) return;
register2 = toggle;
register2 = !register2;
toggle = register2;
register1 = complete;
↓

Now that we have the loop unwound I think you can see that any potential movement of the read of complete is going to be severely limited. [2] Yes, it can get shuffled around a little bit by the compiler or hardware, but it is pretty much locked into being read on every iteration. Remember, the read of complete is still free to move, but the fence it created does not move with it. That fence is locked into place. This is what causes the behavior often called a "fresh read".

If volatile were omitted on complete then the compiler would be free to use an optimization technique called "lifting", where a read of a memory address gets extracted, or lifted, outside the loop. In the absence of volatile that optimization would be legal because all of the reads of complete would be allowed to float up until they are all ultimately outside of the loop. At that point the compiler would coalesce them all into a one-time read just before starting the loop. [3]
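
In the register notation above, that lifted form would look roughly like this, which is why the loop never observes the write:

register1 = complete;    // single hoisted read, performed once before the loop
while (!register1)       // the condition can never change from here on
{
    register2 = toggle;
    register2 = !register2;
    toggle = register2;
    // no re-read of complete, and no fence left in the loop to prevent the lifting
}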

Let me summarize a few important points right now.

  • It is the call to Thread.Join that is causing the write to complete to get committed to main memory so that the worker thread will eventually pick it up. The volatility of complete is irrelevant on the writing thread (which is probably surprising to most).
  • It is the acquire-fence generated by the volatile read of complete that prevents that read from getting lifted outside of the loop, which in turn creates the "fresh read" behavior. The volatility of complete on the reading thread makes a huge difference (which is probably obvious to most).
  • "Committed writes" and "fresh reads" are not directly caused by volatile reads and writes. But they are indirect consequences that just happen to almost always occur, especially in the case of loops.

[1] Marking complete as volatile on the writing thread is not necessary because x86 writes already have volatile semantics, but more importantly because the fence that is created by it does not cause the "committed write" behavior anyway.

[2] Keep in mind that reads and writes can move through the tail of the arrow, but the arrow is locked in place. That is why you cannot bubble all of the reads up outside of the loop.

[3] The lifting optimization must also ensure that the actual behavior of the thread is consistent with what the programmer originally intended. That requirement is easy to satisfy in this case because the compiler can easily see that complete is never written to on that thread.

Brian Gideon
  • **Brian, as always thank you very much.** But where do you see that `Thread.Join`/`Thread.Sleep` creates a barrier? I looked in Reflector and didn't find it. – Royi Namir Mar 03 '13 at 07:41
  • I think what confuses me in many examples is that it's difficult to explain the behaviour observed in terms of the *specification* of volatile (which talks only about acquire and release semantics). It's easy to conclude that in this example, `volatile` simply prevents hoisting the read out of the loop, and from there it's a short step to the "`volatile` disables compiler optimisations to ensure a fresh read" myth. – anton.burger Mar 03 '13 at 08:03
  • So Brian, in this example, would it be anywhere approaching correct to say: the loop represents a series of volatile reads, and in order to prevent subsequent reads (apparently) floating before prior ones (in other words, to avoid breaking the specification of `volatile`), 1) there actually have to *be* repeated reads, i.e. the compiler is *compelled* not to hoist, and 2) the jitter does whatever is necessary to ensure that the hardware actually performs a read every time? So a "fresh read" in this case is a _consequence_ of `volatile` + loop, starting from volatile's definition? – anton.burger Mar 03 '13 at 08:17
  • @RoyiNamir: The memory barriers generated from `Thread.Join` and `Thread.Sleep` are injected from the unmanaged implementation of those methods. So you probably won't see evidence of them when decompiling. See my answer [here](http://stackoverflow.com/a/6932271/158779) for a list of memory barrier generators. – Brian Gideon Mar 03 '13 at 20:12
  • @shambulator: Yes, I think you're absolutely right. The "fresh read" myth was really confusing for me for the longest time as well. It wasn't until I started visualizing loops in their unrolled forms and using the arrows to mark the fences that I finally could really wrap my head around all of this. So yes, even though the specification says nothing about a fresh read it is almost always the behavior that is created. Which is good because that is the behavior that is almost always desired! – Brian Gideon Mar 03 '13 at 20:19
  • @BrianGideon I don't understand. You said: [_It is not the volatility of complete itself that is causing it to get flushed to main memory. It is the Thread.Join call and the memory barrier it generates that is causing that behavior._] But if I don't mark it as volatile, the code does not end and keeps running **although there is a join**, which should solve the endless running, as you said. But it doesn't. Can you explain please? – Royi Namir May 24 '13 at 07:59
  • @RoyiNamir: If you omit `volatile` the write still gets committed to main memory because of the `Thread.Join` call. It's the read that will get optimized out. That is why it doesn't end. I didn't mean to imply that `Thread.Join` fixes the endless running. – Brian Gideon May 24 '13 at 14:00
  • Thank you Brian (as always) – Royi Namir May 24 '13 at 14:09
  • @BrianGideon So you're saying that if we removed the `Thread.Join` call, the write to `complete` might not be committed to main memory? Because there's no fence preventing the write from being delayed, right? But then, how does C# guarantee that all writes to a volatile field are immediately committed? – dcastro Feb 09 '14 at 08:38
  • @dcastro: Basically, yes. However, this is mostly theoretical. The JIT compiler could decide to go ahead and commit the write to main memory even though the specification says it doesn't have to. Furthermore, in this specific case, the removal of `Thread.Join` will cause the thread to end immediately, and when a thread ends all writes are automatically committed anyway. So the answer to your question is technically yes, but in reality that write is probably going to happen immediately regardless. – Brian Gideon Feb 09 '14 at 23:00
  • But the MSDN page for the volatile keyword does imply that writes will be flushed immediately: "This ensures that the most up-to-date value is present in the field at all times." I don't see how the release fence emitted for the volatile write could fulfil such a guarantee... – dcastro Feb 09 '14 at 23:06
  • @dcastro: In that case the documentation is just flat wrong. Others have pointed out the deficiencies of that specific statement before. If you look at the actual ECMA specification for `volatile` it says nothing of the sort. The specification, unlike the MSDN documentation, is written in the context of the ordering of reads and writes. Very confusing... I know. It took me a long time to really grasp what's going on. Actually, I'm not entirely sure that I have a full understanding even still. – Brian Gideon Feb 09 '14 at 23:18