
Loop hoisting a volatile read

I have read in many places that a volatile variable cannot be hoisted from a loop or an if, but I cannot find this mentioned anywhere in the C# spec. Is this a hidden feature?

All writes are volatile in C#

Does this mean that all writes have the same properties as those made with the volatile keyword? E.g., do ordinary writes in C# have release semantics, and do all writes flush the store buffer of the processor?

Release semantics

Is this a formal way of saying that the store buffer of a processor is emptied when a volatile write is done?

Acquire semantics

Is this a formal way of saying that it should not load a variable into a register, but fetch it from memory every time?

In this article, Igoro speaks of "thread cache". I perfectly understand that this is imaginary, but is he in fact referring to:

  1. Processor store buffer
  2. loading variables into registers instead of fetching from memory every time
  3. Some sort of processor cache (is this L1 and L2 etc)

Or is this just my imagination?

Delayed writing

I have read in many places that writes can be delayed. Is this because of the reordering and the store buffer?

Thread.MemoryBarrier

I understand that a side effect is a `lock or` instruction emitted when the JIT transforms IL to asm, and that this is why a Thread.MemoryBarrier can solve the delayed write to main memory (in the while loop) in, e.g., this example:

static void Main()
{
  bool complete = false; 
  var t = new Thread (() =>
  {
    bool toggle = false;
    while (!complete) toggle = !toggle;
  });
  t.Start();
  Thread.Sleep (1000);
  complete = true;
  t.Join();        // Blocks indefinitely
}

But is this always the case? Will a call to Thread.MemoryBarrier always flush the store buffer and fetch updated values into the processor cache? I understand that the complete variable is not hoisted into a register and is fetched from the processor cache every time, but that the processor cache is updated because of the call to Thread.MemoryBarrier.

Am I on thin ice here, or do I have some sort of understanding of volatile and Thread.MemoryBarrier?

dcastro
mslot
    By my count you got 11 question marks there :) Which question mark is the question mark in question? – Alex Feb 24 '14 at 13:57
  • @Damien_The_Unbeliever, but I have read, in many places, that hoisting is not done when a field is volatile. And I cannot find this in the spec, so it sounds like a hidden feature. – mslot Feb 24 '14 at 14:04
  • @Alex, so you would rather that I post 11 single questions at once? I will remember that next time. – mslot Feb 24 '14 at 14:05
  • I’m unable to reproduce the blocking behavior of both the code in the question, and the one in the linked article… uhm… – poke Feb 24 '14 at 14:07
  • @poke are you running it in release mode? – dcastro Feb 24 '14 at 14:10
  • @poke are you running it in debug mode? And have you compiled it in release mode? – mslot Feb 24 '14 at 14:10
  • I’m in release mode, yes. Nevermind though; it seems that Visual Studio prevents it from hanging even if running in release mode. When starting the executable on its own, it (correctly) blocks. – poke Feb 24 '14 at 14:13
  • @poke actually, I can't reproduce it either.. can you link me to the chat please? – dcastro Feb 24 '14 at 14:17
  • @dcastro [There you go](http://chat.stackoverflow.com/rooms/7/c). – poke Feb 24 '14 at 14:20
  • @EliArbel thanks, but I have read those. Including http://msdn.microsoft.com/da-dk/magazine/cc163715(en-us).aspx. That is why I have so many questions on my mind. – mslot Feb 24 '14 at 14:23

1 Answer


That's a mouthful..

I'm gonna start with a few of your questions, and update my answer.


Loop hoisting a volatile

I have read many places that a volatile variable can not be hoisted from a loop or if, but I cannot find this mentioned any places in the C# spec. Is this a hidden feature?

MSDN says "Fields that are declared volatile are not subject to compiler optimizations that assume access by a single thread". This is kind of a broad statement, but it includes hoisting or "lifting" variables out of a loop.
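To illustrate what that statement rules out, here's a sketch (the Worker class and its members are hypothetical, not from the question):

class Worker
{
    // Not volatile: the JIT may read _stop once and keep it in a register.
    // Declaring it 'private volatile bool _stop;' forbids that optimization.
    private bool _stop;

    public void Run()
    {
        // A legal single-threaded transformation of this loop is:
        //   if (!_stop) while (true) { /* work */ }
        // i.e. the read of _stop is hoisted out of the loop.
        while (!_stop) { /* work */ }
    }

    public void Stop() { _stop = true; }
}

With the field marked volatile, the read must stay inside the loop, so a call to Stop() from another thread eventually terminates Run().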


All writes are volatile in C#

Does this mean that all writes have the same properties without, as with the volatile keyword? Eg ordinary writes in C# has release semantics? and all writes flushes the store buffer of the processor?

Regular writes are not volatile. They do have release semantics, but they don't flush the CPU's write-buffer. At least, not according to the spec.

From Joe Duffy's CLR 2.0 Memory Model

Rule 2: All stores have release semantics, i.e. no load or store may move after one.

I've read a few articles stating that all writes are volatile in C# (like the one you linked to), but this is a common misconception. From the horse's mouth (The C# Memory Model in Theory and Practice, Part 2):

Consequently, the author might say something like, “In the .NET 2.0 memory model, all writes are volatile—even those to non-volatile fields.” (...) This behavior isn’t guaranteed by the ECMA C# spec, and, consequently, might not hold in future versions of the .NET Framework and on future architectures (and, in fact, does not hold in the .NET Framework 4.5 on ARM).


Release semantics

Is this a formal way of saying that the store buffer of a processor is emptied when a volatile write is done?

No, those are two different things. If an instruction has "release semantics", then no store/load instruction will ever be moved below said instruction. The definition says nothing regarding flushing the write-buffer. It only concerns instruction re-ordering.
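To make the ordering-only nature of these semantics concrete, here's a sketch of the classic publication pattern (class and member names are hypothetical):

class Publication
{
    private int _data;
    private volatile bool _ready;

    public void Writer()
    {
        _data = 42;     // ordinary store
        _ready = true;  // volatile store (release): the store to _data
                        // cannot be moved below this line
    }

    public void Reader()
    {
        if (_ready)     // volatile load (acquire): the load of _data
        {               // cannot be moved above this line
            int value = _data;  // sees 42 once _ready is observed as true
        }
    }
}

Nothing here says when other processors see the stores; the guarantee is only that they can never observe _ready == true together with a stale _data.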


Delayed writing

I have read many places that writes can be delayed. Is this because of the reordering, and the store buffer?

Yes. Write instructions can be delayed/reordered by either the compiler, the jitter or the CPU itself.


So a volatile write has two properties: release semantics, and store buffer flushing.

Sort of. I prefer to think of it this way:

The C# Specification of the volatile keyword guarantees one property: that reads have acquire-semantics and writes have release-semantics. This is done by emitting the necessary release/acquire fences.

Microsoft's actual C# implementation adds another property: reads will be fresh, and writes will be flushed to memory immediately and made visible to other processors. To accomplish this, the compiler emits an OpCodes.Volatile prefix, and the jitter picks this up and tells the processor not to cache this variable in its registers.

This means that a different C# implementation that doesn't guarantee immediacy will be a perfectly valid implementation.
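As an aside, if you only need acquire/release semantics on individual accesses, the System.Threading.Volatile class (introduced in .NET 4.5) provides them per call, without declaring the field volatile. A sketch with a hypothetical Flag class:

class Flag
{
    private bool _value;  // deliberately not declared volatile

    public void Set()
    {
        Volatile.Write(ref _value, true);  // store with release semantics
    }

    public bool IsSet()
    {
        return Volatile.Read(ref _value);  // load with acquire semantics
    }
}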


Memory Barrier

bool complete = false; 
var t = new Thread (() =>
{
    bool toggle = false;
    while (!complete) toggle = !toggle;
});
t.Start();
Thread.Sleep(1000);
complete = true;
t.Join();     // blocks

But is this always the case? Will a call to Thread.MemoryBarrier always flush the store buffer and fetch updated values into the processor cache?

Here's a tip: try to abstract yourself away from concepts like flushing the store buffer, or reading straight from memory. The concept of a memory barrier (or a full-fence) is in no way related to the two former concepts.

A memory barrier has one sole purpose: ensure that store/load instructions below the fence are not moved above the fence, and vice-versa. If C#'s Thread.MemoryBarrier just so happens to flush pending writes, you should think about it as a side-effect, not the main intent.

Now, let's get to the point. The code you posted (which blocks when compiled in Release mode and run without a debugger) could be fixed by introducing a full fence anywhere inside the while block. Why? Let's first unroll the loop. Here's what the first few iterations would look like:

if(complete) return;
toggle = !toggle;

if(complete) return;
toggle = !toggle;

if(complete) return;
toggle = !toggle;
...

Because complete is not marked as volatile and there are no fences, the compiler and the CPU are allowed to move the read of the complete field. In fact, the CLR's Memory Model (see rule 6) allows loads to be deleted (!) when coalescing adjacent loads. So, this could happen:

if(complete) return;
toggle = !toggle;
toggle = !toggle;
toggle = !toggle;
...

Notice that this is logically equivalent to hoisting the read out of the loop, and that's exactly what the compiler may do.

By introducing a full-fence either before or after toggle = !toggle, you'd prevent the compiler from moving the reads up and merging them together.

if(complete) return;
toggle = !toggle;
#FENCE
if(complete) return;
toggle = !toggle;
#FENCE
if(complete) return;
toggle = !toggle;
#FENCE
...

In conclusion, the key to solving these issues is ensuring that the instructions will be executed in the correct order. It has nothing to do with how long it takes for other processors to see one processor's writes.
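Putting that together, here is one way to apply that fix to the posted program (the fence could equally go before the toggle):

static void Main()
{
    bool complete = false;
    var t = new Thread (() =>
    {
        bool toggle = false;
        while (!complete)
        {
            toggle = !toggle;
            Thread.MemoryBarrier(); // full fence: the reads of 'complete'
                                    // can no longer be moved up and merged
        }
    });
    t.Start();
    Thread.Sleep(1000);
    complete = true;
    t.Join();     // now terminates
}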

dcastro
  • The memory model of CLR 2.0 and the memory model described in the C# specification are not the same. I think this question asks about the latter, not about the former. – svick Feb 24 '14 at 14:08
  • @svick Well, if the memory model implementation doesn't require that stores be flushed to memory immediately, then I guess the specification of the model doesn't either. Is that an unfair assumption? – dcastro Feb 24 '14 at 14:12
  • So a volatile write has two properties: release semantics, and store buffer flushing. I have always seen release semantics and flushing as one. – mslot Feb 24 '14 at 14:18
  • @mslot I've appended a couple of items to my answer regarding that comment. – dcastro Feb 24 '14 at 14:25
  • @dcastro Your comment on hoisting (link to the MSDN) and the difference between the spec and the implementation with release semantics and refreshing of caches etc. made my day! Exactly what I wanted to hear and find. Thanks a lot! – mslot Feb 24 '14 at 15:45
  • @mslot I've added something to my answer (at the end) regarding the code you posted, and why a memory barrier would fix it. I also added a note concerning your observation that "All writes are volatile in C#". I hope to have cleared a few of your doubts :) – dcastro Feb 26 '14 at 00:35
  • So in the unrolled loop, with a Thread.MemoryBarrier, you make sure that the load is always done on a variable that is not pulled from a register? – mslot Feb 27 '14 at 09:00
  • @mslot my point is: *if* that happens, that's an implementation detail. The point of the memory barrier is to prevent instructions from being moved across the fence. In this case, the fence prevents the reads from being moved back in time (which is the logical equivalent of reading a cached value - which, again, is a detail). – dcastro Feb 27 '14 at 10:11
  • Sorry. Of course that is an implementation detail. The load is always performed with the Thread.MemoryBarrier. I think I am getting this. Thanks a lot. – mslot Feb 27 '14 at 11:57
  • Just reading a few questions regarding memory barriers this morning and am concerned that some people seem to think that a full fence around a write guarantees that the written value is immediately available to threads running on other cores. It does not. – 0b101010 Jun 10 '16 at 10:03
  • Regarding the latest example (if the MemoryBarrier is used), after the first thread sets `complete = true`, when the other thread reads `!complete`, is it not guaranteed that it will see the updated value? May it only see it in future iterations? – Petrakeas Jun 22 '16 at 12:09
  • @dcastro I'm trying to understand if a read (that has not been reordered due to a memory barrier) is guaranteed to see the result of a write on another thread. I have created a simple test case that relies on this in my question: http://stackoverflow.com/questions/38050681/do-memory-barriers-guarantee-a-fresh-read-in-c/ Can you please comment on that? – Petrakeas Jul 03 '16 at 13:40