2

(I know they don't, but I'm looking for the underlying reason this actually works without volatile, since there should be nothing preventing the compiler from storing a variable in a register without it... or is there...)

This question stems from a tension: without volatile, the compiler can in theory optimize any variable in various ways, including storing it in a CPU register. Yet the docs say volatile is not needed when you use synchronization such as lock around variables. But in some cases there is seemingly no way the compiler/JIT can know whether your code path will use those variables, so the suspicion is that something else is really happening here to make the memory model "work".

In this example, what prevents the compiler/JIT from optimizing _count into a register, so that the increment is done on the register rather than directly to memory (with the write back to memory happening only after the Exit call)? If _count were volatile everything would seem fine, but a lot of code is written without volatile. It would make sense for the compiler to know not to optimize _count into a register if it saw a lock or synchronization object in the method... but in this case the lock call is in another function.

Most documentation says you don't need to use volatile if you use a synchronization call like lock.

So what prevents the compiler from optimizing _count into a register and potentially updating just the register within the lock? I have a feeling that most member variables won't be optimized into registers for this exact reason, since otherwise every member variable would really need to be volatile unless the compiler could tell it shouldn't optimize (and I suspect tons of code would fail). I saw something similar when looking at C++ years ago: local function variables got stored in registers, but class member variables did not.

So the main question is: is the only way this can possibly work without volatile that the compiler/JIT won't put class member variables in registers, and thus volatile is unnecessary?

(Please ignore the lack of exception handling and safety in the calls, but you get the gist.)

public class MyClass
{
  object _o = new object();

  int _count = 0;

  public void Increment()
  {
    Enter();
    // ... many usages of _count here...
    _count++;
    Exit();
  }

  // Let's pretend these functions are too big to inline and that they even call
  // other methods that actually make the monitor call (for example a base class
  // that implemented these).
  private void Enter() { Monitor.Enter(_o); }
  private void Exit()  { Monitor.Exit(_o); }
  // ...
  // ...
}
Brian Gideon
user2685937
  • What if you instantiate 20 `MyClass` objects, and your hardware only has 8 general purpose registers? – Mephy Sep 13 '14 at 21:15
  • Be sure to read [Sayonara volatile](http://joeduffyblog.com/2010/12/04/sayonara-volatile/) – H H Sep 13 '14 at 21:42
  • There is nothing I know of in the .NET memory model that demands this behavior. It is simply conservative jitter optimizer design, writes to class fields are always externally observable and it assumes that internal calls might affect variable state. Any jitter that breaks these assumptions is not going to be very popular. – Hans Passant Sep 14 '14 at 00:30

3 Answers

6

Entering and leaving a Monitor causes a full memory fence. The CLR thus makes sure that all writes before the Monitor.Enter / Monitor.Exit become visible to all other threads, and that all reads after the call "happen" after it. That also means that statements before the call cannot be moved after it, and vice versa.

See http://www.albahari.com/threading/part4.aspx.
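To make the guarantee concrete, here is a minimal sketch (my own example, not the poster's class): two threads increment a non-volatile field strictly under Monitor.Enter/Monitor.Exit. The fences implied by the lock make each thread's increments visible to the other, so the final total is exact even though nothing is marked volatile.

```csharp
using System;
using System.Threading;

public class Counter
{
    private readonly object _gate = new object();
    private int _count;   // deliberately NOT volatile

    public void Increment()
    {
        Monitor.Enter(_gate);
        try { _count++; }
        finally { Monitor.Exit(_gate); }
    }

    public int Read()
    {
        // Readers take the same lock, so they get the acquire fence too.
        Monitor.Enter(_gate);
        try { return _count; }
        finally { Monitor.Exit(_gate); }
    }

    public static void Main()
    {
        var c = new Counter();
        var t1 = new Thread(() => { for (int i = 0; i < 100000; i++) c.Increment(); });
        var t2 = new Thread(() => { for (int i = 0; i < 100000; i++) c.Increment(); });
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        Console.WriteLine(c.Read()); // 200000, with no volatile anywhere
    }
}
```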

ominug
  • 1,422
  • 2
  • 12
  • 28
  • I don't think that would impact enregistration of variables into CPU registers. A Fence is for the CPU to not re-order loads or stores after the fence. It is a CPU effect that would only impact the CPU ordering. The Compiler/JIT would need to be smart enough to know the call path would require a fence at a future time and not store variables into registers which seems extremely difficult to do, and thus unlikely it would be able to pull that off. So I don't believe that is the answer here. – user2685937 Sep 13 '14 at 21:42
  • Even if we assume JIT was smart enough to pull that off because of the first time a program ran, it would rely on the first time the program ran calling the appropriate function to know that. If it did not do that the first time then the program wouldn't work the second time or later times the methods were called. Thus again a very unlikely implementation, so not likely what is happening. – user2685937 Sep 13 '14 at 21:46
  • @user2685937 A memory fence causes the CPU's caches to be flushed. I don't know exactly how it is implemented in processor architectures. You could imagine a write fence as the processor waiting as long as it takes to propagate the value of the register to all other cores. – ominug Sep 13 '14 at 22:08
  • Right. But if the compiler/jit stored _count in a register fences/memory barriers don't matter since a barrier doesn't force a register to put things to memory that requires an explicit assembly instruction. I'm using Intel x86. But I don't think that is overly relevant since I think this question is compiler/jit related. – user2685937 Sep 13 '14 at 22:10
  • Eventually we have to trust the documented memory model of C#. It should be correctly implemented, and all necessary steps should take effect. If an explicit assembly instruction is required, why wouldn't it emit that too? – ominug Sep 13 '14 at 22:19
  • I think it's actually important to understand the underpinnings of why things work the way they do. Why the model works for example. I believe the model may only work because it won't store class member variables in registers and if that is correct it might add to some discussion on volatile not being needed in some other cases as well. – user2685937 Sep 13 '14 at 22:25
  • Please remove the answer about Monitor and memory fences as that can't be correct as described in the comments. I know my question wasn't clear initially as to I want the underlying reason why this actually works, not just the statement by ECMA/MS that it does. I updated my question to be more explicit. – user2685937 Sep 13 '14 at 23:33
  • @user2685937 What you're basically missing is that, yes if the compiler can't inline your enter/exit methods it has to conservatively assume that it has to flush all reachable variables at this point - meaning that it will have to generate writes for these variables to memory. The fences of your enter/exit methods just make sure that the CPU really does finish the writes first before continuing. – Voo Sep 14 '14 at 06:47
  • @Voo that would suggest that if the compiler calls ANY methods at all then it would store the value even if it was in a register. While I think that probably does make sense with methods called within the same class since they might modify fields, but what if the fields are private and the Enter/Exit calls were in a different object/class. It would seem to be an over optimization if the compiler assumed to store all variables in a register. It also seems unlikely since we know there are situations if you do "while (_nonvolatilevariable) { } – user2685937 Feb 21 '15 at 19:58
  • @Voo (continued from previous comment) and call methods that the compiler could have optimized that _nonvolatilevariable into a register if it knows from a single thread it will never change, even if the while loop calls methods. That is a pretty well known compiler issue/implementation. Thus I think it holds that most compilers do not optimize class fields in this way otherwise every field would need to be volatile when multithreading even when using lock/monitor calls. – user2685937 Feb 21 '15 at 20:00
  • I guess I could see, as you mention, that a very dumb compiler optimization prevention technique by saving all variables whenever there is any method calls would prevent consistency issues but allow some optimization within methods. It would be interesting to hear from a compiler designer if that is in fact what they do. – user2685937 Feb 21 '15 at 21:08
  • @user2685937 "that would suggest that if the compiler calls ANY methods at all then it would store the value even if it was in a register" - for any field that may be leaked outside the scope that's exactly what it does if it can't inline the function call. And even private fields or local variables may need to be spilled if they are not in callee-saved registers. – Voo Mar 02 '15 at 19:39
  • And a private field is not as safe as you think, as there's reflection to worry about - yes you can be clever about that, but that's pretty complicated. And anyhow, the idea is that any function call you do will do a non-trivial amount of work, which amortizes the marginal cost of spilling registers. But yes, that's why inlining is probably the most important of all optimizations. – Voo Mar 02 '15 at 19:42
  • @Voo from your comment "... yes if the compiler can't inline your enter/exit methods it has to ... assume that it has to flush all reachable variables ..." . Even if it inlines it would call Monitor.Enter which is another function call I think it probably still just dumps the variables from registers there regardless to avoid issues. Any call out of an object could allow calls back into the objects properties and methods so if the values in our object were not consistent at that point even from a single thread things would appear wrong, so it must just save from registers on any call. – user2685937 Mar 03 '15 at 13:08
  • @user2685937 Yes inlining only helps to avoid the spilling for the inlined method nothing more. That said Monitor.Enter is almost certainly a compiler intrinsic so the compiler would know about it. About your other point, yes as I said this only works if no reference escapes, there are scenarios where that's possible to guarantee although the cost of spilling a register shouldn't be overestimated. – Voo Mar 03 '15 at 18:07
0

The best-guess answer to this question would appear to be that any variables held in CPU registers are saved back to memory before any (non-inlined) function call. This makes sense from a single-threaded compiler-design viewpoint: otherwise the object might appear inconsistent if it were used by other functions/methods/objects. So it may not be, as some people/articles claim, that synchronization objects/classes are detected by the compiler and non-volatile variables are made safe through their calls. (Perhaps they are when a lock or other synchronization object appears in the same method, but once the calls to those synchronization objects are wrapped in another method, probably not.) Instead, it is likely that the mere fact of calling another method is enough to cause the values held in CPU registers to be spilled to memory, and thus not all variables need to be volatile.
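A sketch of that idea (a hypothetical class, not from the question): with an empty loop body the JIT is free to hoist a non-volatile flag into a register, so the loop can spin forever even after another thread clears it. Once the loop body contains an opaque, non-inlined call, the JIT must conservatively assume the call could change the field and re-read it from memory on each iteration. Note this is JIT-dependent behavior, not a language guarantee.

```csharp
using System;
using System.Threading;
using System.Runtime.CompilerServices;

public class FlagDemo
{
    private bool _running = true;   // deliberately NOT volatile

    public void SpinEmpty()
    {
        // Risky: the JIT may treat _running as loop-invariant and hoist it
        // into a register, so this loop can spin forever.
        while (_running) { }
    }

    public void SpinWithCall()
    {
        // The non-inlined call forces a conservative re-read of _running
        // each iteration (in practice, on current JITs).
        while (_running) DoWork();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private void DoWork() { /* pretend this is too big to inline */ }

    public void Stop() { _running = false; }

    public static void Main()
    {
        var demo = new FlagDemo();
        var t = new Thread(demo.SpinWithCall);
        t.Start();
        Thread.Sleep(100);
        demo.Stop();
        t.Join();                   // exits promptly on current JITs
        Console.WriteLine("exited");
    }
}
```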

I also suspect, and others have suspected too, that fields of a class are not optimized as aggressively, due in part to these threading concerns.

Some notes (my understanding): Thread.MemoryBarrier() is mostly a CPU instruction to ensure reads/writes don't bypass the barrier, from the CPU's perspective; it is not directly related to values held in registers. So it is probably not what directly causes register-held values to be saved to memory, except insofar as it is itself a method call, which per the discussion here would likely cause any class fields held in registers to be spilled (any method call at all would have that effect).

It is theoretically possible the JIT/compiler could also take that method into account within the same method to ensure variables are written back from CPU registers. But the simple proposed rule, that any call to another method or class results in register-held variables being saved to memory, already covers it. Plus, if someone wrapped that call in another method (maybe many methods deep), the compiler wouldn't likely analyze that deeply to speculate on execution. The JIT could do something, but again it likely wouldn't analyze that deeply, and in both cases locks/synchronization must work no matter what, so the simplest conservative optimization strategy is the likely answer.

Unless we hear from someone who writes compilers and can confirm this, it's all a guess, but it's likely the best guess we have of why volatile is not needed.

If that rule is followed, synchronization objects just need to issue their own memory barrier when they enter and leave, to ensure the CPU flushes its write caches so that proper, up-to-date values can be read. On this site you will see that is what is suggested under implicit memory barriers: http://www.albahari.com/threading/part4.aspx
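As a toy illustration of that last point (this is not how Monitor is actually implemented): a synchronization primitive can supply its own fences on enter and exit, so callers' non-volatile fields are safe as long as they stay inside the lock protocol.

```csharp
using System;
using System.Threading;

public class ToySpinLock
{
    private int _taken;   // 0 = free, 1 = held

    public void Enter()
    {
        // Interlocked.CompareExchange is itself a full fence on .NET.
        while (Interlocked.CompareExchange(ref _taken, 1, 0) != 0)
            Thread.SpinWait(1);
    }

    public void Exit()
    {
        Thread.MemoryBarrier();          // flush writes made inside the lock
        Volatile.Write(ref _taken, 0);   // release store: publish the unlock
    }

    public static void Main()
    {
        var spin = new ToySpinLock();
        int count = 0;                   // non-volatile, protected by the lock
        ThreadStart work = () =>
        {
            for (int i = 0; i < 100000; i++)
            {
                spin.Enter();
                count++;
                spin.Exit();
            }
        };
        var t1 = new Thread(work);
        var t2 = new Thread(work);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        Console.WriteLine(count); // 200000
    }
}
```

(The Volatile.Write release store alone would suffice on .NET; the explicit Thread.MemoryBarrier() is kept to mirror the answer's description.)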

user2685937
0

So what prevents the compiler from optimizing _count into a register and potentially updating just the register within the lock?

There is nothing in the documentation that I am aware of that would preclude that from happening. The point is that the call to Monitor.Exit will effectively guarantee that the final value of _count will be committed to memory upon completion.

It makes sense the compiler could know not to optimize _count into a register if it saw a lock or synchronization object in the method.. but in this case the lock call is in another function.

The fact that the lock is acquired and released in other methods is irrelevant from your point of view. The memory model defines a pretty rigid set of rules that must be adhered to regarding memory barrier generators. The only consequence of putting those Monitor calls in another method is that the JIT compiler will have a harder time complying with those rules. But the JIT compiler must comply; period. If the method calls get too complex or are nested too deeply, then I suspect the JIT compiler punts on any heuristics it might have in this regard and says, "Forget it, I'm just not going to optimize anything!"

So the main question is, is it really the only way this possibly works without volatile that the compiler/jit won't put class member variables in registers and thus volatile is then unnecessary?

It works because the protocol is to acquire the lock prior to reading _count as well. If the readers do not do that then all bets are off.
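A sketch of that protocol (my own example): the writer and the reader take the same lock, so the reader's acquire fence makes the writer's update visible; an unsynchronized read gives no such guarantee.

```csharp
using System;

public class Shared
{
    private readonly object _o = new object();
    private int _count;   // NOT volatile

    public void Increment()
    {
        lock (_o) { _count++; }       // lock(...) compiles to Monitor.Enter/Exit
    }

    public int SafeRead()
    {
        lock (_o) { return _count; }  // acquire fence: sees the latest write
    }

    public int UnsafeRead()
    {
        return _count;                // all bets are off: may observe a stale value
    }

    public static void Main()
    {
        var s = new Shared();
        s.Increment();
        Console.WriteLine(s.SafeRead()); // 1
    }
}
```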

Brian Gideon