Program hangs in release mode but works fine in debug mode

Question

The code below works as expected in debug mode, completing after 500 milliseconds, but hangs indefinitely in release mode:

public static void Main(string[] args)
{            
   bool isComplete = false;

   var t = new Thread(() =>
   {
       int i = 0;
               
        while (!isComplete) i += 0;
   });
 
   t.Start();
    
   Thread.Sleep(500);
   isComplete = true;
   t.Join();
   Console.WriteLine("complete!");
}

I came across this and want to know the reason for this behavior in debug & release mode.

This is wierd. If `i` is declared outside the lambda, code works fine in debug and release mode, but if its declared inside the lambda body, code hangs in release mode. Some optimization gone wrong? — InBetween, Jan 13 '17 at 10:38
If it was java I would assume that the compiler does not see updates to the 'compile' variable. Adding 'volatile' to the variable declaration would fix that (and making it a static field). — Sebastian, Jan 13 '17 at 10:57
Note that when decompiling both the release version and the debug version, there is no noticeable difference between them. The release version mostly drops a few nops. The IL also shows that the control flow is a bit different (causing the same decompiled source though), not sure if that can have this effect. — poke, Jan 13 '17 at 11:24
Ah, the [Heisenbug](https://en.wikipedia.org/wiki/Heisenbug). — David Richerby, Jan 13 '17 at 14:53
It would help if you repeated, or even better expanded on, the problem in the body of the question. — ChrisF, Jan 13 '17 at 15:34
Note: this is why multithreaded devlopment has things like mutexes and atomic operations. When you start to go multithreaded, you need to consider a remarkable set of additional memory issues which were not obvious before. Threading synchronization tools like mutexes would have resolved this. — Cort Ammon, Jan 13 '17 at 15:57
I'm not that familiar with C#. Does it permit you to access a variable in one thread while another thread is, or might be, modifying it? If not, then this code is just broken. — David Schwartz, Jan 13 '17 at 17:41
@DavidSchwartz: Certainly that is *permitted*. And it is *permitted* for the compiler, runtime and CPU to make the result of that code be different than what you expect. In particular, C# is not permitted to make bool access *nonatomic*, but it is permitted to move non-volatile reads backwards in time. Doubles, by contrast, have no such restriction on atomicity; a double read and written on two different threads without synchronization is permitted to be torn. — Eric Lippert, Jan 13 '17 at 17:54

score 151 · Answer 1 · edited Jan 14 '17 at 17:48

151

I guess that the optimizer is fooled by the lack of 'volatile' keyword on the isComplete variable.

Of course, you cannot add it, because it's a local variable. And of course, since it is a local variable, it should not be needed at all, because locals are kept on stack and they are naturally always "fresh".

However, after compiling, it is no longer a local variable. Since it is accessed in an anonymous delegate, the code is split, and it is translated into a helper class and member field, something like:

public static void Main(string[] args)
{
    TheHelper hlp = new TheHelper();

    var t = new Thread(hlp.Body);

    t.Start();

    Thread.Sleep(500);
    hlp.isComplete = true;
    t.Join();
    Console.WriteLine("complete!");
}

private class TheHelper
{
    public bool isComplete = false;

    public void Body()
    {
        int i = 0;

        while (!isComplete) i += 0;
    }
}

I can imagine now that the JIT compiler/optimizer in a multithreaded environment, when processing TheHelper class, can actually cache the value false in some register or stack frame at the start of the Body() method, and never refresh it until the method ends. That's because there is NO GUARANTEE that the thread&method will NOT end before the "=true" gets executed, so if there is no guarantee, then why not cache it and get the performance boost of reading the heap object once instead of reading it at every iteration.

This is exactly why the keyword volatile exists.

For this helper-class to be ~~correct~~ a tiny little bit better ¹⁾ in multi-threaded environments, it should have:

    public volatile bool isComplete = false;

but, of course, since it's autogenerated code, you can't add it. A better approach would be to add some lock()s around reads and writes to isCompleted, or to use some other ready-to-use synchronization or threading/tasking utilities instead of trying to do it bare-metal (which it will not be bare-metal, since it's C# on CLR with GC, JIT and (..)).

The difference in debug mode occurs probably because in debug mode many optimisations are excluded, so you can, well, debug the code you see on the screen. Therefore while (!isComplete) is not optimized so you can set a breakpoint there, and therefore isComplete is not aggressively cached in a register or stack at the method start and is read from the object on the heap at every loop iteration.

BTW. That's just my guesses on that. I didn't even try to compile it.

BTW. It doesn't seem to be a bug; it's more like a very obscure side effect. Also, if I'm right about it, then it may be a language deficiency - C# should allow to place 'volatile' keyword on local variables that are captured and promoted to member fields in the closures.

¹⁾ see below for a comments from Eric Lippert about volatile and/or this very interesting article showing the levels of complexity involved in ensuring that code relying on volatile is ~~safe~~ ..uh, ~~good~~ ..uh, let's say OK.

edited Jan 14 '17 at 17:48

Peter Mortensen

30,738
21
105
131

answered Jan 13 '17 at 11:56

quetzalcoatl

32,194
8
68
107

2

@EricLippert: whoa, thank you very much for confirming that so fast! How do you think, is there any chance that in some future version we can get a `volatile` option on capture-to-closure local variables? I imagine it can be a bit difficult to process by the compiler.. – quetzalcoatl Jan 13 '17 at 12:07
Good catch! Now the question is, why removing the read to `i` (`i = 0;` instead of `i + = 0`) makes the problem go away? @EricLippert – InBetween Jan 13 '17 at 12:09
7

@quetzalcoatl: I wouldn't count on that feature being added any time soon. This is the kind of coding you'd want to *discourage*, and not *make easier*. And besides, making stuff volatile does not necessarily solve every problem. Here is an example where everything is volatile and the program is still wrong; can you find the bug? http://blog.coverity.com/2014/03/26/reordering-optimizations/ – Eric Lippert Jan 13 '17 at 12:18
@InBetween: I think that EricLipper already summed that up in a comment for Pikoh's answer: it's permitted - so why care? Optimizations are always tricky.. see C++ and its specs and what optimizer can make out of them.. it's both beautiful to observe, and sometimes, almost insane to understand why it's permitted :D – quetzalcoatl Jan 13 '17 at 12:18
1

Here is @EricLippert comment as i deleted my answer _Don't look for explanations of why things are different when things are different. Rather, the relevant question is what behaviour is permitted by the language specification? And the answer is that the jitter is permitted to optimize the loop as much as it likes for any reason that it likes. The jitter is permitted to make the optimization, and is also permitted to not make the optimization, and hey, you've observed both permitted behaviours. Why does it change its mind? Who cares? The relevant fact is that it is entirely permitted to do so_ – Pikoh Jan 13 '17 at 12:21
3

Understood. I give up trying to understand multithreading optimizations...it's insane how complicated it is. – InBetween Jan 13 '17 at 12:27
@EricLippert: very true, I totally agree. What's usually needed is some well-planned thread synchronization, not just volatiles.. About the article- very interesting issue. I actually didn't guess 100% correctly, but I was quite close in thinking about no-lock in Get() and some 2-core cpu and cache/bus synchronization between cores.. not just something **that simple** as that read/write (micro?)ops can be swapped when target addresses differ.. that's almost.. elementary (when considering how CPU executes it.. not from the viewpoint of a programmer writing high-level code) – quetzalcoatl Jan 13 '17 at 12:30
2

Now, if you do want to know why the optimizer makes some choices and not others, then *try to write an optimizer*, even just in your head. If you had to write an optimizer, how would you optimize `i += 0;`? Probably eliminate it entirely. So now the loop is empty, which means that the loop is just a test and a goto. There's nothing in the loop which can change the test *because there's nothing in the loop at all* and therefore the question is "can the test be observed to cause or depend on a side effect?". Now imagine there is *any* other code in the loop; how does that change your analysis? – Eric Lippert Jan 13 '17 at 12:55
@EricLippert in this case,with `i+=0`, i understand your point completely. But, in my tests, changing it to,eg,`i+=1;` has the same effect. And in that case, in shouldn't be optimized by removing it. What may happen in that case? – Pikoh Jan 13 '17 at 13:08
10

@Pikoh: Again, think like an optimizer. You have a variable that is incremented but never read. A variable that is never read can be deleted entirely. – Eric Lippert Jan 13 '17 at 13:47
4

@EricLippert now my mind made a click. This thread has been very informative, thank you very much,really. – Pikoh Jan 13 '17 at 14:04
1

@quetzalcoatl: In C and C++, people used to say "`volatile` can (safely) be used for this purpose" too, despite the fact that's quite *wrong*. If, in C#, `volatile` really can (safely) be used for this purpose, I think this answer needs something added to demonstrate that it knows what it's talking about rather than regurgitating bad information. – Jan 14 '17 at 17:17
You might be able to make `isComplete` an array of boolean to get around the volatile thing. Array should only be referenced by the thread and not copied. – Brandon Jan 14 '17 at 17:57
@EricLippert: Re: your blog post example, isn't it fair to put the blame on C# for poor design? I mean, consider that this wouldn't happen in C++11 with `atomic`, because `atomic`'s default behavior is SC (sequentially consistent) -- which is exactly what (even C++!) programmers expect. C# not only completely violates the principle of least astonishment here (maybe excusable lack of C++'s hindsight, though SC was an idea from... 1979?), but AFAIK, *even **since** C++11*, C# has not provided SC atomics. i.e., It's made lock-free programming *harder* than it already is!! Why?! – user541686 Jan 15 '17 at 23:59
@EricLippert: (that is, unless I'm mistaken and sequential consistency wouldn't fix that for some reason, which is totally possible...) – user541686 Jan 16 '17 at 00:08
2

@Mehrdad: There is a lot of blame/credit to go around, from chip designers to operating system designers, to language designers, to users. All have expectations of performance, ease of use and reliability; these things are unfortunately often in conflict. Do I personally wish that memory models were stronger, even at the expense of performance? Yes. Do I wish that the go-to tool in the multithreaded toolbox had nicer properties than locks? Absolutely. But we arrived at this situation by compromising amongst many conflicting goals. – Eric Lippert Jan 16 '17 at 14:22
@EricLippert: I would agree, but if I'm being honest in this particular situation it's a little difficult for me to see why C# or .NET couldn't provide a few extra methods for sequentially consistent loads/stores. It's still not too late though ;) maybe you can urge them to do that for the next version! – user541686 Jan 16 '17 at 20:45

score 83 · Answer 2 · edited May 23 '17 at 10:31

The answer of quetzalcoatl is correct. To shed more light on it:

The C# compiler and CLR jitter are permitted to make a great many optimizations that assume that the current thread is the only thread running. If those optimizations make the program incorrect in a world where the current thread is not the only thread running that is your problem. You are required to write multithreaded programs that tell the compiler and jitter what crazy multithreaded stuff you are doing.

In this particular case the jitter is permitted -- but not required -- to observe that the variable is unchanged by the loop body and to therefore conclude that -- since by assumption this is the only thread running -- the variable will never change. If it never changes then the variable needs to be checked for truth once, not every time through the loop. And this is in fact what is happening.

How to solve this? Don't write multithreaded programs. Multithreading is incredibly hard to get right, even for experts. If you must, then use the highest level mechanisms to achieve your goal. The solution here is not to make the variable volatile. The solution here is to write a cancellable task and use the Task Parallel Library cancellation mechanism. Let the TPL worry about getting the threading logic right and the cancellation properly send across threads.

Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackoverflow.com/rooms/133276/discussion-on-answer-by-eric-lippert-this-code-hangs-in-release-mode-but-works-f). — Madara's Ghost, Jan 16 '17 at 14:11

score 14 · Answer 3 · edited Jan 14 '17 at 22:37

I attached to the running process and found (if I didn't make mistakes, I'm not very practiced with this) that the Thread method is translated to this:

debug051:02DE04EB loc_2DE04EB:                            
debug051:02DE04EB test    eax, eax
debug051:02DE04ED jz      short loc_2DE04EB
debug051:02DE04EF pop     ebp
debug051:02DE04F0 retn

eax (which contains the value of isComplete) is loaded first time and never refreshed.

score 8 · Answer 4 · edited Jan 14 '17 at 17:45

Not really an answer, but to shed some more light on the issue:

The problem seems to be when i is declared inside the lambda body and it's only read in the assignment expression. Otherwise, the code works well in release mode:

i declared outside the lambda body:

int i = 0; // Declared outside the lambda body

var t = new Thread(() =>
{
    while (!isComplete) { i += 0; }
}); // Completes in release mode

i is not read in the assignment expression:

var t = new Thread(() =>
{
    int i = 0;
    while (!isComplete) { i = 0; }
}); // Completes in release mode

i is also read somewhere else:

var t = new Thread(() =>
{
    int i = 0;
    while (!isComplete) { Console.WriteLine(i); i += 0; }
}); // Completes in release mode

My bet is some compiler or JIT optimization regarding i is messing up things. Somebody smarter than me will probably be able to shed more light on the issue.

Nonetheless, I wouldn't worry too much about it, because I fail to see where similar code would actually serve any purpose.

see my answer, I'm pretty sure it's all about 'volatile' keyword thatcannot be added to a local variable (which actually gets later promoted to a member field in the closure).. — quetzalcoatl, Jan 13 '17 at 12:04

Program hangs in release mode but works fine in debug mode

4 Answers4

Linked