14

https://msdn.microsoft.com/en-us/magazine/jj883956.aspx

Consider the polling loop pattern:

private bool _flag = true; 
public void Run() 
{
    // Set _flag to false on another thread
    new Thread(() => { _flag = false; }).Start();
    // Poll the _flag field until it is set to false
    while (_flag) ;
    // The loop might never terminate! 
} 

In this case, the .NET 4.5 JIT compiler might rewrite the loop like this:

if (_flag) { while (true); } 

In the single-threaded case, this transformation is entirely legal and, in general, hoisting a read out of a loop is an excellent optimization. However, if the _flag is set to false on another thread, the optimization can cause a hang.

Note that if the _flag field were volatile, the JIT compiler would not hoist the read out of the loop. (See the “Polling Loop” section in the December article for a more detailed explanation of this pattern.)
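
For comparison, here is a minimal sketch (mine, not from the article) of the same loop with the field declared volatile; the JIT is then required to perform a real read on every iteration:

using System.Threading;

class Poller
{
    // volatile: the JIT may not cache this field in a register or hoist the read out of the loop
    private volatile bool _flag = true;

    public void Run()
    {
        // Set _flag to false on another thread
        new Thread(() => { _flag = false; }).Start();
        // Each iteration performs a fresh (acquire) read, so the loop terminates
        // once the other thread's write becomes visible.
        while (_flag) ;
    }
}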

Will the JIT compiler still optimize the code as shown above if I lock _flag or will only making it volatile stop the optimization?

Eric Lippert has the following to say about volatile:

Frankly, I discourage you from ever making a volatile field. Volatile fields are a sign that you are doing something downright crazy: you're attempting to read and write the same value on two different threads without putting a lock in place. Locks guarantee that memory read or modified inside the lock is observed to be consistent, locks guarantee that only one thread accesses a given chunk of memory at a time, and so on. The number of situations in which a lock is too slow is very small, and the probability that you are going to get the code wrong because you don't understand the exact memory model is very large. I don't attempt to write any low-lock code except for the most trivial usages of Interlocked operations. I leave the usage of "volatile" to real experts.

To summarize: Who ensures that the optimization mentioned above doesn't destroy my code? Only volatile? Also the lock statement? Or something else?

Since Eric Lippert discourages the use of volatile, there must be something else?


Downvoters: I appreciate any feedback on this question. Especially if you downvoted it, I'd like to hear why you think it is a bad question.


A bool variable is not a thread synchronization primitive: The question is meant as a general question. When will the compiler not perform the optimization?


Duplicate: This question is explicitly about optimizations. The one you linked doesn't mention optimizations.

NtFreX
  • 10,379
  • 2
  • 43
  • 63
  • Changing an individual variable between threads can be done via [Interlocked Operations](https://learn.microsoft.com/en-us/dotnet/api/system.threading.interlocked?view=netframework-4.7.2) however, cancellation tokens are probably what you're looking for as described in the answers below. – Mgetz Oct 01 '18 at 12:51
  • 1
    @Mgetz Hey, thanks for your comment. I'm neither looking for interlocked operations nor cancellation tokens specifically. I just want to know how I can stop the optimization above in this specific case. Would wrapping the field with a lock do the trick? Something else? How can I prevent the code from breaking after the optimization/prevent the optimization? – NtFreX Oct 01 '18 at 12:55
  • 1
    Interlocked ops are a special case that tells the compiler to treat them very specially. You're essentially telling the compiler that you need it to treat that memory location as if it might change at any time. And in return you'll only use special means of accessing it (interlocks). It can then optimize around that appropriately – Mgetz Oct 01 '18 at 13:09
  • 4
    A *bool* variable is not a thread synchronization primitive, it never will be. You'll never have to worry about the exact way in which Microsoft fumbled the volatile keyword when you do this correctly. ManualResetEventSlim is a nice wrapper around Interlocked, you can make a bool work by using Volatile.Read() in the if-statement and Volatile.Write() to set it. Task and CancellationToken raise the abstraction level with few disadvantages as long as you don't ignore exceptions. – Hans Passant Oct 01 '18 at 13:10
  • Possible duplicate of [Volatile vs. Interlocked vs. lock](https://stackoverflow.com/questions/154551/volatile-vs-interlocked-vs-lock) – Mgetz Oct 01 '18 at 13:25
  • 2
    @Mgetz This question is explicitly about optimizations. The one you linked doesn't mention optimizations. – NtFreX Oct 01 '18 at 13:27
  • @NtFreX you've asked two different questions in the same go. One of which is a dup one of which is new. The short answer to the new questions is: That optimization happens at JIT not at primary compile. – Mgetz Oct 01 '18 at 13:28
  • 1
    @Mgetz "That optimization happens at JIT not at primary compile **when the field is not interlocked**". I only ment to ask about the optimization. If you can help me make it clearer I would appreshiate it. – NtFreX Oct 01 '18 at 13:31
  • @NtFreX when you use a lock or interlocks you're telling the compiler that the code or memory location has data races. Thus the compiler knows that it can't make any assumptions about any memory locations that might be observable to other threads in that block or interlocked memory location. Whereas if that doesn't exist it can assume all it wants. – Mgetz Oct 01 '18 at 13:34
  • 2
    @Mgetz Yep, thanks. I get that now. That's why I appended "when the field is not interlocked" to your sentence. If somebody posted that as an answer I would accept it. And more detail is always nice. – NtFreX Oct 01 '18 at 13:35
  • `This question is explicitly about optimizations` that's not the right viewpoint. What matters is how the language is specified to behave. The JIT will only optimize under the constraint of not violating the specification. Optimizations are therefore invisible to the program. The issue with the code in this question is not that it's being optimized. The issue is that nothing in the specification forces the program to be correct. In order to fix this you do *not* turn off optimizations or somehow communicate with the compiler. You use primitives that guarantee the behavior that you need. – usr Oct 01 '18 at 13:54
  • lock and Interlocked etc. do not turn off optimizations. They request a certain behavior. – usr Oct 01 '18 at 13:55
  • @usr So basically, ignore the optimizations; they won't affect you if you are doing things correctly. But I still find it good to know why the error behavior is random when I do things wrong. It tells me where to look. – NtFreX Oct 01 '18 at 13:57
  • @usr `lock and Interlocked etc. do not turn off optimizations. They request a certain behavior.` that is clear. Sorry about my English. – NtFreX Oct 01 '18 at 13:57

4 Answers

12

Let's answer the question that was asked:

Will the JIT compiler still optimize the code as shown above if I lock _flag or will only making it volatile stop the optimization?

OK, let's not answer the question that was asked, because that question is too complicated. Let's break it down into a series of less complicated questions.

Will the JIT compiler still optimize the code as shown above if I lock _flag?

Short answer: lock gives a stronger guarantee than volatile, so no, the jitter will not be permitted to lift the read out of the loop if there is a lock around the read of _flag. Of course the lock also has to be around the write. Locks only work if you use them everywhere.

private bool _flag = true;
private object _flagLock = new object();
public void Run()
{
  // Set _flag to false on another thread, writing under the same lock
  new Thread(() => { lock (_flagLock) _flag = false; }).Start();
  // Poll _flag, but only ever read it while holding the lock
  while (true)
    lock (_flagLock)
      if (!_flag)
        break;
}

(And of course, I note that this is an insanely bad way to wait for one thread to signal another. Don't ever sit in a tight loop polling a flag! Use a wait handle like a sensible person.)
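
As an illustration (my sketch, not part of the original answer), the wait-handle version of the same signalling could look like this, using ManualResetEventSlim:

using System.Threading;

class Waiter
{
    private readonly ManualResetEventSlim _done = new ManualResetEventSlim(false);

    public void Run()
    {
        // The other thread signals the event instead of flipping a flag.
        new Thread(() => _done.Set()).Start();
        // Blocks efficiently until Set() is called; no busy loop, no memory-model subtleties.
        _done.Wait();
    }
}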

You said locks were stronger than volatiles; what does that mean?

Reads to volatiles prevent certain operations from being moved around in time. Writes to volatiles prevent certain operations from being moved around in time. Locks prevent more operations from being moved around in time. These prevention semantics are called "memory fences" -- basically, volatiles introduce a half fence, locks introduce a full fence.

Read the C# specification section on special side effects for the details.

As always, I'll remind you that volatiles do not give you global freshness guarantees. There is no such thing in multithreaded C# programming as "the latest" value of a variable, and so volatile reads do not give you "the latest" value, because it doesn't exist. The idea that there is a "latest" value implies that reads and writes are always observed to have a globally consistent ordering in time, and that is false. Threads can still disagree on the order of volatile reads and writes.

Locks prevent this optimization; volatiles prevent this optimization. Are those the only things that prevent the optimization?

No. You can also use Interlocked operations, or you can introduce memory fences explicitly.
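
As a rough sketch of those two alternatives (my code, not from the answer; _flag is a bool field and _state is a hypothetical int field, since Interlocked has no bool overloads):

public void RunWithExplicitFence()
{
    new Thread(() => { _flag = false; }).Start();
    // The full fence inside the loop keeps the JIT from caching _flag in a register.
    while (_flag) Thread.MemoryBarrier();
}

public void RunWithInterlocked()
{
    new Thread(() => Interlocked.Exchange(ref _state, 1)).Start();
    // CompareExchange(ref x, 0, 0) is an atomic read that also acts as a full fence.
    while (Interlocked.CompareExchange(ref _state, 0, 0) == 0) ;
}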

Do I understand enough of this to use volatile correctly?

Nope.

What should I do?

Don't write multithreaded programs in the first place. Multiple threads of control in one program is a bad idea.

If you must, don't share memory across threads. Use threads as low-cost processes, and only use them when you have an idle CPU that could do a CPU-intensive task. Use single threaded asynchrony for all I/O operations.

If you must share memory across threads, use the highest level programming construct available to you, not the lowest level. Use a CancellationToken to represent an operation being cancelled elsewhere in an asynchronous workflow.
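
To illustrate that last point with a sketch of my own (not part of the answer), a cancellable asynchronous loop might look roughly like this:

using System.Threading;
using System.Threading.Tasks;

class Worker
{
    // The caller owns the CancellationTokenSource; the workflow only ever sees the token.
    public static async Task PumpAsync(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            // ... one unit of work ...
            await Task.Delay(100, token); // throws OperationCanceledException once cancellation is requested
        }
    }
}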

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • 1
    The savior has come. Thanks a lot for your time! – NtFreX Oct 01 '18 at 14:05
  • So basically only .NET Framework developers need to use `volatile`? And it's only needed to create the next higher-level components in the framework, such as `Monitor` and so on? – NtFreX Oct 01 '18 at 14:16
  • `Use single threaded asynchrony for all I/O operations.` Why is that? I thought Windows could handle multithreaded file handling via async? https://learn.microsoft.com/en-us/windows/desktop/FileIO/synchronous-and-asynchronous-i-o – NtFreX Oct 01 '18 at 14:24
  • Or can't Windows handle files multithreaded, and you only get a performance benefit from the task coroutine/state machine/pattern? In theory, modern data drives can physically be read in multiple places at the same time? – NtFreX Oct 01 '18 at 14:39
  • 1
    @NtFreX: Yes, my advice is to only use `volatile` to build a higher-level component. If you want lazy initialization, use `Lazy`. It uses volatile internally, and it was written by experts who know what it means. If you want cancellation, use a cancellation token. If you want waits, use a wait handle. Don't roll your own. – Eric Lippert Oct 01 '18 at 17:33
  • @NtFreX: What do you mean by "multithreaded file handling via async"? **Async IO does not create a new thread**. Read "There Is No Thread" by Stephen Cleary if that is in any way not clear to you. Async IO operates on a level *below* operating system threads; async IO works at the hardware interrupt level. – Eric Lippert Oct 01 '18 at 17:41
  • Yes, that's clear. But I mean, can a filesystem handle multiple requests at the same time? I know you can use async to enable other code to run while the task is doing its IO stuff (there is no thread), but can you have truly multithreaded access to the filesystem? Why do you recommend using only one IO thread? – NtFreX Oct 01 '18 at 17:43
  • @NtFreX: Obviously the file system can handle multiple requests by *processes* at the same time, and each process has one or more threads. The file system will do what it needs to do to stay consistent. The reason to use single-threaded async IO for file access is because *threads are insanely expensive*, so don't waste one! Making a thread to do IO is like hiring a worker to sleep next to your mail box in case you get mail some day. That's a lot more expensive and difficult than simply checking your mailbox when you have a free moment. – Eric Lippert Oct 01 '18 at 17:53
  • @NtFreX: The ideal use for threads is to have *one thread per processor*, not more, and not less, and to have each processor pegged at 100% all the time doing work on the CPU. If you don't have any CPU work to do, let some *other* process have that CPU. That will extract the maximum performance from the machine. – Eric Lippert Oct 01 '18 at 17:55
  • Clear, because of async you don't need a thread. So use only one thread when you don't need to write/read multiple files at the same time? If you need to write/read from multiple places, can one use multiple threads? So you meant it as general advice? – NtFreX Oct 01 '18 at 17:59
  • You can read and write as many files as you want at the same time on one thread. The same way that you don't have to hire twenty secretaries if you have twenty letters to send. – Eric Lippert Oct 01 '18 at 18:01
  • That's why I meant it's valid then to use multiple threads. Is that correct? – NtFreX Oct 01 '18 at 18:01
  • Anyway thank you very much for your time! I really appreciate it! – NtFreX Oct 01 '18 at 18:03
  • 1
    No, you never want to use multiple threads to do file IO, that's what I'm saying. Think of file IO like sending a letter in the mail and then getting a response two weeks later. **In that scenario you don't need to hire anyone to send or receive your letters for you no matter how many letters you are sending and receiving**. You hiring someone to sit by your mailbox **does not make the postal service faster or more efficient**. – Eric Lippert Oct 01 '18 at 18:12
  • Ok, I think I get it now. The device driver itself doesn't need multiple threads to handle parallelism, therefore you never need multiple threads for IO. How is the content of the hard drive transferred to memory? Through the CPU? Is that so low level that you don't need a thread to do that? – NtFreX Oct 01 '18 at 18:22
  • Who does the work of the mailman? Does it involve the CPU? Is it done in a thread or is it more low level? It seems like I don't need to care? But I need to note to myself that I should **never** use multiple threads in IO. – NtFreX Oct 01 '18 at 18:28
  • 1
    IO is performed by *hardware*, which exists independently of the CPU and any operating-system level ideas like "threads". The CPU and the hardware talk to each other using interrupts. Comments on a SO question are not a good place to give you a tutorial about how hardware works; do some research and if you still have questions, **ask a new question**. – Eric Lippert Oct 01 '18 at 18:37
  • Sorry to have bothered you and thanks again. I would have paid you for this, it's that good. I'll improve my Stack Overflow behavior. – NtFreX Oct 01 '18 at 18:39
2

This question is explicitly about optimizations

That's not the right viewpoint. What matters is how the language is specified to behave. The JIT will only optimize under the constraint of not violating the specification. Optimizations are therefore invisible to the program. The issue with the code in this question is not that it's being optimized. The issue is that nothing in the specification forces the program to be correct. In order to fix this you do not turn off optimizations or somehow communicate with the compiler. You use primitives that guarantee the behavior that you need.


You cannot lock _flag. lock is syntax for the Monitor class, and that class locks on an object on the heap. _flag is a bool, which is not lockable.

To cancel a loop, I'd use a CancellationTokenSource these days. It uses volatile accesses internally but hides that from you. The loop polls the CancellationToken, and cancelling the loop is done by calling CancellationTokenSource.Cancel(). That's very self-documenting and easy to implement.
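
A minimal sketch of that (my code, not from the answer):

using System.Threading;

var cts = new CancellationTokenSource();

// The other thread requests cancellation instead of writing to a bool field.
new Thread(() => cts.Cancel()).Start();

// The loop polls the token; no volatile, lock or fence is needed in user code.
while (!cts.Token.IsCancellationRequested)
{
    // ... do work ...
}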

You can also wrap any access to _flag in a lock. That would look like this:

object lockObj = new object(); // any heap object will do as the lock
...

while (true)
{
    lock (lockObj)
    {
        if (_flag) break; // read _flag only while holding the lock
    }
    ...
}

...

lock (lockObj) _flag = true; // the signalling thread writes under the same lock

You can also use volatile. Eric Lippert is quite correct that it's best to not touch hard core threading stuff if you don't have to.

usr
  • 168,620
  • 35
  • 240
  • 369
  • `volatile` should never be used for thread safety, use [Interlocked operations](https://learn.microsoft.com/en-us/dotnet/api/system.threading.interlocked?view=netframework-4.7.2) if you need to change primitives across threads – Mgetz Oct 01 '18 at 12:51
  • @Mgetz I disagree. In .NET volatile is well defined and can be used. It corresponds to the `Volatile` class. Maybe you were reminded of C++ volatile which is unusable as I understand it. – usr Oct 01 '18 at 12:52
  • Given the inconsistent guarantees of `volatile` and the difficulty explaining how to use it properly... I would encourage using explicit atomic operations as provided by the library. – Mgetz Oct 01 '18 at 12:53
  • I suppose if you wrap _flag with a lock it won't do the mentioned optimization? – NtFreX Oct 01 '18 at 12:53
  • @Mgetz what do you mean by inconsistent? volatile is perfectly well defined and usable in practice. For example in this loop it would work just fine. Interlocked does something else that is not volatile. Interlocked operations perform some other computation or action as opposed to just a load or store. Not quite comparable. – usr Oct 01 '18 at 12:57
  • 1
    @NtFreX yes. The lock contains memory barriers that force the updates to become visible to other threads. – usr Oct 01 '18 at 12:58
  • @usr `volatile` per the spec does not actually guarantee sequential consistency data race free results. Only Interlocked operations and full locks do. An interlocked operation is often considered "Lock free" for primitives and guarantees all of that. – Mgetz Oct 01 '18 at 12:59
  • Does that mean the error behavior of incorrectly coded multithreaded code is random? – NtFreX Oct 01 '18 at 13:01
  • @Mgetz we don't need seq consistency here. The store must release, the load must acquire. volatile does exactly that. I agree that it's best avoided. – usr Oct 01 '18 at 13:01
  • 1
    @NtFreX Error behavior depends on the very specific case. Sometimes it appears random, sometimes it's deterministically wrong, sometimes it's perfectly right until your pager goes off at 4AM. – usr Oct 01 '18 at 13:02
2

When you use a lock or Interlocked operations, you're telling the compiler that the block or memory location has data races and that it cannot make assumptions about access to those locations. Thus the compiler backs off on optimizations that it could otherwise perform in a data-race-free environment. This implied contract also means you're telling the compiler you will access those locations in an appropriate data-race-free way.

Mgetz
  • 5,108
  • 2
  • 33
  • 51
  • I have made some comments under the question. lock and Interlocked do not communicate that data has data races. They request a very specific behavior. They are not a switch to disable optimizations. For example the JIT can still convert `volatileField = 1; volatileField = 2;` into `volatileField = 2;` and optimize it. – usr Oct 01 '18 at 13:56
  • @usr while I'm not mentioning that I'm aware. The requirement is the indication of data races to the compiler which is what the question is asking about. – Mgetz Oct 01 '18 at 13:58
0

The C# volatile keyword implements so-called acquire and release semantics, so it is legitimate to use it for simple thread synchronization, and any standard-compliant JIT engine will not optimize such reads away.

Of course, it is an awkward language feature, since C/C++ give volatile different semantics, and those are the semantics most programmers are already used to. So the C#-specific and Windows-specific (except on the ARM architecture) usage of "volatile" is sometimes confusing.

John Z. Li
  • 1,893
  • 2
  • 12
  • 19