
Suppose that I want to use a boolean status flag for cooperative cancellation between threads. (I realize that one should preferably use CancellationTokenSource instead; that is not the point of this question.)

private volatile bool _stopping;

public void Start()
{
    var thread = new Thread(() =>
    {
        while (!_stopping)
        {
            // Do computation lasting around 10 seconds.
        }
    });

    thread.Start();
}

public void Stop()
{
    _stopping = true;
}

Question: If I call Start() at 0s and Stop() at 3s on another thread, is the loop guaranteed to exit at the end of the current iteration at around 10s?

The overwhelming majority of sources I've seen indicate that the above should work as expected; see: MSDN; Jon Skeet; Brian Gideon; Marc Gravell; Remus Rusanu.

However, volatile only generates an acquire-fence on reads and a release-fence on writes:

A volatile read has “acquire semantics”; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence. A volatile write has “release semantics”; that is, it is guaranteed to happen after any memory references prior to the write instruction in the instruction sequence. (C# Specification)

Therefore, there is no guarantee that a volatile write and a volatile read will not (appear to) be swapped, as observed by Joseph Albahari. Consequently, it is possible that the background thread would keep reading the stale value of _stopping (namely, false) after the end of the current iteration. Concretely, if I call Start() at 0s and Stop() at 3s, it is possible that the background task will not terminate at 10s as expected, but at 20s, or 30s, or never at all.

Based on acquire and release semantics, there are two issues here. First, the volatile read would be constrained to refresh the field from memory (abstractly speaking) not at the end of the current iteration, but at the end of the subsequent one, since the acquire-fence occurs after the read itself. Second, more critically, there is nothing to force the volatile write to ever commit the value to memory, so there is no guarantee that the loop will ever terminate at all.

Consider the following sequence flow:

Time   |     Thread 1                     |     Thread 2
       |                                  |
 0     |     Start() called:              |        read value of _stopping
       |                                  | <----- acquire-fence ------------
 1     |                                  |     
 2     |                                  |             
 3     |     Stop() called:               |             ↑
       | ------ release-fence ----------> |             ↑
       |        set _stopping to true     |             ↑
 4     |             ↓                    |             ↑
 5     |             ↓                    |             ↑
 6     |             ↓                    |             ↑
 7     |             ↓                    |             ↑
 8     |             ↓                    |             ↑
 9     |             ↓                    |             ↑
 10    |             ↓                    |        read value of _stopping
       |             ↓                    | <----- acquire-fence ------------
 11    |             ↓                    |    
 12    |             ↓                    |             
 13    |             ↓                    |             ↑
 14    |             ↓                    |             ↑
 15    |             ↓                    |             ↑
 16    |             ↓                    |             ↑
 17    |             ↓                    |             ↑
 18    |             ↓                    |             ↑
 19    |             ↓                    |             ↑
 20    |                                  |        read value of _stopping
       |                                  | <----- acquire-fence ------------

The most important parts are the memory fences, marked with --> and <--, which represent the thread synchronization points. The volatile read of _stopping can only (appear to) be moved up to its thread's previous acquire-fence at most. However, the volatile write can (appear to) be moved down indefinitely, since there is no other release-fence following it on its thread. In other words, there is no “synchronizes-with” (“happens-before”, “is-visible-to”) relation between the write to _stopping and any of its reads.
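(For reference, a minimal sketch, not part of the original pattern, of the fully fenced variant discussed in the comments below: issuing a full fence with Thread.MemoryBarrier() after the write and before the read removes the write–read reordering concern, at the cost of a fence on every iteration.)

private volatile bool _stopping;

public void Start()
{
    var thread = new Thread(() =>
    {
        while (true)
        {
            Thread.MemoryBarrier();   // full fence before the volatile read
            if (_stopping)
                break;

            // Do computation lasting around 10 seconds.
        }
    });

    thread.Start();
}

public void Stop()
{
    _stopping = true;                 // volatile write
    Thread.MemoryBarrier();           // full fence after the write
}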

P.S. I am aware that MSDN gives very strong guarantees on the volatile keyword. However, the expert consensus is that MSDN is incorrect (and not backed up by the ECMA spec):

The MSDN documentation states that use of the volatile keyword “ensures that the most up-to-date value is present in the field at all times”. This is incorrect, since as we’ve seen [in the previous example], a write followed by a read can be reordered. (Joseph Albahari)

Douglas
  • 1
    As if I weren't already plenty afraid of writing threaded code! – adv12 May 22 '17 at 20:49
  • 5
    The MSDN documentation cannot possibly be correct if it says that, because the notion that there is such a thing as "an up-to-date value" is simply false. Variable reads and writes, even volatile ones, are explicitly NOT guaranteed to be observed in a consistent ordering across threads, by the C# specification. – Eric Lippert May 22 '17 at 21:04
  • @xxbbcc: I'm fine with the loop stopping after 10s. However, I'm arguing that it might actually take up to 20s (*two* iterations) for the loop to stop after the flag is set. – Douglas May 22 '17 at 21:16
  • 2
    @Douglas Thank you - I removed my comment in the meantime because I realized that you already had the answer in your question - I had to read it again. :) You may want to go through this: https://stackoverflow.com/questions/21652938/where-to-places-fences-memory-barriers-to-guarantee-a-fresh-read-committed-write – xxbbcc May 22 '17 at 21:24
  • Anecdotally, your time intervals are big enough that this really shouldn't be an issue. If you use memory barriers before the read / write, I think you should be fine. The key word being 'anecdotally', here. :) – xxbbcc May 22 '17 at 21:26
  • 1
    @xxbbcc: Thanks a lot for that link; it's the most relevant discussion to my question that I've read so far (and it seems to confirm my suspicion). – Douglas May 22 '17 at 21:29
  • @Douglas You're welcome, I'm glad it's helpful. – xxbbcc May 22 '17 at 21:29
  • @xxbbcc: You're right; it wouldn't hurt my performance to manually insert a memory barrier before the read and after the write, or to use thread synchronization constructs that do so implicitly, such as `Interlocked` or `lock`. However, that would mean that the `volatile` keyword is unsuitable for these scenarios (where one doesn't want to risk running an extra iteration). – Douglas May 22 '17 at 21:32
  • 1
    @Douglas I believe you're correct. In your specific situation (time scales on the order of seconds) I'd just use a `lock` around the flag variable and be done with it. If necessary, you can do a few more checks for it inside the loop body to break out sooner. – xxbbcc May 22 '17 at 21:34
  • Related: [1](https://stackoverflow.com/q/19382705/213550) and [2](https://stackoverflow.com/q/40938752/213550) I think that `volatile` is too specific even for c# – VMAtm May 23 '17 at 02:02
  • 4
    Sadly my oft-linked-to article about reorderings that can happen on volatile reads and writes has been removed from the internet by my previous employer. I'll see if I can recreate the content at some point. The short version is: volatile reads can be reordered with respect to volatile writes even on strong memory model architectures like x86, and this can be *very confusing*. – Eric Lippert May 23 '17 at 17:54
  • 1
    @VMAtm: Thanks for the links. There is one pattern – “[Publication via Volatile Field](https://msdn.microsoft.com/magazine/jj863136)” (scroll down) – where volatile works really well. But it appears to be widely misused in many other scenarios. – Douglas May 23 '17 at 18:17
  • @EricLippert I knew it! That's why I can't find it. Really bad news about your article(s), they were great. – VMAtm May 23 '17 at 18:27
  • @EricLippert: I had come across your [Atomicity, volatility and immutability are different, part three](https://blogs.msdn.microsoft.com/ericlippert/2011/06/16/atomicity-volatility-and-immutability-are-different-part-three/) article, which provides a great informal introduction to memory barriers, but doesn't go into detail on half-fences, so I assume you have something else in mind. [Albahari](http://www.albahari.com/threading/part4.aspx#_The_volatile_keyword) also provides a good discussion of volatile write–read reorderings, citing Joe Duffy's example. – Douglas May 23 '17 at 18:33
  • 1
    @EricLippert: I think I found it... was this it? [Reordering optimizations](https://web.archive.org/web/20160729162225/http://blog.coverity.com/2014/03/26/reordering-optimizations/#.WSSBrevyvIU) – Douglas May 23 '17 at 18:42
  • 1
    Incidentally, the topic of the article that it builds upon, [Can I skip the lock when reading an integer?](https://web.archive.org/web/20161018033845/http://blog.coverity.com:80/2014/03/12/can-skip-lock-reading-integer/#.WSSCF-vyvIU), also happened to be my motivation for pursuing this question. There seems to be some consensus that volatile can be used for [reading an int that's updated by Interlocked on other threads](https://stackoverflow.com/q/24808291/1149773); however, my conclusion is that this is also incorrect. – Douglas May 23 '17 at 18:49
  • 3
    @Douglas: That's the one, thanks! Glad I won't have to reconstitute it from memory. As for your question: I long ago stopped thinking that I had the faintest idea what is or is not possible for volatile access to variables. I try to not share memory between threads; when I must, I take out full locks. – Eric Lippert May 23 '17 at 19:57
  • Knowledge as useful as that should never be lost. Fair enough; avoiding volatile seems to be the sensible consensus, although I am curious whether a pattern as popular as the above is indeed broken. – Douglas May 24 '17 at 17:46
  • Are you looking for purely abstract answers in the sense of "do the semantics formally guarantee [something]"? (it seems you have shown that they do not) The pattern works for practical reasons. – harold Jun 14 '17 at 19:40
  • 1
    @harold: Yes, I am looking for guarantees based on the formal definition of the memory model in the specification. The pattern has worked so far because most deployments have been constrained to .NET Framework running on x86 / x86-64, which offers a stronger memory model than what is mandated by the spec. It may easily break once platforms with weaker models start to gain traction (e.g. .NET Core on ARM). – Douglas Jun 14 '17 at 19:53
  • The fences are irrelevant here, as it is enough to have opaque reads & writes to make the example work. The fences only affect the relationship to other memory accesses or side effects. And in fact, the acquire fence only has to be in effect if a new value has been read. Still, there is no guarantee that the loop will terminate *with the next iteration*, just like there is no guarantee that the two threads will run truly in parallel on different cores. I wouldn’t consider this pattern broken, but it’s more of a “best-effort” strategy, rather than having an iron-hard guarantee. – Holger Jun 04 '19 at 07:22

1 Answer


If I call Start() at 0s and Stop() at 3s on another thread, is the loop guaranteed to exit at the end of the current iteration at around 10s?

Yes, 7 seconds is definitely sufficient for one thread to perceive the change to the _stopping variable.

Almost-formal explanations

For every variable that provides any kind of visibility barrier (memory ordering), the specification of any language should provide a guarantee that:

Any change to the variable (with the special memory ordering) made by one thread will be observed by other threads within a finite and bounded period of time.

Without this guarantee, even the memory-ordering features of the variable would be useless.

The C# specification surely provides such a guarantee for volatile variables, but I cannot find the corresponding text.

Note that such a finite-time guarantee is unrelated to the memory-ordering guarantees ("acquire", "release" and so on), and it cannot be deduced from the definitions of barriers and memory orders.

Formal-informal explanations

When one says

I call Stop() at 3s

one implies that there was some visible effect (e.g., information printed to the terminal) which allows one to claim the 3 s timestamp (because the print statement was issued after the call to Stop()).

The C# spec covers this gracefully ("10.10 Execution order"):

Execution shall proceed such that the side effects of each executing thread are preserved at critical execution points. A side effect is defined as a read or write of a volatile field, a write to a non-volatile variable, a write to an external resource, and the throwing of an exception. The critical execution points at which the order of these side effects shall be preserved are references to volatile fields (§17.4.3), lock statements (§15.12), and thread creation and termination.

Assuming printing is a critical execution point (it likely uses locks), you may be confident that by that moment the assignment to the volatile _stopping field is, as a side effect, visible to the other thread that checks the variable.
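As an illustration of this argument, a minimal hypothetical sketch of the calling code (the Console.WriteLine call stands in for whatever visible effect lets one claim the 3 s timestamp):

// Hypothetical caller on thread 1:
Stop();                              // volatile write: _stopping = true
Console.WriteLine("Stopped at 3s");  // write to an external resource (a side effect)

// On the answer's reading of §10.10, the order of these side effects is preserved
// at the next critical execution point, so once the message is visible the write
// to _stopping has been performed as well.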

Informal explanations

While a compiler is allowed to move the assignment to a volatile variable later in the code, it cannot do so indefinitely:

  • the assignment cannot be moved past a function call, because the compiler cannot assume anything about that function's body.

  • If the assignment is performed inside a loop, it must be completed before the assignment in the next iteration.

  • While one can imagine code with 1000 consecutive simple assignments (to other variables), so that the volatile assignment could be deferred by 1000 instructions, in practice compilers simply do not perform such deferring. And even if one did, executing 1000 simple instructions on a modern CPU takes no more than a few microseconds.

On the CPU side, the situation is simpler: no CPU will defer a store to a memory cell by more than a limited number of instructions.

In total, the assignment to a volatile variable can be deferred by only a very limited number of instructions.
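(For readers who prefer an iron-clad guarantee over this best-effort reasoning, here is a minimal sketch of the lock-based flag suggested in the comments under the question; acquiring and releasing the monitor provides the synchronizes-with relationship the question is worried about. The _gate field and the ShouldStop helper are illustrative names, not part of the original code.)

private readonly object _gate = new object();
private bool _stopping;

public void Stop()
{
    lock (_gate) { _stopping = true; }   // monitor exit publishes the write
}

private bool ShouldStop()
{
    lock (_gate) { return _stopping; }   // monitor entry observes it
}

// Worker loop: while (!ShouldStop()) { /* computation lasting around 10 seconds */ }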

Tsyvarev
  • You're probably thinking of section 3.10: "Execution of a C# program proceeds such that the side effects of each executing thread are preserved at critical execution points. A side effect is defined as a read or write of a volatile field [...]. The critical execution points at which the order of these side effects must be preserved are references to volatile fields [...] The execution environment need not evaluate part of an expression if it can deduce that that expression’s value is not used and that no needed side effects are produced." – Jeroen Mostert Jun 16 '17 at 11:55
  • The latter, in particular, implies that expressions that *do* produce needed side effects must be evaluated, implying that the scenario where updating the value, and then reading the updated value, cannot be postponed indefinitely and the loop must terminate (disregarding the discussion of just how long it could be postponed for the moment). – Jeroen Mostert Jun 16 '17 at 11:57
  • @JeroenMostert: Good spot! Actually, I read exactly this section (3.10) when I was writing the answer. But ... I read it as only describing the *memory order semantics*, and not a *time-bounding* property. The section talks about what happens when a read or write of a volatile variable is **actually performed** (by the CPU). But it doesn't specify **how long** the compiler/CPU may **defer** such a read/write. – Tsyvarev Jun 16 '17 at 12:22
  • It does not, because that would be wildly impractical. No language specification I know of attempts something like that (unless it would be for a real-time system). But there is still a theoretical difference between *does not finish in the lifetime of the universe* and *does not ever finish*. I'm arguing that 3.10 prohibits the latter, and a compiler/platform that provably postpones indefinitely is in violation of the spec. This has very little practical impact if your platform postponed it for a thousand years, of course, but that's beside the point. The question was about formal semantics. – Jeroen Mostert Jun 16 '17 at 12:35
  • Note that, of course, the spec says nothing about actually making *any* kind of progress (volatile or not) but that's not really relevant since, by that token, we could declare any loop to be potentially infinite if the platform simply chose to freeze the thread executing it forever. If we make the basic, reasonable assumption that threads that can make progress eventually do (weak fairness), the loop should eventually terminate as well, because the volatile write must eventually happen in one thread, and must eventually be observed by the other. – Jeroen Mostert Jun 16 '17 at 12:41
  • 1
    As for other language specs, C++11 explicitly says about atomics (in C++, volatile has different semantics): `An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.` – Tsyvarev Jun 16 '17 at 12:54
  • Yes, but that only says *finite*, not *how long*, which was what I meant with "no spec I know of attempts something like that". In general, the C++ and C specifications are far, far more precise than the C# spec, so that a section like this is missing from the C# spec doesn't say too much about the intended semantics. If the C# spec was written to that level, this question would probably not exist. :-) – Jeroen Mostert Jun 16 '17 at 12:57
  • `If the C# spec was written to that level, this question would probably not exist.` - Such questions exist for C++ too :) Maybe because its spec is over-complicated. – Tsyvarev Jun 16 '17 at 13:27
  • @JeroenMostert: I have updated the answer with `Formal-informal explanations` part, which is based on the spec section you noted. – Tsyvarev Jun 16 '17 at 13:50