
Consider the following example:

private int sharedState = 0;

private void FirstThread() {
    Volatile.Write(ref sharedState, 1);
}

private void SecondThread() {
    int sharedStateSnapshot = Volatile.Read(ref sharedState);
    Console.WriteLine(sharedStateSnapshot);
}

Until recently, I was under the impression that, as long as FirstThread() really did execute before SecondThread(), this program could not output anything but 1.

However, my understanding now is that:

  • Volatile.Write() emits a release fence. This means no preceding load or store (in program order) may happen after the assignment of 1 to sharedState.
  • Volatile.Read() emits an acquire fence. This means no subsequent load or store (in program order) may happen before the copying of sharedState to sharedStateSnapshot.

Or, to put it another way:

  • When sharedState is actually released to all processor cores, everything preceding that write will also be released, and,
  • When the value in sharedStateSnapshot is acquired, sharedState must already have been acquired.

If my understanding is therefore correct, then there is nothing to prevent the acquisition of sharedState being 'stale', if the write in FirstThread() has not already been released.

If this is true, how can we actually ensure (assuming the weakest processor memory model, such as ARM or Alpha), that the program will always print 1? (Or have I made an error in my mental model somewhere?)
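For contrast, the guarantee the acquire/release pair *does* provide can be sketched as follows (the `data` and `flag` field names are illustrative, not part of the code above): if the acquire-read observes the release-written value, everything written before the release is visible after the acquire.

```csharp
private int data = 0;
private int flag = 0;

private void Writer()
{
    data = 42;                    // ordinary store
    Volatile.Write(ref flag, 1);  // release: the store to data cannot move below this line
}

private void Reader()
{
    if (Volatile.Read(ref flag) == 1) // acquire: later loads cannot move above this line
    {
        // If the read above observed 1, the write data = 42 is guaranteed visible here.
        Console.WriteLine(data);
    }
}
```

Note that this still says nothing about *when* Reader will observe flag == 1, which is exactly the staleness concern above.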

Xenoprimate
  • This all depends on how you know that `FirstThread()` ran before `SecondThread()`. Is this knowledge based also on other code and the memory model, or is it just what you, the programmer, believe to be the case? – Damien_The_Unbeliever Mar 14 '14 at 11:50
  • @Damien_The_Unbeliever I'm not sure I understand your question... – Xenoprimate Mar 14 '14 at 11:52
  • Well, you say that "`FirstThread()` really did execute before `SecondThread()`" - but **how** do you know that? Is there some other code you've not shown us that, in some way that also relies upon the memory model, is able to detect that `FirstThread` has definitely completed before it will make the call to `SecondThread`? – Damien_The_Unbeliever Mar 14 '14 at 11:55

3 Answers


Your understanding is correct, and it is true that you cannot ensure that the program will always print 1 using these techniques. To ensure your program will print 1, assuming thread two runs after thread one, you need two fences on each thread.

The easiest way to achieve that is using the lock keyword:

private int sharedState = 0;
private readonly object locker = new object();

private void FirstThread() 
{
    lock (locker)
    {
        sharedState = 1;
    }
}

private void SecondThread() 
{
    int sharedStateSnapshot;
    lock (locker)
    {
        sharedStateSnapshot = sharedState;
    }
    Console.WriteLine(sharedStateSnapshot);
}

I'd like to quote Eric Lippert:

Frankly, I discourage you from ever making a volatile field. Volatile fields are a sign that you are doing something downright crazy: you're attempting to read and write the same value on two different threads without putting a lock in place.

The same applies to calling Volatile.Read and Volatile.Write. In fact, they are even worse than volatile fields, since they require you to do manually what the volatile modifier does automatically.
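The maintenance hazard being alluded to can be sketched like this (the field names are illustrative):

```csharp
// With the modifier, the compiler applies volatile semantics at every access site:
private volatile bool ready;

private void SetReady()
{
    ready = true;  // automatically a volatile (release) write
}

// Without it, the discipline is manual, and one forgotten call
// at any read or write site silently loses the guarantee:
private bool ready2;

private void SetReady2()
{
    Volatile.Write(ref ready2, true);  // must be remembered at every access
}
```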

Kris Vandermotten
  • Kris - thanks for your answer. I understand Eric's disdain for volatile/Volatile (especially in light of my revelation here) - but I assume they still have a place when a lock (even if uncontended) is too slow? – Xenoprimate Mar 14 '14 at 12:11
  • @Motig They have a place as a mechanism to implement locks. Luckily you don't have to do that, it has been done for you. You try to avoid locking, assuming that uncontended locks are too slow (which they are not). At least they give you correct semantics. Do you want a correct program, or a faster incorrect one? – Kris Vandermotten Mar 14 '14 at 12:14
  • Actually, I disagree with `Volatile.Read`/`Write` being worse than `volatile`. Joe Duffy points out [several problems with `volatile`](http://joeduffyblog.com/2010/12/04/sayonara-volatile/), and one of them (number 3) is the annotation scheme. You annotate a field declaration with `volatile`, whereas you should be annotating the reads/writes themselves. This expresses your intent much more clearly, and improves the readability of your code. `Volatile.Read`/`Write` are superior in this respect. – dcastro Mar 14 '14 at 12:16
  • This is also why some use `Interlocked` as a means of writing/reading with acquire-release semantics. It puts the intent where it belongs. – dcastro Mar 14 '14 at 12:17
  • @dcastro I see your point about making the volatile reads and writes explicit, and even agree to it. My point though is first of all, that when marking a field `volatile`, you cannot forget to use `Volatile.Read` or `Volatile.Write`, the same semantic is guaranteed by the compiler. And of course there is the bigger point that to solve the OP's problem, two fences are needed. Locks give you that. And they are a lot easier than volatile semantics and manual calls to `Thread.MemoryBarrier()`. – Kris Vandermotten Mar 14 '14 at 12:23
  • Don't get me wrong, I completely agree with your post! I only disagreed with the very last statement. And on that topic, I think a much better way would be to do what C++ did with the atomic template type, which wraps your variable and makes every volatile access very explicit. It also makes sure you never forget that this variable is being shared. – dcastro Mar 14 '14 at 12:29
  • @dcastro And I agree with you again on the C++ atomic template type :-) – Kris Vandermotten Mar 14 '14 at 12:35
  • @Motig: Here's the thing about eliding locks for performance reasons. If the price of an uncontended lock is too high for you to pay then my advice would be *rearchitect the application so that you are no longer sharing any memory across threads*, because *you can't afford it*. – Eric Lippert Mar 14 '14 at 15:44
  • Regarding comments above about volatile reads. FYI, `Volatile.Read/Write` does create a full fence, unfortunately. Unfortunately because they *could* create half fences but now there is code out in the wild that *depends* on them creating full fences, so Microsoft is unwilling to change the implementation and possibly break those people. – Eric Lippert Mar 14 '14 at 15:45
  • @EricLippert Hi - it was actually your latest Coverity blog post that got me here in the first place; so thanks. :) – Xenoprimate Mar 14 '14 at 15:54
  • As for eliding locks unnecessarily, what about in highly-concurrent multi-reader/multi-consumer collections? I've used some Interlocked constructs and the occasional volatile write or read to update head/tail indices. Is that not an acceptable application? Also, re: full fences in Volatile class - I always thought they added the Volatile class because they made that mistake in Thread.VolatileRead/Write? – Xenoprimate Mar 14 '14 at 15:56
  • @Motig: Ah, you are right, I forgot that it was `Thread`, not `Volatile` that had that problem. – Eric Lippert Mar 14 '14 at 15:59
  • @Motig: As Kris pointed out in a comment, extremely low-level tools are primarily useful for writing slightly-higher-level abstractions; this of course means that one has to have an extremely solid grasp of the actual characteristics of the low-level tools. One could make the argument that a high-performance threadsafe collection is itself a fairly low-level abstraction and the implementation is likely to have strict performance requirements. If empirical evidence shows that locks are too slow, that would be one of the few cases where I'd consider low-lock techniques. – Eric Lippert Mar 14 '14 at 16:03
  • @Motig: But of course that is a for-advanced-players-only scenario; players that advanced are players who know more about threading than my meagre knowledge. I don't know enough about low-lock techniques to have confidence that I'd get it right. People see `volatile` in the language, they think they know how to use it, and they write broken programs in order to solve a performance problem they don't even have. That's the behaviour that I want to discourage here. – Eric Lippert Mar 14 '14 at 16:05

You're right, there's no guarantee that release stores will be immediately visible to all processors. Volatile.Read and Volatile.Write give you acquire/release semantics, but no immediacy guarantees.

The volatile modifier seems to do this though. The compiler will emit an OpCodes.Volatile IL instruction, and the jitter will tell the processor not to store the variable in any of its registers (see Hans Passant's answer).

But why do you need it to be immediate anyway? What if your SecondThread happens to run a couple of milliseconds sooner, before the value is actually written? Seeing as the scheduling is non-deterministic, the correctness of your program shouldn't depend on this "immediacy" anyway.

dcastro
  • If `SecondThread` ran before `FirstThread` then fine - I would expect to see the output '0'. But, what's to stop the core that ran `FirstThread` never releasing the write to `sharedState`? I mean, I know that realistically, the write buffer will be flushed pretty soon after, but in theory? – Xenoprimate Mar 14 '14 at 12:02
  • To be precise the `volatile` modifier on a field will turn every assignment to it into a volatile write, and every read into a volatile read. This does not work through references though, so don't pass volatile fields by ref. – Kris Vandermotten Mar 14 '14 at 12:03
  • @Motig "what's to stop the core that ran FirstThread *never* releasing the write to sharedState?" It will have to be released. You just can't know whether it'll happen immediately or soon after. Worst case scenario, it'll be released when the next context switch happens. – dcastro Mar 14 '14 at 12:06
  • I wish I could accept your answer too, both were very useful to me; but I guess technically Kris answered my lingering question "how can we actually ensure (assuming the weakest processor memory model, such as ARM or Alpha), that the program will always print 1?". So I accepted his. – Xenoprimate Mar 14 '14 at 12:20

Until recently, I was under the impression that, as long as FirstThread() really did execute before SecondThread(), this program could not output anything but 1.

As you go on to explain yourself, this impression is wrong. Volatile.Read simply issues a read operation on its target followed by a memory barrier; the memory barrier prevents operation reordering on the processor executing the current thread but this does not help here because

  1. There are no operations to reorder (just the single read or write in each thread).
  2. The race condition across your threads means that even if the no-reorder guarantee applied across processors, it would simply preserve an order of operations that you cannot predict anyway.

If my understanding is therefore correct, then there is nothing to prevent the acquisition of sharedState being 'stale', if the write in FirstThread() has not already been released.

That is correct. In essence you are using a tool designed to help with weak memory models against a possible problem caused by a race condition. The tool won't help you because that's not what it does.

If this is true, how can we actually ensure (assuming the weakest processor memory model, such as ARM or Alpha), that the program will always print 1? (Or have I made an error in my mental model somewhere?)

To stress once again: the memory model is not the problem here. To ensure that your program will always print 1 you need to do two things:

  1. Provide explicit thread synchronization that guarantees the write will happen before the read (in the simplest case, SecondThread can use a spin lock on a flag which FirstThread uses to signal it's done).
  2. Ensure that SecondThread will not read a stale value. You can do this trivially by marking sharedState as volatile -- while this keyword has deservedly gotten much flak, it was designed explicitly for such use cases.

So in the simplest case you could for example have:

private volatile int sharedState = 0;
private volatile bool spinLock = false;

private void FirstThread()
{
    sharedState = 1;
    // ensure lock is released after the shared state write!
    Volatile.Write(ref spinLock, true); 
}

private void SecondThread()
{
    SpinWait.SpinUntil(() => spinLock);
    Console.WriteLine(sharedState);
}

Assuming no other writes to the two fields, this program is guaranteed to output nothing other than 1.
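If the handshake needs to run more than once, one alternative is an event rather than a one-shot flag. ManualResetEventSlim is a real BCL type whose Set/Wait pair provides the necessary fences; the structure here is a sketch, not part of the original answer:

```csharp
private int sharedState = 0;
private readonly ManualResetEventSlim done = new ManualResetEventSlim(false);

private void FirstThread()
{
    sharedState = 1;
    done.Set();   // publishes the preceding write and releases any waiter
}

private void SecondThread()
{
    done.Wait();  // blocks until Set; no hand-tuned spin loop needed
    Console.WriteLine(sharedState);
}
```

Calling done.Reset() between rounds makes the handshake reusable, though coordinating the reset safely is its own exercise.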

Jon
  • This approach works and is correct _in this simple case_. But in real life, there is probably more data to be shared than an int. I still believe locks are the way to go. – Kris Vandermotten Mar 14 '14 at 12:32
  • A hot loop like `while(!spinLock)` will hurt performance on hyper-threading architectures. You should use `SpinWait.SpinUntil(() => spinLock)`, which emits a special `PAUSE` instruction that tells the processor that you're busy waiting and not actually doing anything useful. – dcastro Mar 14 '14 at 12:34
  • Jon, am I to understand that the `volatile` keyword has some extra behaviour that the `Volatile` class's methods don't exhibit? Why do you use a Volatile.Write in addition to marking `spinLock` as volatile? – Xenoprimate Mar 14 '14 at 12:35
  • @Motig I believe the `Volatile.Write` to the spinlock is not required, because the field is marked `volatile`. – Kris Vandermotten Mar 14 '14 at 12:37
  • @KrisVandermotten: I picked the simplest working example I could think of to make my point. Regarding `lock`, I am kind of puzzled: you do realize that your own answer does not guarantee that the result 1 will be printed, right? Just start `SecondThread` before `FirstThread` and you are pretty much guaranteed to get 0. – Jon Mar 14 '14 at 12:37
  • @Jon Perhaps I should have been more clear in my question; but I wanted to know how to guarantee that '1' would be printed *assuming ThreadOne() executes entirely before ThreadTwo() starts* (and I believe Kris's answer fulfills that?) – Xenoprimate Mar 14 '14 at 12:39
  • @Jon As far as know, the lock keyword ensures a full barrier upon entry on the locked region, and a full barrier upon exit. That should guarantee it (if thread two runs after thread one). – Kris Vandermotten Mar 14 '14 at 12:39
  • @dcastro: Indeed. I 've never used `SpinWait` before; will amend the answer accordingly. – Jon Mar 14 '14 at 12:40
  • There is another issue with the solution is this answer: it works only once. To use it repeatedly, you need a mechanism to _safely_ set spinlock to false again. – Kris Vandermotten Mar 14 '14 at 12:42
  • @Motig: I interpreted your question differently. Yes, Kris's answer is guaranteed to work if the threads run in the desired order. – Jon Mar 14 '14 at 12:44
  • @KrisVandermotten: Yes, see above. Regarding multiple runs, that wasn't part of the picture. – Jon Mar 14 '14 at 12:45
  • @dcastro: Can you point me to a resource where I can learn more about the instructions emitted by `SpinWait`? Thanks. – Jon Mar 14 '14 at 12:48
  • @Motig: I didn't answer all your questions: *"am I to understand that the volatile keyword has some extra behaviour that the Volatile class's methods don't exhibit?"* Yes. AFAIK `volatile` ensures that the read won't be satisfied by a cached value; `Volatile.Read` ensures order of operations. Different things. *"Why do you use a Volatile.Write in addition to marking spinLock as volatile?"* To ensure that `spinLock = true` (releasing the second thread) cannot be observed before the store to `sharedState`. – Jon Mar 14 '14 at 12:54
  • @Motig: Be careful around `volatile`. It only does one thing: ensures that, *if writes have already been committed* to a memory location, subsequent reads on that location will be guaranteed to see the writes (i.e. eliminates the possibility of caching optimizations that could introduce bugs in the presence of writers unknown to the compiler). But it does that one thing very well. – Jon Mar 14 '14 at 12:56
  • @Jon One other thing - spinning is *entirely* useless in a single-CPU machine. `SpinWait` will simply yield in these cases, instead of spinning. There aren't many resources on this, but I found this comment on the [source code](http://reflector.webtropy.com/default.aspx/4@0/4@0/untmp/DEVDIV_TFS/Dev10/Releases/RTMRel/ndp/clr/src/BCL/System/Threading/SpinWait@cs/1305376/SpinWait@cs): "*We do this using the CLR's SpinWait API, which is just a busy loop that issues YIELD/PAUSE instructions to ensure multi-threaded CPUs can react intelligently to avoid starving. (These are NOOPs on other CPUs.)*" – dcastro Mar 14 '14 at 13:24
  • You should also take a look at the [`PAUSE` instruction](http://stackoverflow.com/a/7086289/857807), and Joe Duffy's [blog post](http://msdn.microsoft.com/en-us/magazine/cc163427.aspx#S2) - scroll down to "Reusable Spin Wait". – dcastro Mar 14 '14 at 13:26