
I was looking for a thread-safe counter implementation using Interlocked that supported incrementing by arbitrary values, and found this sample straight from the Interlocked.CompareExchange documentation (slightly changed for simplicity):

private int totalValue = 0;

public int AddToTotal(int addend)
{
    int initialValue, computedValue;
    do
    {
        // How can we get away with not using a volatile read of totalValue here?
        // Shouldn't we use CompareExchange(ref totalValue, 0, 0)
        // or Thread.VolatileRead
        // or declare totalValue to be volatile?           
        initialValue = totalValue;

        computedValue = initialValue + addend;

    } while (initialValue != Interlocked.CompareExchange(
        ref totalValue, computedValue, initialValue));

    return computedValue;
}

public int Total
{
    // This looks *really* dodgy too, but isn't
    // the target of my question.
    get { return totalValue; }
}

I get what this code is trying to do, but I'm not sure how it can get away with not using a volatile read of the shared variable when assigning to the temporary variable that is added to.

Is there a chance that initialValue will hold a stale value throughout the loop, making the function never return? Or does the memory-barrier (?) in CompareExchange eliminate any such possibility? Any insight would be appreciated.

EDIT: I should clarify that I understand that if CompareExchange caused the subsequent read of totalValue to be up to date as of the last CompareExchange call, then this code would be fine. But is that guaranteed?
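
(Aside: for plain atomic addition by an arbitrary value, I know Interlocked.Add would do the whole job in a single call, roughly as below; my question is specifically about the CompareExchange pattern from the documentation.)

public int AddToTotal(int addend)
{
    // Interlocked.Add atomically adds addend and returns the new total.
    return Interlocked.Add(ref totalValue, addend);
}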

Ani
  • Yes, this could burn core for a while on a processor with a weak memory model. It wasn't stated that this is *efficient* code, merely that it is *correct* code. It is guaranteed to exit the loop eventually; the thread scheduler has memory barriers. – Hans Passant Sep 26 '11 at 16:00
  • @HansPassant After all this time, I've come to the conclusion that the code is not technically correct after all. The worry is that, because of an introduced read, the last argument to CompareExchange could contain an inconsistent value. I've edited my answer to explain. – relatively_random Oct 26 '20 at 08:29

3 Answers

3

Edit:

Someone gave me an upvote after all this time so I re-read the question and the answer and noticed a problem.

I either didn't know about introduced reads or it hadn't crossed my mind. Assuming Interlocked.CompareExchange doesn't introduce any barriers (since that isn't documented anywhere), the compiler is allowed to transform your AddToTotal method into the following broken version, where the last two arguments to Interlocked.CompareExchange could see different totalValue values!

public int AddToTotal(int addend)
{
    int initialValue;
    do
    {        
        // totalValue is now read three separate times (here and twice in
        // the CompareExchange call below); each read can observe a different
        // value, so the comparand and the new value may be computed from
        // inconsistent snapshots.
        initialValue = totalValue;
    } while (initialValue != Interlocked.CompareExchange(
        ref totalValue, totalValue + addend, totalValue));

    return initialValue + addend;
}

For this reason, you can use Volatile.Read. On x86, Volatile.Read is just a standard read anyway (it only prevents compiler reorderings), so there's no reason not to use it. Then the worst the compiler should be able to do is:

public int AddToTotal(int addend)
{
    int initialValue;
    do
    {
        initialValue = Volatile.Read(ref totalValue);
    } while (initialValue != Interlocked.CompareExchange(
        ref totalValue, initialValue + addend, initialValue));

    return initialValue + addend;
}

Unfortunately, Eric Lippert once claimed that a volatile read doesn't guarantee protection against introduced reads. I seriously hope he's wrong, because that would mean lots of low-lock code is almost impossible to write correctly in C#. He did mention somewhere that he doesn't consider himself an expert on low-level synchronization, so I just assume his statement was incorrect and hope for the best.


Original answer:

Contrary to popular misconception, acquire/release semantics don't ensure that a new value gets grabbed from shared memory; they only affect the order of other memory operations around the one with acquire/release semantics. Every memory access must be at least as recent as the last acquire read and at most as stale as the next release write. (Something similar applies to memory barriers.)

In this code, you only have a single shared variable to worry about: totalValue. The fact that CompareExchange is an atomic RMW operation is enough to ensure that the variable it operates on will get updated. This is because atomic RMW operations must ensure all processors agree on what the most recent value of the variable is.

Regarding the other Total property you mentioned, whether it's correct or not depends on what is required of it. Some points:

  • reads and writes of int are guaranteed to be atomic, so you will always get a valid value (in this sense, the code you've shown could be viewed as "correct", if all that's required is some valid, possibly stale value)
  • reading without acquire semantics (i.e. not using Volatile.Read or a read of a volatile int) means that memory operations written after the read may actually happen before it (reads operating on older values, and writes becoming visible to other processors before they should)
  • without an atomic RMW operation for the read (such as Interlocked.CompareExchange(ref x, 0, 0)), the value received may not be what some other processors see as the most recent value
  • if both the freshest value and ordering with respect to other memory operations are required, Interlocked.CompareExchange should work (the underlying WinAPI InterlockedCompareExchange uses a full barrier; the C#/.NET specifications are less explicit), but if you wish to be sure, you could add an explicit memory barrier after the read -- a sketch of the reading options follows this list
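
A rough sketch of two of those reading options (the property names here are hypothetical, not from the question): the first is an acquire read that may still return a slightly stale value; the second uses an atomic RMW so the value returned is the one all processors currently agree on.

// Acquire read: prevents reorderings around the read, but the value may
// already be stale by the time the caller uses it.
public int Total
{
    get { return Volatile.Read(ref totalValue); }
}

// Atomic RMW read: comparing against 0 and "exchanging" 0 for 0 never
// changes totalValue, but returns the value all processors agree on.
public int TotalFresh
{
    get { return Interlocked.CompareExchange(ref totalValue, 0, 0); }
}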
relatively_random
  • Quick note about adding an explicit barrier after Interlocked.CompareExchange. Almost all of the performance impact of memory barriers comes from the processor having to wait for pending shared-memory operations to finish. So adding a memory barrier just after (or before) another memory barrier should have a low performance impact -- the first barrier already did most of the work and the second one has no memory operations to wait for. – relatively_random Oct 26 '20 at 07:13
  • Thumbs up for the edited answer and the original. It's nice to see someone who really understands low-lock concurrent coding. Is the following statement in the answer equivalent to saying that the native CompareExchange operation has a memory barrier: "This is because atomic RMW operations must ensure all processors agree on what the most recent value of the variable is." It looks from [here](https://stackoverflow.com/questions/65568185/for-purposes-of-ordering-is-atomic-read-modify-write-one-operation-or-two) like that might not be true for recent ARM processors. – radfast Apr 08 '22 at 12:14
  • @radfast It's not equivalent, no. In order for atomic RMW operations to be correct, they only need to make sure that processors agree on what's the current state of that one particular location they're operating on. They don't have to care about the order of other reads and writes, which is what memory barriers and acquire/release semantics are all about. Some CPUs may associate some barriers or semantics with some atomic RMW instructions, some may not. – relatively_random Apr 08 '22 at 13:11
3

If we read a stale value, then the CompareExchange won't perform the exchange - we're basically saying, "Only do the operation if the value really is the one we've based our calculation on." So long as at some point we get the right value, it's fine. It would be a problem if we kept reading the same stale value forever, so that CompareExchange never passed the check, but I strongly suspect that the CompareExchange memory barriers mean that at least after the first time through the loop, we'll read an up-to-date value. The worst that could happen would be cycling forever, though - the important point is that we can't possibly update the variable in an incorrect way.

(And yes, I think you're right that the Total property is dodgy.)

EDIT: To put it another way:

CompareExchange(ref totalValue, computedValue, initialValue)

means: "If the current state really was initialValue, then my calculations are valid and you should set it to computedValue."

The current state could be wrong for at least two reasons:

  • The initialValue = totalValue; assignment used a stale read with a different old value
  • Something changed totalValue after that assignment

We don't need to handle those situations differently at all - so it's fine to do a "cheap" read so long as at some point we'll start seeing up-to-date values... and I believe the memory barriers involved in CompareExchange will ensure that as we loop round, the stale value we see is only ever as stale as the previous CompareExchange call.

EDIT: To clarify, I think the sample is correct if and only if CompareExchange constitutes a memory barrier with respect to totalValue. If it doesn't - if we can still read arbitrarily-old values of totalValue when we keep going round the loop - then the code is indeed broken, and may never terminate.
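
For completeness, here is a hedged sketch of the variant the questioner alludes to in the comments below, using an interlocked read for the initial snapshot so that the loop doesn't rely on any assumption about plain reads (at the cost of an extra interlocked operation per iteration):

public int AddToTotal(int addend)
{
    int initialValue, computedValue;
    do
    {
        // CompareExchange with identical comparand and value (0, 0) leaves
        // totalValue unchanged but returns its current value, so this read
        // cannot remain arbitrarily stale.
        initialValue = Interlocked.CompareExchange(ref totalValue, 0, 0);
        computedValue = initialValue + addend;
    } while (initialValue != Interlocked.CompareExchange(
        ref totalValue, computedValue, initialValue));

    return computedValue;
}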

Jon Skeet
  • I get the idea behind the `do {...} while (initialValue != Interlocked.CompareExchange(...))` pattern; my question is only about the `initialValue = totalValue` statement. – Ani Sep 26 '11 at 14:41
  • @Ani: But they're linked together. The CompareExchange will only take action if that `initialValue = totalValue;` assignment was "correct" *and* if the value hasn't changed since. Will edit... – Jon Skeet Sep 26 '11 at 14:43
  • I'm aware of that, but my question is about why we don't require a volatile read of `totalValue`. – Ani Sep 26 '11 at 14:45
  • @Seb: I believe my question is accurate. @Jon: Let me put it this way: if the code were `initialValue = Interlocked.CompareExchange(ref totalValue, 0, 0);`, I wouldn't have asked this question. – Ani Sep 26 '11 at 14:46
  • @Ani: See my edit for why it's unimportant. Hopefully that'll make it clearer. – Jon Skeet Sep 26 '11 at 14:46
  • I appreciate your effort, but I really don't think you have understood my question. – Ani Sep 26 '11 at 14:47
  • I understand the idea of "if the value I used is still valid, then we're all good, otherwise let's try again". I understand why this code will not cause any miscomputation. But I don't get how the "cheap read" is *guaranteed* to become up to date at some point. If it doesn't, this *could* go into an infinite loop, yes? – Ani Sep 26 '11 at 14:52
  • I'm not certain that `CompareExchange` guarantees to generate a memory barrier on all platforms; it just does whatever it needs to do to get that up-to-date read. The behaviour described by Jon's edit is certainly what I'd expect on x86, but I suppose that it's possible for some other platform(s) to implement `CompareExchange` in some way that doesn't freshen the subsequent reads, in which case an infinite loop could be possible. – LukeH Sep 26 '11 at 14:52
  • @Ani: Whereas I really think I have, and the last paragraph will answer it. That read itself isn't volatile, and may be as stale as we like for the first time round the loop, and thereafter only as stale as the previous `CompareExchange` call. – Jon Skeet Sep 26 '11 at 14:53
  • @LukeH: Agreed. Basically the code is valid if and only if my assumption about `CompareExchange` is correct. – Jon Skeet Sep 26 '11 at 14:53
  • @Jon: So you're saying Interlocked.CompareExchange will 'freshen' *subsequent* 'cheap' reads of the field on the same thread up to (at worst) the point of the CompareExchange call? How can we make this assumption? – Ani Sep 26 '11 at 14:58
  • @Ani, I don't know the answer to your question, but I think *this conversation* is going around in circles indefinitely. :-) Here's what I would suggest: propose a sequence of reads and writes that has the bad behaviour you are concerned about. Then analyze whether that sequence of reads and writes is possible given the .NET memory model. More generally: if you are concerned (and you have every reason to be concerned about low-lock code!) then use a lock. – Eric Lippert Sep 26 '11 at 15:07
  • @Ani: Yes - I believe that is the case, but it's *not* clearly documented, either way. – Jon Skeet Sep 26 '11 at 15:41
  • @EricLippert: The problem is that I *suspect* it depends on whether `CompareExchange` constitutes a volatile read, and that's not clearly documented. I suspect it's one of those cases which is always practically okay, but not actually guaranteed. It would help if the .NET memory model were more explicitly documented. It *used* to be the case that the .NET 2+ memory model's best documentation was Joe Duffy's blog :) (The .NET memory model being stronger than the ECMA 335 one.) – Jon Skeet Sep 26 '11 at 15:43
  • I would be shocked if the `Interlocked.CompareExchange` function didn't produce a full-fence barrier. The MSDN documentation states that the Win32 API calls produce a full barrier on all platforms and if Jakob's assertion that the BCL call maps directly to the API call is true then it would be an open and shut case. http://msdn.microsoft.com/en-us/library/ms684122%28VS.85%29.aspx – Brian Gideon Sep 26 '11 at 16:32
  • @BrianGideon: Yup - it would be nice if this were actually *documented* for the .NET calls though. – Jon Skeet Sep 26 '11 at 16:34
  • @JonSkeet: It would certainly be nice if it were documented, but the compiler certainly has to generate an instruction to re-read `totalValue` (instead of using a value cached in a register) since it's passed by `ref` to `CompareExchange`. Whether or not .net documents that `CompareExchange` would always perform a full memory barrier, I would think a processor vendor would have to go absurdly far out of its way to have `CompareExchange` read a "fresh" value from main RAM but leave a stale one in a core's cache. – supercat Jun 12 '12 at 21:00
  • Sorry to resurrect an old topic, but are acquire semantics really necessary for this code to work? Acquire enforces ordering in respect to **other** memory operations after the read. The OP's code doesn't access any other memory address, just the one being CompareExchanged. Even with relaxed semantics, could an atomic RMW operation really do its job without grabbing a newer value? – relatively_random Dec 05 '18 at 07:57
  • This is in line with how .NET Core's LazyInitializer creates the locker object. It just does a CompareExchange(ref locker, new object(), null) once, doesn't use the return value at all and just uses the locker variable after it. – relatively_random Dec 05 '18 at 08:02
  • @relatively_random: I'm afraid I don't have the time to review this really, really carefully right now - which means any other feedback I'd have wouldn't be terribly useful. I would note that there's a difference between "Make sure a variable is non-null" and "Make sure it has the right integer value, when there are lots of different values available". I haven't assessed whether that's relevant to your question though. – Jon Skeet Dec 05 '18 at 09:17
  • Sorry, I meant that as a rhetorical question, not to waste your time by requesting an answer. I should be more careful about written communication, I was trying to just state this does work. And the mentioned initialization code isn't just about checking for non-null, it's about using the variable as a guaranteed valid value after the RMW. – relatively_random Dec 05 '18 at 09:26
  • @relatively_random: Sorry, I in turn expressed myself badly. The transition from "null" to "non-null" needs to happen exactly once, and then all threads need to read the same value. I think that's somewhat different from the update in the question, to the result of an addition where the value can change multiple times. – Jon Skeet Dec 05 '18 at 11:08
  • Yes, but the fact is that LazyInitializer assumes that the value in the variable is correctly updated after just one call to CompareExchange. This is evidence that the RMW forces a new read from shared memory. This property is all that the OP's loop needs to terminate: a fresh read. Memory barriers or acquire/release semantics are not required. – relatively_random Dec 05 '18 at 11:31
2

The managed Interlocked.CompareExchange maps directly to the InterlockedCompareExchange in the Win32 API (there is also a 64 bit version).

As you can see in the function signatures, the native API requires the destination to be volatile and, even though it is not required by the managed API, using volatile is recommended by Joe Duffy in his excellent book Concurrent Programming on Windows.
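
As an illustrative sketch only (this exact snippet is not from the book or the documentation), declaring the field volatile looks like this. Note that passing a volatile field by ref to the Interlocked methods produces compiler warning CS0420 ("a reference to a volatile field will not be treated as volatile"), which is generally considered safe to suppress here because the Interlocked call itself supplies the required semantics:

// Hypothetical variant of the question's code with a volatile field.
private volatile int totalValue = 0;

public int AddToTotal(int addend)
{
    int initialValue, computedValue;
    do
    {
        // The plain read now has volatile (acquire) semantics.
        initialValue = totalValue;
        computedValue = initialValue + addend;
    } while (initialValue != Interlocked.CompareExchange(
        ref totalValue, computedValue, initialValue)); // CS0420 reported here

    return computedValue;
}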

Jakob Christensen
  • It's possibly worth noting that volatile in native code and volatile in C# have fairly different meanings. – Jon Skeet Sep 26 '11 at 16:35