
What is the difference between Interlocked.Exchange and Volatile.Write?

Both methods update the value of a variable. Can someone summarize when to use each of them?

In particular, I need to update a double item of my array, and I want another thread to see the freshest value. What is preferred: Interlocked.Exchange(ref arr[3], myValue) or Volatile.Write(ref arr[3], myValue), where arr is declared as double[]?


Real example: I declare a double array like this:

private double[] _cachedProduct;

In one thread I update it like this:

_cachedProduct[instrumentId] = calcValue;
//...
are.Set();

In another thread I read this array like this:

while(true)
{
    are.WaitOne();
    //...
    result += _cachedProduct[instrumentId];
    //...
}

For me it just works fine as is. However, to make sure it will always work no matter what, it seems I should add either Volatile.Write or Interlocked.Exchange, because a double update is not guaranteed to be atomic.
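For concreteness, the two candidate calls would look like this (a sketch using the `arr` and `myValue` names from above; both classes have overloads taking `ref double` since .NET 4.5):

```csharp
using System.Threading;

double[] arr = new double[8];
double myValue = 42.0;

// Option 1: Interlocked.Exchange, an atomic read-modify-write
// that also returns the value that was overwritten (discarded here).
Interlocked.Exchange(ref arr[3], myValue);

// Option 2: Volatile.Write, an atomic store with release semantics.
Volatile.Write(ref arr[3], myValue);
```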

In the answer to this question I want to see a detailed comparison of the Volatile and Interlocked classes. Why do we need two classes? Which one should be used, and when?


Another example, from the implementation of a locking mechanism in a production project:

private int _guard = 0;

public bool Acquire() => Interlocked.CompareExchange(ref _guard, 1, 0) == 0;

public void Release1() => Interlocked.Exchange(ref _guard, 0);
public void Release2() => Volatile.Write(ref _guard, 0);

Does it make any practical difference if the users of this API call the Release1 or the Release2 method?

asked by Oleg Vazhnev, edited by Theodor Zoulias
  • You can't use Volatile.Write in this specific case: it only accepts reference types, not value types. – Julien Lebosquain Sep 19 '12 at 12:08
  • You're using a wait handle; it already *guarantees* that the other thread sees the newest value completely. It *will* always work. – harold Sep 19 '12 at 12:15
  • Somewhat related: [Are reads and writes to a variable of type double guaranteed to be atomic on a 64 bit intel processor?](https://stackoverflow.com/questions/24448677/are-reads-and-writes-to-a-variable-of-type-double-guaranteed-to-be-atomic-on-a-6) – Theodor Zoulias Jan 26 '22 at 01:59
  • @TheodorZoulias: This bounty might have made more sense on a new question. As harold says, the example in this question already synchronizes with the writer through `are` with Set / WaitOne, so even a plain assignment would Just Work. But normally I'd expect Volatile.Write to be at least as cheap as `Interlocked.Exchange` (atomic RMW), and of course the store part of both is atomic, I think that's the point of Volatile.Write. It might also have ordering semantics, hopefully just release not seq_cst so it can be cheaper on ISAs like x86. – Peter Cordes Feb 04 '22 at 14:40
  • @PeterCordes I thought about asking a new question, but the title of that question would be identical with the title of this question, which doesn't make much sense. I could add the word "practical" in the existing title, but I don't think that Oleg Vazhnev (the poster of this question) was interested in a purely theoretical explanation either. We both want to know if using either of those two APIs makes any difference in practice, or we can just flip a coin and let the coin decide which API to use. – Theodor Zoulias Feb 04 '22 at 14:50
  • @PeterCordes you said that the `Volatile.Write` might be cheaper than the `Interlocked.Exchange`, in terms of CPU utilization. Is it possible that the `Volatile.Write` has some hidden disadvantage, like updating the memory "less instantaneously" (if this makes any sense) than the `Interlocked.Exchange`? – Theodor Zoulias Feb 04 '22 at 15:00
  • @TheodorZoulias: I wouldn't expect any difference. [Does hardware memory barrier make visibility of atomic operations faster in addition to providing necessary guarantees?](//stackoverflow.com/q/61591287) explains the common misconception that memory barriers might help with inter-core latency. (Interlocked.Exchange has to wait for the store buffer to drain before it can become visible, at least on x86. If it can compile for ARM like a C++ `memory_order_relaxed` exchange without ordering wrt. surrounding code (I'd be surprised; I thought Interlocked implied a barrier), it could be different.) – Peter Cordes Feb 04 '22 at 15:07
  • @TheodorZoulias: In MS C++, `InterlockedIncrement` compiles with a full memory barrier for ARM64, when it wouldn't need any just for atomicity of the operation itself. ([Found a comment thread where I'd tested this myself](https://stackoverflow.com/questions/1581718/does-interlocked-compareexchange-use-a-memory-barrier#comment110417484_1716587). I highly suspect the answer here is wrong unless C# `Interlocked.` stuff has weaker ordering guarantees than in C++. I suspect that Interlocked.Exchange actually guarantees a barrier, too. (In x86 asm, atomic RMW instructions are also barriers.) – Peter Cordes Feb 04 '22 at 15:13
  • @PeterCordes thanks for the links, very informative! Regarding `Interlocked` and fences, in [this](https://stackoverflow.com/questions/6581848/memory-barrier-generators/6585367#6585367) answer the `Interlocked` class methods are listed as mechanisms *"that are generally agreed upon to cause implicit barriers"*. The same is stated in Joseph Albahari's [online book](http://www.albahari.com/threading/part4.aspx#_Interlocked): "*The following implicitly generate full fences: [...] All methods on the `Interlocked` class*". I am sure that I have seen it somewhere in Microsoft's documentation as well. – Theodor Zoulias Feb 04 '22 at 15:31
  • @PeterCordes although the links are informative, following them is like going down a rabbit hole. Links upon links, full of unknown acronyms, all the way down. What I am really hoping is to get a simple answer like this (made up answer follows) *"The two APIs produce exactly the same behavior in some computers, while in other computers the `Volatile.Write` is slightly cheaper, but may take a dozen of nanoseconds before the written value is visible to other threads."* This would allow me to make an informed decision about which API to use (the decision would be: flip a coin, it doesn't matter). – Theodor Zoulias Feb 04 '22 at 15:57
  • @TheodorZoulias: The only possible advantage to Interlocked was the possibility that it had a weaker memory barrier so could be cheaper on some machines. That seems not to be the case. So for pure writes, always just use `Volatile.Write`; if the cheapest way for a compiler to implement its ordering semantics is x86 `xchg`, then so be it. (If it implies seq_cst ordering / a *full* barrier, not just release semantics). But if it's just `release`, then it's clearly better than Interlocked.Exchange on x86 and AArch64, no tradeoff. And due to the lack of a read, probably on other ISAs. – Peter Cordes Feb 04 '22 at 16:03
  • @PeterCordes so would you say that the `Interlocked.Exchange` is only useful when you want to get the previous value of the variable, via its return value, and in all other cases the `Volatile.Write` is preferable? If you consider this to be correct, you may post it as an answer and I'll award it the bounty. :-) – Theodor Zoulias Feb 04 '22 at 16:41
  • @TheodorZoulias: Yes, I think that's correct; if Interlocked.Exchange has to be a full barrier, there's no way a write could be more expensive except possibly bad implementation choices. But even that's unlikely if it only has release semantics, not a full barrier or like C++ seq_cst to create SC if the loads are also Volatile.Read. I don't particularly care about the rep, but if this is a question people wonder about, I guess it's useful to have that posted as an answer. Will do at some point, maybe once I find out how Volatile.Write actually compiles for x86(-64) and AArch64. – Peter Cordes Feb 04 '22 at 16:51
  • @PeterCordes yes, I am sure this is the kind of answer that most people coming to this question would be looking for. :-) – Theodor Zoulias Feb 04 '22 at 16:58
  • @TheodorZoulias: Ok, wrote something up and posted it. Could still benefit from some research if you or anyone is interested in checking how Volatile.Write actually compiles on x64 or ARM64, and maybe a proof-read when I'm more awake to see if I left any sentences half-finished. :P – Peter Cordes Feb 05 '22 at 21:29

2 Answers


Interlocked.Exchange uses a processor instruction that guarantees an atomic operation.

Volatile.Write does the same, but it also includes a memory barrier operation. I think Microsoft added Volatile.Write in .NET 4.5 due to the support of ARM processors on Windows 8. Intel and ARM processors differ in memory-operation reordering.

On Intel, you have a guarantee that memory access operations will be done in the same order they are issued, or at least that a write operation won't be reordered with other writes.

From Intel® 64 and IA-32 Architectures Software Developer’s Manual, Chapter 8:

8.2.2 Memory Ordering in P6 and More Recent Processor Families

The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium 4, and P6 family processors also use a processor-ordered memory-ordering model that can be further defined as “write ordered with store-buffer forwarding.” This model can be characterized as follows.

On ARM you don't have this kind of guarantee, so a memory barrier is required. An ARM blog post explaining this can be found here: http://blogs.arm.com/software-enablement/594-memory-access-ordering-part-3-memory-access-ordering-in-the-arm-architecture/

In your example, as the operation with a double is not guaranteed to be atomic, I would recommend a lock to access it. Remember that you have to use the lock in both parts of your code: when reading and when setting the value.
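As a sketch of that advice (the class and method names are hypothetical, built around the `_cachedProduct` array from the question):

```csharp
using System.Threading;

class ProductCache
{
    private readonly object _sync = new object();
    private readonly double[] _cachedProduct = new double[16];

    // The writer takes the lock for every update...
    public void Update(int instrumentId, double calcValue)
    {
        lock (_sync)
        {
            _cachedProduct[instrumentId] = calcValue;
        }
    }

    // ...and every reader takes the same lock, so a torn
    // (half-written) double can never be observed.
    public double Read(int instrumentId)
    {
        lock (_sync)
        {
            return _cachedProduct[instrumentId];
        }
    }
}
```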

A more complete example would make your question easier to answer, as it is not clear what happens after these values are set. For a vector, if you have more readers than writers, consider using a ReaderWriterLockSlim object: http://msdn.microsoft.com/en-us/library/system.threading.readerwriterlockslim.aspx
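The reader-writer variant could be sketched like this (again with hypothetical names based on the question's array):

```csharp
using System.Threading;

class ProductCacheRw
{
    private readonly ReaderWriterLockSlim _rwLock = new ReaderWriterLockSlim();
    private readonly double[] _cachedProduct = new double[16];

    public void Update(int instrumentId, double calcValue)
    {
        _rwLock.EnterWriteLock();   // exclusive: waits for all readers to leave
        try { _cachedProduct[instrumentId] = calcValue; }
        finally { _rwLock.ExitWriteLock(); }
    }

    public double Read(int instrumentId)
    {
        _rwLock.EnterReadLock();    // shared: many readers may run in parallel
        try { return _cachedProduct[instrumentId]; }
        finally { _rwLock.ExitReadLock(); }
    }
}
```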

The number of threads and the frequency of reads and writes can dramatically change your locking strategy.

answered by nmenezes, edited by saucecontrol
  • I don't want to use a lock, because on x64 systems a double update is likely atomic, and so I don't want to introduce extra latency that isn't required. This is very time-sensitive code, so I really want to win several extra microseconds. – Oleg Vazhnev Oct 02 '12 at 18:52
  • Hm, so `Volatile.Write` guarantees atomicity? MSDN doesn't say anything about that. Why do you suggest using `lock`? Why not use `Interlocked.Exchange` or `Volatile`? – Oleg Vazhnev Oct 02 '12 at 18:59
  • I based my answer on the description of the Volatile class. A good article about the difference between atomicity and volatility can be found here: http://blogs.msdn.com/b/ericlippert/archive/2011/05/26/atomicity-volatility-and-immutability-are-different-part-one.aspx?PageIndex=2 I checked the C# 4.0 language definition; in section 5.5 they did not update the document to include double as an atomic operation. You asked for a way to guarantee it would always work, and a lock is such a guarantee. In the specific case of your example, it will depend on which thread increments instrumentId. – nmenezes Oct 03 '12 at 11:14
  • Remember that in .NET you always depend on the CLR and on how Microsoft translates it before execution. So a 64-bit machine can have an atomic double operation, as you said. But the language specification, written by Microsoft, explicitly says they do not guarantee it for doubles. – nmenezes Oct 03 '12 at 11:29
  • You say that `Volatile.Write` includes a (full?) memory barrier but `Interlocked.Exchange` doesn't. Everything I was able to dig up while researching my answer indicates that's backwards. – Peter Cordes Apr 28 '22 at 20:25

If you don't care about the old value, and don't need a full memory barrier (including an expensive StoreLoad, i.e. draining the store buffer before later loads), always use Volatile.Write.

Volatile.Write - atomic release store

Volatile.Write is a store with "release" semantics, which AArch64 can do cheaply, and which x86 can do for free (well, same cost as a non-atomic store, except of course for contention with other cores also trying to write the line). It's basically equivalent to C++ std::atomic<T> store(value, memory_order_release).

For example, in the case of a double, Volatile.Write for x86 (including 32-bit and x86-64) could compile to an SSE2 8-byte store directly from an XMM register, like movsd [mem], xmm0, because x86 stores already have as much ordering as MS's documentation specifies for Volatile.Write. And assuming the double is naturally-aligned (which any C# runtime would do, right?) it's also guaranteed to be atomic. (On all x86-64 CPUs, and 32-bit since P5 Pentium.)

The older Thread.VolatileWrite method in practice uses a full barrier, instead of just being a release operation that can reorder in one direction. That makes it no cheaper than Interlocked.Exchange, or not much on non-x86. But Volatile.Write/Read don't have that problem of an overly strong implementation that some software probably relies on. They don't have to drain the store buffer, just make sure all earlier stores (and loads) are visible by the time this one is.
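To illustrate those release semantics, here is a sketch of the classic publish pattern (the `_payload` and `_ready` fields are invented for this example, not from the question): the release store of the flag guarantees that the earlier plain store to `_payload` is visible to any thread that observes the flag via an acquire load.

```csharp
using System.Threading;

class OneTimePublisher
{
    private int _payload;
    private bool _ready;

    public void Publish(int value)
    {
        _payload = value;                  // plain store
        Volatile.Write(ref _ready, true);  // release: can't reorder before the line above
    }

    // Returns true once the published value is safely readable.
    public bool TryRead(out int value)
    {
        if (Volatile.Read(ref _ready))     // acquire: pairs with the release store
        {
            value = _payload;              // guaranteed to see the value from Publish
            return true;
        }
        value = 0;
        return false;
    }
}
```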


Interlocked.Exchange - atomic RMW plus full barrier (at least acq/rel)

This is a wrapper for the x86 xchg instruction, which acts as if it had a lock prefix even if the machine code omits that. That means an atomic RMW, and a "full" barrier as part of it (like x86 mfence).

In general, I think the Interlocked class methods originated as wrappers for x86 instructions with the lock prefix; on x86 it's impossible to do an atomic RMW that isn't a full barrier. There are MS C++ functions with those names, too, so this history predates C#.
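The returned old value is what a pure store can't give you; for example, a hypothetical one-shot flag where exactly one racing thread wins:

```csharp
using System.Threading;

class OneShot
{
    private int _claimed; // 0 = free, 1 = taken

    // Atomically sets the flag and reports whether this caller was first.
    // Because the exchange is a single atomic RMW, only one thread can
    // ever observe the old value 0, no matter how many threads race.
    public bool TryClaim() => Interlocked.Exchange(ref _claimed, 1) == 0;
}
```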

The current documentation for Interlocked methods (other than MemoryBarrier) on MS's site doesn't even bother to mention that these methods are a full barrier, even on non-x86 ISAs where atomic RMW operations don't require that.

I'm not sure if the full barrier is an implementation detail rather than part of the language spec, but it's certainly the case currently. That makes Interlocked.Exchange a poor choice for efficiency if you don't need it.

This answer quotes the ECMA-335 spec as saying that Interlocked operations perform implicit acquire/release operations. If that's like C++ acq_rel, that's fairly strong ordering since it's an atomic RMW with the load and store somewhat tied together, and each one prevents reordering in one direction. (But see For purposes of ordering, is atomic read-modify-write one operation or two? - it's possible to observe a seq_cst RMW reordering with a later relaxed operation on AArch64, within the limits allowed by C++ semantics. It's still an atomic RMW, though.)

@Theodor Zoulias found multiple sources online saying that C# Interlocked methods imply a full fence/barrier. For example, Joseph Albahari's online book: "The following implicitly generate full fences: [...] All methods on the Interlocked class". And on Stack Overflow, Memory barrier generators includes all Interlocked class methods in its list. Both of these may just be cataloguing actual current behaviour, rather than what's mandated by the language spec.

I'd assume there's plenty of code that now depends on it, and would break if Interlocked methods changed from being like C++ std::memory_order_seq_cst to relaxed, as the MS docs imply by saying nothing about memory ordering wrt. the surrounding code. (Unless that's covered somewhere else in the docs.)

I don't use C# myself so I can't easily cook up an example on SharpLab with JITted asm to check, but MSVC compiles its _InterlockedIncrement intrinsic to include a dmb ish for AArch64. (Comment thread.) So it seems MS compilers go beyond even the acquire/release guaranteed by the ECMA language spec and add a full barrier, if they do the same thing for C# code.

BTW, some people use the term "atomic" only to describe RMW operations, not atomic loads or atomic stores. MS's documentation says the Interlocked class "Provides atomic operations for variables that are shared by multiple threads.", but the class doesn't provide pure stores or pure loads, which is weird.

(Except for Read([U]Int64), presumably intended to expose 32-bit x86 lock cmpxchg8b with desired=expected so you either replace a value with itself or load the old value. Either way it dirties the cache line (so contends with reads by other threads just like any other Interlocked RMW operation) and is a full barrier, so you wouldn't normally read a 64-bit integer this way in 32-bit asm. Modern 32-bit code can just use SSE2 movq xmm0, [mem] / movd eax, xmm0 / pextrd edx, xmm0, 1 or similar, like G++ and MSVC do for std::atomic<uint64_t>; this is much better and can scale to multiple threads reading the same value in parallel without contending with each other.)

(ISO C++ gets this right, where std::atomic<T> has load and store methods, as well as exchange, fetch_add, etc. But ISO C++ defines literally nothing about what happens with unsynchronized read+write or write+write of a plain non-atomic object. A memory-safe language like C# has to define more.)


Inter-thread latency

Is it possible that the Volatile.Write has some hidden disadvantage, like updating the memory "less instantaneously" (if this makes any sense) than the Interlocked.Exchange?

I wouldn't expect any difference. Extra memory ordering just makes later stuff in the current thread wait until after a store commits to L1d cache. It doesn't make that happen any sooner, since CPUs already do that as fast as they can. (To make room in the store buffer for later stores.) See Does hardware memory barrier make visibility of atomic operations faster in addition to providing necessary guarantees? for more.

Certainly not on x86; IDK if things could be any different on weakly-ordered ISAs where a relaxed atomic RMW could load+store without waiting for the store buffer to drain, and might "jump the queue". But Interlocked.Exchange doesn't do a relaxed RMW, it's more like C++ memory_order_seq_cst.


Examples in the question:

In the first example, with .Set() and .WaitOne() on a separate variable, that already provides sufficient synchronization that a plain non-atomic assignment to a double is guaranteed to be fully visible to that reader. Volatile.Write and Interlocked.Exchange would both be entirely pointless.

For releasing a lock, yes you just want a pure store, especially on x86 where that doesn't take any barrier instructions. If you want to detect double-unlocking (unlocking an already-unlocked lock), load the spinlock variable first, before storing. (That can possibly miss double-unlocks, unlike an atomic exchange, but should be sufficient to find buggy usages unless they always only happen with tight timing between both unlockers.)
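Putting that together, a sketch of the question's `_guard` lock with a pure-store release and the optional double-unlock check (the exception type is my choice, not from the question):

```csharp
using System.Threading;

class SpinGuard
{
    private int _guard; // 0 = free, 1 = held

    // Acquiring needs an atomic RMW: only one thread may flip 0 -> 1.
    public bool Acquire() => Interlocked.CompareExchange(ref _guard, 1, 0) == 0;

    public void Release()
    {
        // Best-effort double-unlock detection: read before storing.
        // Unlike an atomic exchange this can miss tightly-timed races,
        // but it catches ordinary misuse without an RMW on the hot path.
        if (Volatile.Read(ref _guard) == 0)
            throw new SynchronizationLockException("Releasing a lock that isn't held.");

        Volatile.Write(ref _guard, 0); // a pure release store is enough to unlock
    }
}
```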

answered by Peter Cordes
  • Wow! Thanks Peter for the extensive answer. For me, as a software developer, the first sentence is all I really want to know. :-) Regarding the caveat about the `Thread.VolatileXXX` methods causing full fences, AFAIK the `Volatile.XXX` methods were introduced exactly to correct this error in the older APIs. The new APIs don't produce full fences. – Theodor Zoulias Feb 05 '22 at 21:46
  • @TheodorZoulias: That's why I put it at the top :P The rest of the answer is there to back up the reasoning, and to answer the general question of what the difference is in general, for other use cases. – Peter Cordes Feb 05 '22 at 21:47
  • Hi Peter Cordes! I just noticed this remark in [the documentation](https://learn.microsoft.com/en-us/dotnet/api/system.threading.volatile#remarks) of the `Volatile` class: *"On a multiprocessor system, [...] a volatile write operation does not guarantee that the value written would be immediately visible to other processors."* Does this raise any concerns that the `Interlocked.Exchange` might be able to publish the new value faster than the `Volatile.Write` on a multiprocessor system? – Theodor Zoulias Mar 30 '22 at 09:11
  • @TheodorZoulias: No, Interlocked.Exchange isn't *immediately* visible to other threads either. Nothing can be. The comment about `volatile` might be intended to remind you that it's not even *ordered* before later loads in the same thread, since it only has release semantics, not full-barrier or even seq_cst. – Peter Cordes Mar 30 '22 at 09:16
  • Thanks Peter! Btw Stephen Toub recently responded to [an API proposal](https://github.com/dotnet/runtime/issues/67014) of mine with a comment that I find concerning: *"Volatile doesn't really make any guarantees about how stale the data is, just about reordering of instructions around that access."* I might have to post a separate issue there and ask for a clarification, because this comment taken literally means that [the example](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/volatile#example) in the docs of the `volatile` keyword doesn't make any sense! – Theodor Zoulias Mar 30 '22 at 09:30
  • @TheodorZoulias: Sounds normal to me; the C# language standard doesn't give guarantees on inter-thread latency, just ordering. Just the best effort of the hardware, assuming a good-quality C# implementation. It's up to the hardware you choose to run your program on what the worst-case latency might be. (Assuming that the writer thread you care about gets to run at all; that's up to the OS's thread scheduler fairness and real-time guarantees or lack thereof). On modern fast CPUs, a worst-case worse than 1 microsecond probably doesn't happen by accident. – Peter Cordes Mar 30 '22 at 09:45
  • @TheodorZoulias: It doesn't generally make sense for language standards to be specific about performance, leaving that as a quality-of-implementation thing. (And often there's only one sane choice for how to do things, and then it's in hardware's hands, so any sane implementation (not a Deathstation 9000) should work about the same for `volatile`.) Even for hard-realtime use-cases, knowing how your code compiles on a specific C implementation is about the best you're going to get, or I assume C#, although managed code and GC is probably incompatible with hard realtime. – Peter Cordes Mar 30 '22 at 09:48
  • I posted a new issue on GitHub about my concerns [here](https://github.com/dotnet/runtime/issues/67330 "Confusion regarding Volatile.Read guarantees and example in the volatile docs"). I hope to get some clarifications about what "not guaranteed" means. Because taking the docs to the extreme means that I am not allowed to have any expectations about the behavior of the APIs that I use. In which case how am I supposed to use them? – Theodor Zoulias Mar 30 '22 at 11:08