
My colleague and I are having an argument about the atomicity of reading a double on an Intel architecture using C# .NET 4.0. He is arguing that we should use the Interlocked.Exchange method for writing into a double, but that just reading the double value (in some other thread) is guaranteed to be atomic. My argument is that .NET doesn't guarantee this atomicity, but his argument is that on an Intel architecture it is guaranteed (though maybe not on AMD, SPARC, etc.).

Can any Intel and .NET experts shed some light on this?

The reader is OK with reading a stale (previous) value, but not an incorrect value (a partial read mixing bits from before and after the write, giving a garbage value).

Peter Mortensen
Alok
  • Ask your colleague to cite where he read it? AFAIK in 32 bit machines there is no way to get atomicity for `double`(which is 64 bits in size). – Sriram Sakthivel Jul 14 '14 at 07:52
  • 2
    Depends what you mean by atomic. Between issuing the read instruction and the instruction completing, the CPU could have executed thousands of instructions (page faults). And if the data spans cache lines, is the read still atomic? You'd have to see what the CPU specs are to be really sure that all the bits are transferred at the same time. – Skizz Jul 14 '14 at 15:47
  • possible duplicate of [What operations are atomic in C#?](http://stackoverflow.com/questions/11745440/what-operations-are-atomic-in-c) – user2284570 Jul 15 '14 at 02:40

3 Answers


My colleague and I are having an argument about the atomicity of reading a double on an Intel architecture using C# .NET 4.0.

Intel guarantees that 8 byte doubles are atomic on read and write when aligned to an 8 byte boundary.

C# does not guarantee that a double is aligned to an 8 byte boundary.

He is arguing that we should use the Interlocked.Exchange method for writing into a double, but that just reading the double value (in some other thread) is guaranteed to be atomic.

Your colleague is not thinking this through carefully. Interlocked operations are only atomic with respect to other interlocked operations. It doesn't make any sense to use interlocked operations some of the time; this is like saying that traffic that is going through the intersection to the north doesn't have to obey the traffic light because traffic that is going through the intersection to the west does obey the traffic light. Everyone has to obey the lights in order to avoid collisions; you can't do just half.
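To make the traffic-light analogy concrete, here is a minimal sketch (the class and member names are illustrative, not from the question) in which both sides go through Interlocked, which is the only way the interlocked guarantees apply:

```csharp
using System.Threading;

// Sketch: both the writer AND the reader use Interlocked.
// Names here are illustrative, not from the question.
class SharedDouble
{
    private double _value;

    // Writer thread: atomic publish of the new value.
    public void Write(double newValue)
    {
        Interlocked.Exchange(ref _value, newValue);
    }

    // Reader thread: CompareExchange with identical value and
    // comparand never modifies _value; it just returns the current
    // contents atomically (there is no Interlocked.Read for double).
    public double Read()
    {
        return Interlocked.CompareExchange(ref _value, 0.0, 0.0);
    }
}
```

If only the writer used Interlocked and the reader used a plain read, the reader would get none of these guarantees.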

My argument is that .NET doesn't guarantee this atomicity, but his argument is that on an Intel architecture it is guaranteed (though maybe not on AMD, SPARC, etc.).

Look, suppose that argument were correct, which it isn't. Is the conclusion we're supposed to reach here that the several nanoseconds saved by doing it wrong are somehow worth the risk? Forget about interlocked. Take a full lock every time. The only time you should not take a full lock when sharing memory across threads is when you have a demonstrated performance problem that is actually due to the twelve-nanosecond overhead of the lock. That is, when a twelve-nanosecond penalty is the slowest thing in your program and that is still unacceptable, that's the day you should consider using a low-lock solution. Is the slowest thing in your program taking a 12-nanosecond uncontended lock? No? Then stop having this argument, and spend your valuable time making the parts of your program that take more than 12 nanoseconds faster.

The reader is OK with reading a stale (previous) value, but not an incorrect value (a partial read mixing bits from before and after the write, giving a garbage value).

Don't conflate atomicity with volatility.

The interlocked operations, and the lock statement, both introduce a memory barrier that ensures that the up-to-date value is read or published. An ordinary non-volatile read or write is not required to do so; if it happens to do so, you got lucky.
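For contrast, the full-lock version being recommended is about this simple (again a sketch; the names are illustrative):

```csharp
// Sketch of the straightforward alternative: one lock guards every
// read and every write of the shared double.
class SharedDouble
{
    private readonly object _gate = new object();
    private double _value;

    public void Write(double newValue)
    {
        lock (_gate) { _value = newValue; }
    }

    public double Read()
    {
        lock (_gate) { return _value; }
    }
}
```

The lock statement also supplies the memory barrier mentioned above, which a plain read or write of the field would not.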

If these sorts of issues interest you, a related issue that I am occasionally asked about is under what circumstances a lock around an integer access can be elided. My articles on that subject are:

Peter Mortensen
Eric Lippert
  • Could you, please, explain the risks of using interlocking operations vs full locking, as you mentioned in your answer? – Luca Cremonesi Jul 14 '14 at 17:57
  • 2
    @LucaCremonesi: I have seen this code *many many times*: `int DecreaseCounter() { InterlockedDecrement(ref counter); if (counter == 0) Cleanup(); return counter; }`. No one ever decrements again once the counter is zero, and we wish `Cleanup` to run *exactly* once. Do you see the defect? – Eric Lippert Jul 14 '14 at 18:38
  • 4
    The overhead of locking extends beyond the 12ns required to acquire and release the lock. A method which uses `Interlocked.Increment` can reasonably guarantee that it will return within a certain length of time. Code which uses `lock` cannot: it might get stuck waiting forever if something else acquires the lock and then gets waylaid. Having a subsystem use `Interlocked` rather than locks, when practical, may allow it to be invoked safely in contexts like `Finalize` methods where code isn't supposed to block for arbitrary periods of time. – supercat Jul 14 '14 at 18:45
  • 7
    @supercat: You make a good point regarding contention, which is why I was careful to call out that the penalty is for an uncontended lock. What about contention? **If your performance problem is due to too much contention then that's the problem to fix at an architectural level**, not by going to a low-lock solution. If you have traffic jams then *design a more efficient road network*, don't start running red lights and hope for the best. – Eric Lippert Jul 14 '14 at 18:58
  • @supercat: Your point about finalizers is correct, but of course if every time I wrote an answer I had to say "and don't forget, finalizers are a weird context to run code in", then that would get pretty repetitive. :-) Most people don't write finalizers, and if the question is about code in a finalizer then it should say so. – Eric Lippert Jul 14 '14 at 18:59
  • 1
    @EricLippert The problem I see here is that you are using directly the variable passed to the interlocked operation in the `if` condition instead of its return value. In this case another thread could decrement the value again before the condition is checked. What if the code is the following? `int DecreaseCounter() { var newVal = Interlocked.Decrement(ref counter); if (newVal == 0) Cleanup(); return newVal; }` – Luca Cremonesi Jul 14 '14 at 19:06
  • 2
    @LucaCremonesi: Correct. The code that I gave has two flaws. First, what if counter is two, two threads enter, both decrement, now counter is zero, both threads call `Cleanup`, which is perhaps not threadsafe since it assumes that it will be called only once. Second, and less dangerous, is that two calls on two different threads can return the same value, which seems unexpected. Your fix is correct, and is the standard pattern. However, my point is that if you always consistently use locks every time you share memory, then you don't have to worry about this stuff. – Eric Lippert Jul 14 '14 at 19:11
  • @LucaCremonesi: Since it seems these issues interest you, I've added a couple of links to articles I've written on related topics recently. – Eric Lippert Jul 14 '14 at 19:13
  • 1
    @EricLippert: Of course finalizers are weird; my point is that lock-free methods are *qualitatively* different from locking ones. Among other things, knowing that method which acquires a lock calls only lock-free methods which are guaranteed to terminate, in a loop structure which is guaranteed to terminate, is sufficient to know that the method cannot cause deadlock. Calling a non-lock-free method while holding a lock may be safe, but its safety may depend upon the behavior of many other methods which might not be available for inspection. – supercat Jul 14 '14 at 19:35
  • 1
    @EricLippert: Another problem with locks is that they only work if all accesses to a guarded resource are protected with the same lock. By contrast, all methods using an interlocked variable will use the same hardware interlocks to assure atomicity. – supercat Jul 14 '14 at 23:13
  • 1
    @supercat: Oh indeed, don't mistake me here. As I point out frequently, locks are themselves horrid and introduce a whole new set of problems, no doubt about it. My advice is generally *don't share memory across threads in the first place*. It's just too hard to get it right. But if you're going to, then I say to start with the simplest possible locks and only go to low-lock solutions when driven by need. – Eric Lippert Jul 14 '14 at 23:15
  • @EricLippert appreciate this answer. Very descriptive and easy to understand. A couple of comments: 1. "previous value" is not in the context of volatility, but of the value before another thread updates it. Assume the writer is updating the value and the reader is trying to read at the same time: is it ever possible to read a garbage value because the read is not atomic? Garbage because one cycle reads part of the previous value whereas the next cycle reads part of the new value, giving a completely different value (garbage). If not, then it has to be either the new value or the old value before the update finishes. – Alok Jul 15 '14 at 00:17
  • We are changing our architecture from one big lock at the application level to one lock per object (I personally prefer one lock). Having smaller locks should improve performance, but in my experience it adds context-switch overhead, degrading performance. There is a sweet spot between one big lock and many smaller locks, which depends completely on your application. – Alok Jul 15 '14 at 00:21
  • 1
    You know you can force it to align with the correct directives on a struct, right (LayoutKind)? – Joshua Jul 15 '14 at 03:40
  • yup, I will explore that. – Alok Jul 15 '14 at 08:38
  • @Alok The problem with small locks arises when you need to hold several of them at the same time. You need to ensure a lock order to avoid deadlocks. – CodesInChaos Sep 10 '14 at 20:17

Yes and no. On 32-bit processors it is not guaranteed to be atomic, because a double is larger than the native word size. On a 64-bit processor, a properly aligned access is atomic; the 64-bit CLI guarantees everything up to a 64-bit read as atomic. So you'd need to build your assembly for x64 (not Any CPU). Otherwise, if your assembly may run on 32-bit, you had better use Interlocked; see Atomicity, volatility and immutability are different, part two by Eric Lippert. I think you can rely on Eric Lippert's knowledge of the Microsoft CLR.

The ECMA CLI standard also supports this, even though C# itself does not guarantee it:

A conforming CLI shall guarantee that read and write access to properly aligned memory locations no larger than the native word size (the size of type native int) is atomic (see §I.12.6.2)

It also says that on processors where operations are atomic, Interlocked methods are often compiled to a single instruction, so in my book there is no performance penalty in using it. On the other hand, there may be a worse penalty to not using it when you should.

Another related Stack Overflow question is What operations are atomic in C#?.

codenheim
  • It isn't guaranteed to be atomic on a 64-bit processor, misalignment is still possible. Easiest to do with BitConverter. – Hans Passant Jul 15 '14 at 00:25
  • @HansPassant - Yes, my answer includes "properly aligned" in the quote from the CLI standard, but I will add it to the first sentence as well. Thanks! – codenheim Jul 15 '14 at 00:37

Reading or writing a double is atomic on the Intel architecture if it is aligned on an 8-byte address boundary. See Is Updating double operation atomic.

Even though reads and writes of doubles might effectively be atomic in .NET code on the Intel architecture, I wouldn't trust it, as the C# spec doesn't guarantee it; see this quote from an answer by Eric Lippert:

Reads and writes of the following data types are atomic: bool, char, byte, sbyte, short, ushort, uint, int, float, and reference types. In addition, reads and writes of enum types with an underlying type in the previous list are also atomic. Reads and writes of other types, including long, ulong, double, and decimal, as well as user-defined types, are not guaranteed to be atomic.

Use Interlocked for reading and writing to be safe. It guarantees atomicity. On an architecture where the operation is atomic by default, it shouldn't produce any overhead. You need to use Interlocked for reading as well as writing to ensure that no partially written values are read (quote from the Interlocked.Read() documentation):

The Read method and the 64-bit overloads of the Increment, Decrement, and Add methods are truly atomic only on systems where a System.IntPtr is 64 bits long. On other systems, these methods are atomic with respect to each other, but not with respect to other means of accessing the data. Thus, to be thread safe on 32-bit systems, any access to a 64-bit value must be made through the members of the Interlocked class.
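A sketch of what that means in practice on a 32-bit system (names are illustrative; note that Interlocked.Read exists only for long, so a double is read with a no-op CompareExchange):

```csharp
using System.Threading;

// All access to the 64-bit values goes through Interlocked, as the
// documentation quoted above requires for thread safety on 32-bit
// systems.
class Shared64BitValues
{
    private long _counter;
    private double _measurement;

    public long ReadCounter()
    {
        return Interlocked.Read(ref _counter);
    }

    public void WriteCounter(long value)
    {
        Interlocked.Exchange(ref _counter, value);
    }

    public double ReadMeasurement()
    {
        // There is no Interlocked.Read overload for double; a
        // CompareExchange with equal value and comparand returns the
        // current value atomically without changing it.
        return Interlocked.CompareExchange(ref _measurement, 0.0, 0.0);
    }

    public void WriteMeasurement(double value)
    {
        Interlocked.Exchange(ref _measurement, value);
    }
}
```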

Anders Abel
  • +1, but `Interlocked` most certainly does introduce (a little) overhead because it uses a CAS loop instead of simply reading/writing the value in order to achieve stronger memory ordering guarantees (which programs now accidentally rely on making this behaviour impossible to change because of backwards-compatibility). – Cameron Jul 14 '14 at 16:19