10

In Java, updating a double or long variable may not be atomic, because a double/long may be treated as two separate 32-bit variables.

http://java.sun.com/docs/books/jls/second_edition/html/memory.doc.html#28733

In C++, if I am using 32 bit Intel Processor + Microsoft Visual C++ compiler, is updating double (8 byte) operation atomic?

I cannot find much in the specification about this behavior.

When I say "atomic variable", here is what I mean :

Thread A tries to write 1 to variable x while thread B tries to write 2 to variable x.

We should get the value 1 or 2 out of variable x, but never an undefined (torn) value.

Cheok Yan Cheng
  • 47,586
  • 132
  • 466
  • 875
  • Yes, 32-bit x86 (since original Pentium) has [efficient hardware support](https://stackoverflow.com/questions/36624881/why-is-integer-assignment-on-a-naturally-aligned-variable-atomic) for lock-free `std::atomic` load, store, and CAS. Whether your compiler makes efficient code or not is another issue: https://stackoverflow.com/questions/45055402/atomic-double-floating-point-or-sse-avx-vector-load-store-on-x86-64. Aligned `double` will never have "tearing", but it's safer to use `std::atomic`. – Peter Cordes Sep 04 '17 at 06:05

5 Answers

11

This is hardware specific and depends on the architecture. For x86 and x86_64, 8-byte reads and writes are guaranteed to be atomic if they are aligned. Quoting from the Intel Architecture Memory Ordering White Paper:

Intel 64 memory ordering guarantees that for each of the following memory-access instructions, the constituent memory operation appears to execute as a single memory access regardless of memory type:

  1. Instructions that read or write a single byte.

  2. Instructions that read or write a word (2 bytes) whose address is aligned on a 2 byte boundary.

  3. Instructions that read or write a doubleword (4 bytes) whose address is aligned on a 4 byte boundary.

  4. Instructions that read or write a quadword (8 bytes) whose address is aligned on an 8 byte boundary.

All locked instructions (the implicitly locked xchg instruction and other read-modify-write instructions with a lock prefix) are an indivisible and uninterruptible sequence of load(s) followed by store(s) regardless of memory type and alignment.

Gunther Piez
  • 29,760
  • 6
  • 71
  • 103
  • 3
    It also depends on the compiler, which is not required to ensure that doubles are 8-aligned in the first place, or to use a single quadword op to read or write them. Although you'd think it probably will, and also I expect that visual c++ documents whether it does or not. – Steve Jessop Aug 18 '09 at 10:53
  • 1
Yes, this is specified in the compiler's ABI. Non-auto double variables are always aligned, unless explicitly declared unaligned. Variables on the stack of a 32-bit system may become unaligned if the stack itself somehow gets unaligned, for example when a function is called from an external, non-C program. But you don't want to return objects from a function's stack anyway... – Gunther Piez Aug 18 '09 at 12:07
  • You don't return automatics, but you might pass a pointer to them into a function you call. But I guess what you've said is sufficient for the questioner - as long as he controls how they were defined, he can ensure that access to his doubles is atomic. – Steve Jessop Aug 18 '09 at 13:58
  • Here's the rule for IA-32 (per the question): The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically: • Reading or writing a quadword aligned on a 64-bit boundary • 16-bit accesses to uncached memory locations that fit within a 32-bit data bus The P6 family processors (and newer processors since) guarantee that the following additional memory operation will always be carried out atomically: • Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line – Ben Voigt Nov 03 '10 at 01:57
2

It's safe to assume that updating a double is never atomic, even if its size is the same as an int that has an atomicity guarantee. The reason is that a double takes a different processing path, since it's a non-critical and expensive data type. For example, even descriptions of data barriers usually mention that they don't apply to floating-point data/operations in general.

Visual C++ will align primitive types (see article), and while that should guarantee that the bits won't get garbled while being written to memory (an 8-byte-aligned value always lies within one 64- or 128-byte cache line), the rest depends on how the CPU handles non-atomic data in its cache and whether reading/flushing a cache line is interruptible. So if you dig through the Intel docs for the kind of core you are using and they give you that guarantee, then you are safe to go.

The reason the Java spec is so conservative is that it's supposed to run the same way on an old 386 and on a Core i7. That is of course delusional, but a promise is a promise, therefore it promises less :-)

The reason I'm saying that you have to look up the CPU doc is that your CPU might be an old 386, or something like it :-)) Don't forget that on a 32-bit CPU your 8-byte block takes two "rounds" to access, so you are at the mercy of the mechanics of the cache access.

Cache-line flushing with a much stronger data-consistency guarantee applies only to reasonably recent CPUs with the Intel-style guarantee (automatic cache coherence).

ZXX
  • 4,684
  • 27
  • 35
  • Being a 32-bit x86 CPU doesn't mean all the internal data paths are only 32-bit. For example, Pentium 4 (even early 32-bit-only P4) does a `movaps` 16-byte aligned load in a single access to its L1D cache. You make a good point that atomic access to cache doesn't guarantee atomicity overall (e.g. AMD K10 has atomic 16B SSE load/stores within a single socket, but the coherency protocol introduces [tearing at 8B boundaries for threads on different sockets](https://stackoverflow.com/questions/7646018/sse-instructions-which-cpus-can-do-atomic-16b-memory-operations/7647825#7647825)) – Peter Cordes Sep 04 '17 at 05:46
  • Some of the earliest 386 systems might have still only had a 16-bit data bus (and no internal cache), so that would mean a `double` took 4 memory cycles. Anyway, since modern MSVC++ isn't going to make code that even runs on anything older than a Pentium (P5), you are guaranteed that aligned 8B loads/stores are atomic, even if done with x87 or SSE. (No idea why you say FP is less optimized. x86 has had high-performance floating point for years.) Once data reaches cache, it doesn't remember how it got there, so "non-atomic data in cache" is weird. – Peter Cordes Sep 04 '17 at 05:51
  • 1
    It's *safe* to assume that updating a `double` is never atomic, but overly conservative. – Peter Cordes Sep 04 '17 at 05:52
-2

I wouldn't think that on any architecture a thread/context switch would interrupt updating a register halfway, leaving you with, for example, 18 bits updated of the 32 bits it was going to update. The same goes for updating a memory location (provided that it's a basic access unit: 8, 16, 32, 64 bits, etc.).

Indy9000
  • 8,651
  • 2
  • 32
  • 37
  • 1
    The problem is not context switching, it is multicore and multicpu. – AProgrammer Aug 18 '09 at 09:46
Even in a multicore/multi-CPU architecture, physical memory access has to be serialized by the memory controller. It's electrically impossible to let multiple devices access the same circuitry at the same time. The memory controller accesses memory in blocks, in whole units of the data-bus width, therefore it's not possible to have a partial update of a memory location. – Indy9000 Aug 18 '09 at 10:32
  • 1
    So what about if the double lies across the boundary of two cache lines? I doubt that MSVC++ will do that, because everything will be aligned to its size in powers of 2. But if you're generalising, it's not a requirement of the C++ standard (and in at least one of the ARM ABIs, longs and doubles only have to be 4-aligned, not 8-aligned). – Steve Jessop Aug 18 '09 at 10:56
-2

So has this question been answered? I ran a simple test program changing a double:

#include <stdio.h>

int main(int argc, char** argv)
{
    double i = 3.14159265358979323;  /* fldl .LC0; fstpl at -O0 */
    i += 84626.433;                  /* faddp %st, %st(1); fstpl */
}

I compiled it without optimizations (gcc -O0), and all assignment operations are performed with single assembler instructions such as fldl .LC0 and faddp %st, %st(1). (i += 84626.433 is of course done in two operations, faddp and fstpl.)

Can a thread really get interrupted inside a single instruction such as faddp?

  • 1
"Can a thread really get interrupted inside a single instruction such as faddp?" No, a thread cannot be "interrupted" inside a single instruction, but two CPUs can be performing their instructions at the same time, and one CPU can see only part of the result of the other CPU if the instruction is not performed in a single bus transaction. – Suma Jun 30 '10 at 08:41
-2

On a multicore system, besides atomicity, you have to worry about cache coherence, so that a reading thread sees the new value in its cache once the writer has updated it.

excalibur
  • 924
  • 2
  • 11
  • 26
Atomicity and cache coherence are two different things. Atomicity means that you see either the old value or the new one, never a transient intermediate state; cache coherence relates to the ordering in which you see modifications to several memory locations. – AProgrammer Jan 06 '12 at 14:03
  • x86 has coherent caches. You don't have to worry about it. As long as the reader actually reloads from memory, instead of reusing a value the compiler can keep in a register, it will see the update eventually. (This is why you should use `std::atomic` now that C++11 exists. Although current compilers make inefficient code for it: https://stackoverflow.com/questions/45055402/atomic-double-floating-point-or-sse-avx-vector-load-store-on-x86-64) – Peter Cordes Sep 04 '17 at 05:56
  • @AProgrammer: coherency means that two caches can't have different values for a cache line, so once a store has committed to L1D cache in one CPU, no other CPU can load a different value. https://en.wikipedia.org/wiki/MESI_protocol. Ordering between modifications to different cache lines is another level of functionality built on top of coherent caches. – Peter Cordes Sep 04 '17 at 06:01