Atomicity in C++ : Myth or Reality

Question

I have been reading an article about Lockless Programming in MSDN. It says :

On all modern processors, you can assume that reads and writes of naturally aligned native types are atomic. As long as the memory bus is at least as wide as the type being read or written, the CPU reads and writes these types in a single bus transaction, making it impossible for other threads to see them in a half-completed state.

And it gives some examples:

// This write is not atomic because it is not natively aligned.
DWORD* pData = (DWORD*)(pChar + 1);
*pData = 0;

// This is not atomic because it is three separate operations.
++g_globalCounter;

// This write is atomic.
g_alignedGlobal = 0;

// This read is atomic.
DWORD local = g_alignedGlobal;

I have read many answers and comments on SO saying that nothing is guaranteed to be atomic in C++ and that atomicity is not even mentioned in the standard, and now I am a bit confused. Am I misinterpreting the article? Or is the article's author talking about things that are non-standard and specific to the MSVC++ compiler?

So according to the article the below assignments must be atomic, right?

struct Data
{
    char ID;
    char pad1[3];
    short Number;
    char pad2[2];
    char Name[5];
    char pad3[3];
    int Number2;
    double Value;
} DataVal;

DataVal.ID = 0;
DataVal.Number = 1000;
DataVal.Number2 = 0xFFFFFF;
DataVal.Value = 1.2;

If that is true, does replacing Name[5] and pad3[3] with std::string Name; make any difference to memory alignment? Will the assignments to Number2 and Value still be atomic?

Can someone please explain?

std::string is not a native type, so std::string access will not be atomic. — Coder, Feb 15 '11 at 09:56
@Coder: That is not what I meant. Sorry I will change the question. — ali_bahoo, Feb 15 '11 at 09:57
the individual assignments may well be atomic, however as a whole the sequence of assignments will not be. in this scenario, it's safest to lock. — Nim, Feb 15 '11 at 10:00
It is not universally true, it is only sometimes true for x86. Read is atomic, and write is. But updating value (incrementing or so) is not. — osgx, Feb 15 '11 at 10:05
@sad_man, a single read and a single write are to be atomic however you can't use the already read value (test it, increment it, whatever) and write it back. you need some form of CAS (compare and swap/set) or conditional store, so both a read and write can be executed atomically. — bestsss, Feb 15 '11 at 10:18
Note that atomicity of an operation does *not* imply other threads will be able to actually *see the changes* (immediately or *at all*). So you still need synchronization. — fredoverflow, Feb 15 '11 at 11:25
@FredOverflow: Why will not the other threads be able to see the changes? I do not understand. I can post a question about this if you like. — ali_bahoo, Feb 15 '11 at 11:44
Because modern processors are extremely complicated. Atomicity and synchronization are different (but related) issues. — fredoverflow, Feb 15 '11 at 11:49
Atomicity is mainly interesting for lock-free algorithms that also work with outdated values. In that case, atomicity is important to ensure that other threads don't see *illegal* values. — fredoverflow, Feb 15 '11 at 11:51
Oh, there you go, I just realized the article is specifically about lock-free algorithms, which is one of the hardest areas of computer science. Unless you're a genius, I suggest to stay away from it. Lock-free algorithms are basically impossible to test and debug, their correctness must be formally proven. — fredoverflow, Feb 15 '11 at 11:58
@FredOverflow: I was not spesifically studying lock-freeness. I saw the link in a question at SO and thought it could be interesting. I need quite a while to go for lock-free things. I know that. — ali_bahoo, Feb 15 '11 at 12:03
I think you still underestimate the complexity :) Unless your name is Simon Peyton Jones, forget about implementing your own lock-free algorithms. — fredoverflow, Feb 15 '11 at 12:08
@FredOverflow: Well there is nothing bad about dreaming. You have to aim the stars, to reach enough height. — ali_bahoo, Feb 15 '11 at 12:21
That quotation from the article already contradicts itself. There should not be a full stop between "are atomic" and "as long as", because naturally aligned native types don't necessarily fit the bus width. For example `double` and a 32 bit bus typically doesn't. Unless they're also saying that a 32 bit bus is inherently "not modern", but isn't Windows 7 heading ARM-wards? The phrase "all modern processors", wherever it's used, should set off alarms telling you "these are weasel-words". What they mean is "for implementations where it happens to be true, the following is true:". — Steve Jessop, Feb 15 '11 at 13:20
... which isn't necessarily a bad thing to say, provided it's true for all architectures that MSVC targets, and they're talking primarily about lock-free algorithms for MSVC, and MSVC always uses the atomic CPU instructions where possible. It's just a bit dodgy to try to generalize by hand-waving about an unspecified collection of compilers/hardware, because some reader might have different ideas than the author about exactly what conditions they're talking about. — Steve Jessop, Feb 15 '11 at 13:28
@FredOverflow: to be fair to the article, it agrees with you. Near the end, "only use standard lockless programming algorithms that have been proven to be correct". I think it's intended as background for what principles allow standard algorithms to be proven correct, and how to use them correctly, not as a primer for those attempting the genius-level feat of inventing entirely novel lockless operations. Or if not genius-level, then at least demanding of academic peer-review. — Steve Jessop, Feb 15 '11 at 13:32
@Steve: Fair enough, I didn't read the article. Lock-free algorithms are just way over my head. — fredoverflow, Feb 15 '11 at 13:35
@FredOverflow: Herb Sutter wrote quite a good series about lock-free queues in C++0x, using the new atomic types. Gets a little hairy at times, but it's basically comprehensible to mere mortals. Since it's "just" demonstrating implementations of algorithms known to be good, and used by the committee specifically as motivation for the guaranteed behaviour of the atomic types, it's not too hard going. 3 of the articles listed here http://herbsutter.com/2008/11/02/out-of-order-effective-concurrency-writing-lock-free-code-a-corrected-queue/ — Steve Jessop, Feb 15 '11 at 13:39

score 30 · Accepted Answer · edited Jun 20 '20 at 09:12

This recommendation is architecture-specific. It holds for x86 and x86_64 (in low-level programming). You should also check that the compiler does not reorder your code; you can use a "compiler memory barrier" for that.
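A minimal sketch of such a compiler-only barrier, assuming a C++11 compiler: std::atomic_signal_fence constrains compiler reordering without emitting a CPU fence instruction. The names g_payload, g_flag and publish are hypothetical. Note that in ISO C++ terms, sharing plain variables between threads like this is still a data race; the sketch only illustrates the x86-style low-level approach described here.

```cpp
#include <atomic>

int g_payload = 0;  // hypothetical shared data
int g_flag = 0;     // hypothetical "data is ready" flag

void publish(int value) {
    g_payload = value;
    // Compiler-only barrier: in practice the compiler will not move the
    // surrounding memory accesses across this point. No CPU fence
    // instruction is emitted.
    std::atomic_signal_fence(std::memory_order_seq_cst);
    g_flag = 1;
}
```

On x86, where stores are not reordered with other stores, this is often enough to keep the two writes in order; on weaker architectures a real CPU barrier would also be needed.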

Low-level atomic reads and writes for x86 are described in the Intel reference manual "The Intel® 64 and IA-32 Architectures Software Developer’s Manual", Volume 3A (http://www.intel.com/Assets/PDF/manual/253668.pdf), section 8.1.1:

8.1.1 Guaranteed Atomic Operations

The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will always be carried out atomically:

Reading or writing a byte
Reading or writing a word aligned on a 16-bit boundary
Reading or writing a doubleword aligned on a 32-bit boundary

The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:

Reading or writing a quadword aligned on a 64-bit boundary
16-bit accesses to uncached memory locations that fit within a 32-bit data bus

The P6 family processors (and newer processors since) guarantee that the following additional memory operation will always be carried out atomically:

Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line

This document also describes atomicity for newer processors such as Core2 in more detail. Not all unaligned operations are atomic.
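The conditions quoted above can be checked at run time. The following is a rough sketch; the helper names are hypothetical, and the 64-byte cache line is an assumption that varies by CPU.

```cpp
#include <cstddef>
#include <cstdint>

// True if p is aligned to the size of T ("naturally aligned").
template <typename T>
bool is_naturally_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % sizeof(T) == 0;
}

// True if an access of `size` bytes at p stays inside one cache line.
// The 64-byte default is an assumption, not a guarantee.
bool fits_in_cache_line(const void* p, std::size_t size,
                        std::size_t line = 64) {
    std::uintptr_t offset = reinterpret_cast<std::uintptr_t>(p) % line;
    return offset + size <= line;
}
```

For the MSDN example, `pChar + 1` fails the first check for a DWORD whenever pChar itself is 4-byte aligned, which is why that write is not covered by the basic guarantee.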

Another Intel manual recommends this white paper:

http://software.intel.com/en-us/articles/developing-multithreaded-applications-a-platform-consistent-approach/

score 12 · Answer 2 · answered Feb 15 '11 at 10:47

I think you are misinterpreting the quote.

Atomicity can be guaranteed on a given architecture by using specific instructions (particular to that architecture). The MSDN article explains that reads and writes of C++ built-in types can be expected to be atomic on the x86 architecture.

However, the C++ standard makes no assumptions about the architecture, so the Standard cannot make such guarantees. Indeed, C++ is used in embedded software where hardware support is much more limited.

C++0x defines the std::atomic class template, which turns reads and writes into atomic operations for any trivially copyable type. The compiler selects the best way to obtain atomicity based on the characteristics of the type and the target architecture, in a standard-compliant manner.

The new standard also defines a whole set of operations similar to MSVC's InterlockedExchange, which are likewise compiled to the fastest (yet safe) primitives offered by the hardware.
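As an illustration, here is a small sketch using std::atomic<int> from C++11. The names g_counter, add_many, run_two_threads and swap_in are made up for the example.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> g_counter{0};  // hypothetical shared counter

void add_many(int n) {
    for (int i = 0; i < n; ++i) {
        g_counter.fetch_add(1);  // atomic read-modify-write (seq_cst by default)
    }
}

// Runs add_many on two threads and returns the final count.
int run_two_threads(int per_thread) {
    g_counter.store(0);
    std::thread a(add_many, per_thread);
    std::thread b(add_many, per_thread);
    a.join();
    b.join();
    return g_counter.load();
}

// Atomically replaces the value and returns the previous one,
// comparable in spirit to InterlockedExchange.
int swap_in(int new_value) {
    return g_counter.exchange(new_value);
}
```

With a plain int and `++` instead of fetch_add, the two threads could lose increments; with std::atomic the total is always twice per_thread.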

score 3 · Answer 3 · James · edited Feb 15 '11 at 14:11

The C++ standard does not guarantee atomic behaviour. In practice, however, simple load and store operations will be atomic, as the article states.

If you need atomicity, it is better to be explicit about it and use some sort of lock, though.

*counter = 0;   // this is atomic on most platforms
(*counter)++;   // this is NOT atomic on most platforms
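Being explicit with a lock could look roughly like this, assuming C++11's std::mutex is available. LockedCounter is a hypothetical name, not something from the article.

```cpp
#include <mutex>

class LockedCounter {  // hypothetical helper
public:
    void reset() {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = 0;
    }
    void increment() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++value_;  // read, add and write all happen under the lock
    }
    int get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }
private:
    mutable std::mutex mutex_;  // mutable so get() can lock it
    int value_ = 0;
};
```

The lock makes the whole increment indivisible with respect to other callers, at the cost of some overhead compared with a bare store.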

edited Feb 15 '11 at 14:11

answered Feb 15 '11 at 09:55

James

"simple operations on integers will be atomic". You must mention that only read and constant write are simple, but not an updating. – osgx Feb 15 '11 at 10:04
@osgx: What do you mean by updating? Is this an updating? `DataVal.Number2 = someother_int;`. Is not this atomic? – ali_bahoo Feb 15 '11 at 10:13
@osgx: I thought first the value of `someother_int` is read then it is written to `Number2`. – ali_bahoo Feb 15 '11 at 10:18
`number2 = some_int` consists of several operations. The read of some_int is atomic and the write to number2 is atomic, but they are not atomic as a whole. – osgx Feb 15 '11 at 10:22
@osgx: I see. Then InterlockedExchange may be used to make the operation atomic as whole. – ali_bahoo Feb 15 '11 at 10:27

score 2 · Answer 4 · answered Feb 15 '11 at 10:00

Be very careful when relying on the atomicity of simple word-size operations, because things might behave differently from what you expect. On multicore architectures, you might witness out-of-order reads and writes. Preventing this requires memory barriers.

The bottom line for an application developer is to either use primitives that the OS guarantees will be atomic or use appropriate locks.
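To sketch what a memory barrier buys you, here is a publish/consume pair. It uses C++11 std::atomic with release/acquire ordering rather than OS primitives, which is an assumption on my part; the names are hypothetical.

```cpp
#include <atomic>

int g_data = 0;                    // hypothetical payload, plain int
std::atomic<bool> g_ready{false};  // hypothetical "published" flag

void producer() {
    g_data = 42;
    // Release: the write to g_data becomes visible to any thread that
    // acquire-loads `true` from g_ready.
    g_ready.store(true, std::memory_order_release);
}

// Returns true and fills `out` once the producer has published.
bool try_consume(int& out) {
    if (!g_ready.load(std::memory_order_acquire)) {
        return false;
    }
    out = g_data;  // ordered after the acquire load, so it sees 42
    return true;
}
```

Without the release/acquire pair, the consumer could see the flag set and still read a stale g_data, even though each individual access is atomic.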

score 1 · Answer 5 · answered Feb 15 '11 at 09:57

IMO, the article incorporates some assumptions about the underlying architecture. As C++ places only minimal requirements on the architecture, the standard can give no guarantees about, for example, atomicity. For instance, a byte has to be at least 8 bits, but you could have an architecture where a byte is 9 bits and an int has 16... theoretically.

So when the compiler targets the x86 architecture specifically, its specific features can be used.
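If you would rather ask the implementation than assume, something like the following sketch can help. It relies on CHAR_BIT from <climits> and the ATOMIC_INT_LOCK_FREE macro and is_lock_free() from C++11's <atomic>; the constant and function names are hypothetical.

```cpp
#include <atomic>
#include <climits>

// Number of bits in a byte on this implementation (at least 8).
constexpr int kBitsPerByte = CHAR_BIT;

// 0 = never lock-free, 1 = sometimes lock-free, 2 = always lock-free.
constexpr int kIntLockFree = ATOMIC_INT_LOCK_FREE;

// Asks at run time whether atomic<int> operations avoid locks.
bool int_is_lock_free() {
    std::atomic<int> probe{0};
    return probe.is_lock_free();
}
```

On mainstream x86 compilers you would expect kIntLockFree to be 2, but the point is that the code asks the implementation instead of hard-coding the assumption.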

NB: struct members are usually aligned to a native word boundary by default, so your manual padding members are not required. You can disable this alignment with #pragma statements.
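A small sketch of the difference, assuming a compiler that supports `#pragma pack` (MSVC, GCC and Clang do). The struct names are hypothetical.

```cpp
#include <cstddef>

struct DefaultLayout {  // hypothetical
    char tag;
    int value;  // the compiler inserts padding before this member
};

#pragma pack(push, 1)
struct PackedLayout {   // hypothetical
    char tag;
    int value;  // no padding: starts at offset 1, so it is misaligned
};
#pragma pack(pop)

constexpr std::size_t kDefaultOffset = offsetof(DefaultLayout, value);
constexpr std::size_t kPackedOffset = offsetof(PackedLayout, value);
```

Once a member is packed onto an odd offset like this, it is no longer naturally aligned, so the atomicity assumption from the article no longer applies to it.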

answered Feb 15 '11 at 09:57

king_nak

I have 2 questions if you do not mind. 1. Are the **classes** aligned by MSVC++ by default? 2. You mention aligning to a native word boundary. Is this the same case with x64 environments? – ali_bahoo Feb 15 '11 at 10:25
ad 1: classes are also aligned (basically any compound data type). ad 2: see http://msdn.microsoft.com/en-us/library/2e70t5y1(v=vs.80).aspx. The alignment will be on 8-byte boundaries on x64 (if not changed by #pragma pack), or to a multiple of the data type's size – king_nak Feb 15 '11 at 11:43

Aaron Gage · Answer 6 · 2011-02-15T10:03:55.520

I think what they are trying to get across is that data types implemented natively by the hardware are updated within the hardware in such a way that reading from another thread will never give you a 'partially' updated value.

Consider a 32-bit integer on a 32+ bit machine. It is written or read completely in one instruction cycle, whereas larger data types, say a 64-bit int on a 32-bit machine, require more cycles. Hence, theoretically, the thread writing them could be interrupted between those cycles, leaving the value in an invalid state.
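The "partially updated" state can be simulated in a single thread. This is only a model of what a 32-bit machine might do with two separate stores; SplitValue and the helper functions are hypothetical.

```cpp
#include <cstdint>

// Models a 64-bit value stored as two 32-bit halves, as a 32-bit
// machine might write it with two separate store instructions.
struct SplitValue {  // hypothetical
    std::uint32_t low = 0;
    std::uint32_t high = 0;

    std::uint64_t combined() const {
        return (static_cast<std::uint64_t>(high) << 32) | low;
    }
};

// First of the two stores: only the low 32 bits are updated.
void store_low_half(SplitValue& v, std::uint64_t x) {
    v.low = static_cast<std::uint32_t>(x);
}

// Second of the two stores: the high 32 bits are updated.
void store_high_half(SplitValue& v, std::uint64_t x) {
    v.high = static_cast<std::uint32_t>(x >> 32);
}
```

If the old value is 0x00000000FFFFFFFF and the new one is 0x0000000100000000, a reader that looks between the two stores sees 0, which is neither the old nor the new value.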

No, using string would not make it atomic, as string is a higher-level construct and is not implemented in the hardware. Edit: as per your comment on what you (didn't) mean about changing to string, it should not make any difference to the fields declared after it; as mentioned in another answer, the compiler aligns fields by default.
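You can check this yourself with a sketch like the one below. DataWithString is a hypothetical variant of the struct from the question with the padding members dropped and Name turned into a std::string; the helper names are made up as well.

```cpp
#include <cstdint>
#include <string>

struct DataWithString {  // hypothetical variant of the question's struct
    char ID;
    short Number;
    std::string Name;
    int Number2;
    double Value;
};

// True if the object at p sits on a multiple of its type's alignment.
template <typename T>
bool is_aligned(const T* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Checks the members the question asks about.
bool members_are_aligned(const DataWithString& d) {
    return is_aligned(&d.Number) && is_aligned(&d.Number2) &&
           is_aligned(&d.Value);
}
```

The compiler pads around the std::string member as needed, so Number2 and Value stay aligned, provided no packing pragma is in effect.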

The reason it is not in the standard is that, as stated in the article, this is about how modern processors implement the instructions. Your standard C/C++ code should work exactly the same on a 16-bit or a 64-bit machine (just with a performance difference). However, if you assume you will only execute on a 64-bit machine, then any naturally aligned access of 64 bits or smaller is atomic (SSE and similar types aside).

score 1 · Answer 7 · answered Feb 15 '11 at 11:48

I think atomicity as referred to in the article has little practical use. It means that you will read or write a valid value, but possibly an outdated one. So when reading an int, you read it completely, not 2 bytes from an old value and the other 2 bytes from a new value currently being written by another thread.

What is important for shared memory is memory barriers, and they are guaranteed by synchronization primitives such as C++0x atomic types, mutexes, etc.

answered Feb 15 '11 at 11:48

Andriy Tylychko

It may be of little usage for the intentions of the OP but that doesn't mean that there aren't any good valid use cases that rely on that behaviour. Updating a value that way is perfectly fine for values calculated from a logically immutable type, and thus requires neither atomic, nor mutex, merely volatile. Examples would be deferred calculation of a hashcode or c strlen. – Aiueiia Apr 30 '21 at 08:33

score 0 · Answer 8 · answered Feb 15 '11 at 10:02

I do not think changing char Name[5] to std::string Name will make a difference if you are using it only for individual character assignments, since the index operator will return a direct reference to the underlying character. A full string assignment is not atomic (and you can't do it with a char array, so I'm guessing you weren't thinking of using it this way anyways).

answered Feb 15 '11 at 10:02

Ken Wayne VanderLinde

I edited the question. I think it is clearer now. I do not want to do atomic string assignments. I wonder if it changes the memory alignment. – ali_bahoo Feb 15 '11 at 10:06
