C++ memory model and race conditions on char arrays

Question

I have trouble understanding this passage from Bjarne Stroustrup's C++11 FAQ:

However, most modern processors cannot read or write a single character, it must read or write a whole word, so the assignment to c really is ``read the word containing c, replace the c part, and write the word back again.'' Since the assignment to b is similar, there are plenty of opportunities for the two threads to clobber each other even though the threads do not (according to their source text) share data!

So how can char arrays exist without 3 (or 7?) bytes of padding between elements?
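To make the quoted scenario concrete, here is a minimal sketch that replays one unlucky interleaving sequentially, assuming a hypothetical machine that can only load and store whole 32-bit words. The helper names (`with_c`, `with_b`, `simulate_lost_update`) and the byte layout are illustrative assumptions, not taken from the FAQ.

```cpp
#include <cstdint>

// Assumed layout (hypothetical): byte 0 of the word holds `c`, byte 1 holds `b`.

// "Replace the c part" of a word that was read earlier.
std::uint32_t with_c(std::uint32_t w, std::uint8_t v) {
    return (w & ~std::uint32_t{0xFF}) | v;
}

// "Replace the b part" of a word that was read earlier.
std::uint32_t with_b(std::uint32_t w, std::uint8_t v) {
    return (w & ~std::uint32_t{0xFF00}) | (std::uint32_t{v} << 8);
}

// Replays, in a single thread, one interleaving of two word-sized
// read-modify-write sequences. No real threads are involved.
std::uint32_t simulate_lost_update() {
    std::uint32_t word = 0;

    std::uint32_t t1 = word;  // "thread 1" reads the word containing c
    std::uint32_t t2 = word;  // "thread 2" reads the same word (it contains b)

    t1 = with_c(t1, 1);       // thread 1: c = 1
    t2 = with_b(t2, 2);       // thread 2: b = 2 (its copy still has the old c)

    word = t1;                // thread 1 writes the word back: c is 1
    word = t2;                // thread 2 writes its word back: c is 0 again
    return word;
}
```

In the returned word the `b` byte is 2 but the `c` byte is 0, even though "thread 1" stored 1 to it. This is the kind of clobbering the passage describes, and it is what an implementation is not allowed to do to adjacent `char` objects under the C++11 memory model.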

Another question about this paragraph, specifically about the claim it makes about "modern hardware": [Can modern x86 hardware not store a single byte to memory?](https://stackoverflow.com/questions/46721075/can-modern-x86-hardware-not-store-a-single-byte-to-memory). (TL;DR: whatever the hardware does internally, all ISAs with a byte-store instruction don't have any architecturally visible effects on the surrounding bytes, so there's no software correctness issue. Early Alpha AXP is the lone "modern" ISA without byte load/store instructions, which is a problem for the C++11 memory model.) – Peter Cordes, Oct 19 '17 at 02:23

score 9 · Answer 1 · answered Nov 11 '13 at 10:32

I think Bjarne is wrong about this, or at least he's simplifying things considerably. Most modern processors are capable of writing a byte without reading a complete word first, or rather, they behave "as if" this were the case. In particular, if you have `char array[2];` and thread one only accesses `array[0]` while thread two only accesses `array[1]` (including when both threads are mutating the value), then you do not need any additional synchronization; this is guaranteed by the standard. If the hardware does not allow this directly, the compiler will have to add the synchronization itself.

It's very important to note the "as if", above. Modern hardware does access main memory by cache lines, not bytes. But it also has provisions for modifying single bytes in a cache line, so that when writing back, the processor core will not modify bytes that have not been modified in its cache.
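As a rough illustration of that guarantee, the sketch below has two threads modify different elements of the same `char` array with no mutex and no atomics. The names `bump` and `disjoint_writes_preserved` are made up for this example; the only synchronization is `join()`, which makes the final values visible to the calling thread.

```cpp
#include <cassert>
#include <functional>  // std::ref
#include <thread>

// Increment one char `times` times. Each call touches only the char it is given.
void bump(char& c, int times) {
    assert(times >= 0 && times <= 127);  // stay within the range every char type can hold
    for (int i = 0; i < times; ++i) {
        ++c;
    }
}

// Hypothetical demo: returns true if each element ends up with exactly
// the value produced by "its" thread.
bool disjoint_writes_preserved() {
    char array[2] = {0, 0};

    // Distinct array elements are distinct memory locations, so these
    // two threads do not race even though the bytes are adjacent.
    std::thread t0(bump, std::ref(array[0]), 100);
    std::thread t1(bump, std::ref(array[1]), 50);
    t0.join();
    t1.join();

    return array[0] == 100 && array[1] == 50;
}
```

On a conforming C++11 implementation `disjoint_writes_preserved()` should return `true`. If both threads touched the same element instead, the program would have a data race and undefined behavior.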

score 7 · Answer 2 · answered Nov 11 '13 at 09:54

A platform that supports C++11 must be able to access storage of the size of one char without inventing writes to neighbouring storage. x86 does indeed have that ability. If a processor can only ever modify 32 bits at a time, then an implementation for it must use a 32-bit-wide char.

(Some background reasoning: arrays are stored contiguously, and chars have no padding (3.9.1).)
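The layout facts behind that reasoning can be checked at compile time. This is only a sketch; `char_stride` is a hypothetical helper name, and the assertions state properties the language guarantees rather than anything specific to one platform.

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::ptrdiff_t

static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(char[4]) == 4, "array elements are contiguous, with no padding");
static_assert(CHAR_BIT >= 8, "a char has at least 8 bits; it may have more");

// Hypothetical helper: distance between neighbouring elements of a char
// array, measured in elements (which for char is the same as bytes).
std::ptrdiff_t char_stride() {
    char a[4] = {};
    return &a[1] - &a[0];
}
```

`char_stride()` returns 1 everywhere. What can differ between platforms is `CHAR_BIT`: a machine that could only store 32 bits at a time would be expected to report `CHAR_BIT == 32` rather than pad its arrays.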

Kerrek SB


@NoSenseEtAl: As long as he's able to think of enough other platforms... which a character of his description most certainly is :-) – Kerrek SB Nov 11 '13 at 09:57
@NoSenseEtAl: For what it's worth, Herb Sutter makes this point quite clearly in the [Atomic Weapons](http://herbsutter.com/2013/02/11/atomic-weapons-the-c-memory-model-and-modern-hardware/) talks. – Kerrek SB Nov 11 '13 at 09:58
@NoSenseEtAl: Also, I think the point is that the *naive* implementation of a read-modify-write invents spurious writes. But that's not to say that the architecture doesn't *also* support a more expensive, correct operation. In single-threaded mode, you would have no desire to pay such a price. – Kerrek SB Nov 11 '13 at 10:00
Regarding single-threaded mode: AFAIK the compiler can't know whether the program is single-threaded or not. Hans explicitly mentioned that they lost a bit of performance by disallowing certain things like speculative writes. – NoSenseEtAl Nov 11 '13 at 10:01
@NoSenseEtAl There is a difference between what the RAM bus does and the logical operation of the CPU, and Stroustrup knows that (and expects this knowledge from his readers). x86 assembly can easily access individual bytes, but the RAM interface reads and writes whole 32-bit (or 64-bit) words. – Henno Nov 11 '13 at 10:40
