
Reading a draft of C++11, I was intrigued by clause 1.7.3:

A memory location is either an object of scalar type or a maximal sequence of adjacent bit-fields all having non-zero width. ... Two threads of execution (1.10) can update and access separate memory locations without interfering with each other.

Does this clause protect from hardware related race conditions such as:

  • unaligned data access where memory is updated in two bus transactions (memory tearing)?
  • where you have distinct objects within a system memory unit, e.g. two 16-bit signed integers in a 32-bit word, and each independent update of the separate objects requires the entire memory unit to be written (memory conflict)?
TheJuice
  • possible duplicate of [Memory model ordering and visibility?](http://stackoverflow.com/questions/7461484/memory-model-ordering-and-visibility) – Hans Passant Jun 05 '12 at 12:18
  • @HansPassant: from my quick read, that question seems more related to visibility of a shared object between threads; I'm asking here about memory conflicts caused by updating distinct objects. – TheJuice Jun 05 '12 at 12:26

2 Answers


Regarding the second point, the standard guarantees that there will be no race there. That being said, I have been told that this guarantee is not implemented in current compilers, and it might even be impossible to implement on some architectures.

Regarding the first point: if the second point is guaranteed, and if your program does not contain any race condition, then the natural outcome is that this is not a race condition either. That is, given the premise that the standard guarantees that writes to different sub-word locations are safe, the only case where you can have a race condition is when multiple threads access the same variable (one that is split across words or, more likely to be problematic, across cache lines).

Again, this might be hard or even impossible to implement. If your unaligned datum crosses a cache line, it would be almost impossible to guarantee the correctness of the code without imposing a huge performance cost. You should avoid unaligned variables as much as possible for this and other reasons (including raw performance: a write to an object that straddles two cache lines involves writing as many as 32 bytes to memory, and if any other thread is touching either of the cache lines, it also incurs the cost of cache synchronization).

David Rodríguez - dribeas

It does not protect against memory tearing, which is only visible when two threads access the same memory location (but the clause only applies to separate memory locations).

It appears to protect against memory conflict, according to your example. The most likely way to achieve this is that a system which can't write less than 32 bits at once would have a 32-bit char, and then two separate objects could never share a "system memory unit". (The only way two 16-bit integers can be adjacent on a system with 32-bit char is as bit-fields.)

Ben Voigt
  • The issue is a bit more complex for multithreading; it is not just 8 vs. 32 bytes. Actual writes out of the L1 cache are done at cache-line granularity. Current compilers usually cannot write less than X bytes in a block (16 bytes is a common number here for Intel). On the other hand, processors use other techniques to guarantee (or try to guarantee) that this is not an issue, for example marking the cache line as dirty and forcing other processors to reload it before writing to it, or more complex synchronization algorithms. – David Rodríguez - dribeas Jun 05 '12 at 12:43
  • 2
    @dribeas: At the ISA level, smaller regions can be updated. That the processor has to take a bus lock, or use cache coherency algorithms to achieve this is an implementation detail. The important thing is that the ISA provides primitives for atomic update of smaller regions without the cost of a software mutex. – Ben Voigt Jun 05 '12 at 12:44
  • Yes... and no. The problem with unaligned access does not come from the size of the write from the processor to the cache, but from the fact that the write might actually cross a cache-line boundary. While writing to a single cache line is handled by the cache-coherency algorithm, I am not sure those algorithms can simulate atomicity of a write to two different cache lines (i.e. the hardware would have to lock both cache lines, write, and release; or mark both lines as dirty at once; or whatever the algorithm does, but on *both* lines at once). – David Rodríguez - dribeas Jun 05 '12 at 14:08
  • 1
    @dribeas: Already agreed that *tearing* is not prevented. What is provided is that an update will not change bytes that aren't part of the object (undoing a change made by another thread). Two non-transactional but individually atomic updates to two cache lines provide that. – Ben Voigt Jun 05 '12 at 14:14