6

Will data stored with atomic_store, and loaded with atomic_load always appear consistent?

Specifically: A C11 program accesses 64-bit data placed deliberately on the boundary between cache lines on a modern Intel CPU. It uses atomic_store & atomic_load (from <stdatomic.h>) to access this data from multiple threads (running on different cores).

Will the data always appear consistent, or will loading it (atomic_load) sometimes have some bytes belonging to an old value, and other bytes belonging to a newer value?

Here are the essential struct and variable definitions and the interesting part of the program, happening in a loop, in parallel from multiple threads:

struct Data {
    uint8_t bytes[CACHELINE__BYTECOUNT - 4];
    atomic_uint_fast64_t u64;
} __attribute__((packed)) __attribute__((aligned ((CACHELINE__BYTECOUNT))));

#define VAL1 (0x1111111111111111)
#define VAL2 (0xFFFFFFFFFFFFFFFF)

static struct Data data = { .u64 = VAL1 };

...

    for (uint32_t j = 0; j < 1000; j++) {
        atomic_store(&data.u64, VAL1);
        atomic_store(&data.u64, VAL2);
    }
    const uint64_t val = atomic_load(&data.u64);
    /* is 'val' always VAL1 or VAL2? */

(Full runnable program: https://gist.github.com/sinelaw/1230d4675d6a4fff394110f17e463954)

Checking it with gcc 6.3.0 and clang 3.7 shows it isn't atomic:

$ clang -std=c11 -Wall -Wextra /tmp/atomic.c -o /tmp/atomic -lpthread
$ /tmp/atomic
ERROR: oh no, got: 11111111FFFFFFFF

So either there's a bug in the program, or I misunderstood <stdatomic.h>, or there's a bug in the compilers.

sinelaw
  • 16,205
  • 3
  • 49
  • 80
  • 1
    Well, you're asking it to UB, so UB you get. There is a reason why packing is not default. – Antti Haapala -- Слава Україні Aug 17 '17 at 08:37
  • 1
    `printf("%zu\n", _Alignof(atomic_uint_fast64_t));` -> 8 – Antti Haapala -- Слава Україні Aug 17 '17 at 08:40
  • I guess I would expect the compiler to either emit a warning, or do the right thing (and implement very slow atomic load/store for this case) – sinelaw Aug 17 '17 at 08:56
  • @sinelaw: The only way it could do that on x86 is with a `lock cmpxchg` to load, or `xchg` to store. (The `lock` prefix doesn't work with pure-load or pure-store `mov` instructions, and cache-line-split `lock`ed instructions are much slower than normal. (https://stackoverflow.com/a/36685056/224132)). Using `lock cmpxchg` to load will fault on read-only memory, and it's so slow that it doesn't give the programmer what they expect from "lock-free" atomics. gcc7 doesn't inline `lock cmpxchg16b` for 16-byte atomic objects anymore (https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02344.html). – Peter Cordes Aug 21 '17 at 17:07
  • Anyway, this is just a horrible idea. Use a narrower atomic type, and/or order your variables within the struct so it can pack "naturally". Or at worst split the array of structs into two or more arrays with different parts of the struct. (If any operations on your data can vectorize, a struct of arrays is usually better for those parts than an array of structs. But if you do access most of the struct members at once, then an array of structs is good for caching (spatial locality).) Just let the compiler pad your array, but choose your types so there's as little padding as possible. – Peter Cordes Aug 21 '17 at 17:14

1 Answers1

9

A correctly written program can not get an object that isn't correctly aligned. A correctly aligned int64 can't cross cache lines.

So the answer to your question is: there's a bug in your program. A bug deliberately introduced by you through using non-standard constructs (__attribute__) to break things.

It would be crazy for the compiler to go out of its way to ensure that stdatomic works for unaligned values because that would require a global lock which is what stdatomic is specifically there to avoid.

Art
  • 19,807
  • 1
  • 34
  • 60
  • Thanks. This means that using the `packed` attribute introduces undefined behavior? Isn't that a bad attribute to have? – sinelaw Aug 17 '17 at 08:48
  • Yes. There is no good reason ever to use `packed`.. after the definition the compiler wouldn't actually remember that they're at non-standard offsets. – Antti Haapala -- Слава Україні Aug 17 '17 at 08:49
  • I disagree. One reason to use `packed` is if you have a very big array of some struct and you want not waste memory on padding. – sinelaw Aug 17 '17 at 08:50
  • then redesign your structure so that there's no padding - organize members in decreasing rank etc. – Antti Haapala -- Слава Україні Aug 17 '17 at 08:52
  • Would be nice for the compiler to warn about fields becoming unaligned rather than emitting code with UB – sinelaw Aug 17 '17 at 08:54
  • @sinelaw why?! You specifically aimed the gun at your foot and pulled the trigger. The *only* effect that `packed` will ever have is causing your data becoming misaligned. If it didn't become misaligned, then there wouldn't be any effect. – Antti Haapala -- Слава Україні Aug 17 '17 at 08:59
  • 2
    @sinelaw That's probably the worst reason to use `packed`. There are certain edge cases where `packed` is useful, but that is not one of them. A warning for this would be completely redundant because that warning should be emitted pretty much every time you use `packed` because that's the only thing `packed` does - it breaks alignment. The only time it wouldn't be emitted is when `packed` had no effect. "Padding" is not something the compilers do to screw with you, it is what's left over after the data has been correctly aligned (which is why I dislike the word "padding"). – Art Aug 17 '17 at 09:00
  • @sinelaw also, ARM processors are known to *crash* / raise exceptions with unaligned access - even with processors that have *aligned* access. You just cannot ever know what your code does after packing your structures - what good use does that have? :D – Antti Haapala -- Слава Україні Aug 17 '17 at 09:01
  • @Art, how about a `packed`-like attribute, which means the compiler stores it unaligned in memory (to save space) but emits code that takes it into account and behaves in a well defined manner when accessing that memory? – sinelaw Aug 17 '17 at 10:08
  • @sinelaw Marginal utility since in 20 years in this business I've seen a situation where things must be packed this tightly exactly 0 times. And if you actually want atomic access to misaligned values you need locks. The locks involved would probably waste more space than you'd ever save on breaking alignment. – Art Aug 17 '17 at 10:18
  • @Art why is saving memory in a big array of some struct not a good reason for packing? What would you do? – sinelaw Aug 17 '17 at 14:10
  • @sinelaw You can save even more memory by deleting the array. I mean since you're ok with breaking things that will be even more efficient. – Art Aug 17 '17 at 14:22
  • @Art: Sometimes conformance with an external API will require use of unaligned data structures. Further, while it's true that most processors cannot perform fully-atomic operations on unaligned data, sometimes all that is necessary is that operations on certain objects be globally transitively ordered with regard to other operations on the same thread (e.g. populate an object, and then set an "isValid" flag), and there's no particular reason compilers shouldn't be able to uphold that level of guarantee on any kind of data with any alignment. – supercat Aug 28 '17 at 19:39