
I have a problem using atomic variables in a memory pool. See the short program below: it segfaults. Curiously enough, it happens only for the bigger integer types; with atomic_int and atomic_uint everything works fine!

I am using stock Clang on an M1 ARM64 macOS machine. Can anyone reproduce this under other OSes/compilers? Is it Clang-specific?

#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>   /* needed for int64_t and uint8_t */
#include <stdlib.h>
#include <assert.h>
 
#define MAX(X, Y) (((X) > (Y)) ? (X) : (Y))

typedef struct AtomTest {
  atomic_ulong atomic_count;    // atomic variable
  int64_t non_atomic_count;    // non-atomic variable
} AtomTest;

size_t chunkSize = 0; 
 
int main(void)
{
    chunkSize = MAX(sizeof(AtomTest), 300);
    uint8_t *memory = malloc(100000 * chunkSize); 
    assert(memory);
    for(int i=0; i<100000; i++) {
      AtomTest *aa = (AtomTest *)(memory + chunkSize * i);
      aa->atomic_count = 0;
      /* tried this instead, same result:  */
      /* atomic_store_explicit (&(aa->atomic_count), 0, memory_order_relaxed);  */
      /* tried this instead, same result: */
       /* atomic_init (&(aa->atomic_count), 0); */
      aa->non_atomic_count = 0;
    }
 
    for(int i=0; i<100000; i++) {
      AtomTest *aa = (AtomTest *)(memory + chunkSize * i);
      printf("The atomic counter is: %lu\n", aa->atomic_count);
      printf("The non-atomic counter is: %lld\n", (long long)aa->non_atomic_count);
    }
}

Surprisingly, LLDB shows the segfault, yet I can somehow access the variable directly from the debugger:

Process 26353 launched: '/Users/mareklipert/Documents/prog/clojure/clojure-rt/llvm/runtime/atomic' (arm64)
Process 26353 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=257, address=0x10180012c)
    frame #0: 0x0000000100003e88 atomic`main at atomic.c:23:24
   20       assert(memory);
   21       for(int i=0; i<100000; i++) {
   22         AtomTest *aa = (AtomTest *)(memory + chunkSize * i);
-> 23         aa->atomic_count = 0;
   24         /* tried this instead, same result:  */
   25         /* atomic_store_explicit (&(aa->atomic_count), 0, memory_order_relaxed);  */
   26         /* tried this instead, same result: */
Target 0: (atomic) stopped.
(lldb) po &(aa->atomic_count)
0x000000010180012c

(lldb) po aa->atomic_count
0
– Terminus
  • Tip: A more readable version of that is `&memory[chunkSize * i]`. Why is the `memory` allocation not typed accordingly? Why not `calloc(100000, sizeof(AtomTest))`? – tadman Oct 11 '22 at 23:37
  • 3
    I think you have alignment issues here. `300` seems like an extremely non-base-2 number. – tadman Oct 11 '22 at 23:38
  • 1
    What you might want is to have a zero-length array of `uint8_t` at the end of that structure that serves as the arbitrary memory region you can write to, the bounds of which are dictated by the original allocation. If you want to pre-allocate a bunch of these with a particular buffer size, be sure the main records align to their required offsets, as in meet the [`alignof(AtomTest)`](https://en.cppreference.com/w/c/language/_Alignof) needs. – tadman Oct 11 '22 at 23:44
  • @tadman: `&memory[chunkSize * i]` would access a byte address that was `memory + sizeof(AtomTest)*chunkSize * i`. Unlike now where it's `memory + chunkSize*i`, even though `chunkSize` isn't a multiple of `sizeof(AtomTest)` or of `alignof(atomic_ulong) = 8`, as you said. I thought AArch64 allowed unaligned loads/stores, but maybe it doesn't for stores like `stlr` that are intended for atomic usage (that one having release semantics, and seq_cst wrt. `ldar`). But OP says even `atomic_store_explicit` with `mo_relaxed` failed. But maybe with optimization disabled, it still ends up using seq_cst `stlr`? – Peter Cordes Oct 12 '22 at 05:50
  • Anyway, what is the point of rounding the chunk size up to 300 and doing misaligned accesses to the later chunks? Why not just `malloc(N * sizeof(struct))`? – Peter Cordes Oct 12 '22 at 05:53
  • @PeterCordes The aim is to hasten the allocation process (memory pool) but you are right - the cause of the problem is that 300 % 8 != 0 – Terminus Oct 12 '22 at 06:19
  • I don't see why allocating 300/16 times more memory than you need would make allocation faster. If you want to reserve space for more structs, use a bigger multiplier of its size. – Peter Cordes Oct 12 '22 at 07:01
  • 1
    [Does AArch64 support unaligned access?](https://stackoverflow.com/q/38535738) confirms that AArch64 atomic memory access instructions fault on misalignment. At least the *exclusive* ones do. I'm curious what asm clang generated, like if it noticed the alignment UB at compile time and generated an instruction to fault on purpose. On x86 this would have run, but some of the stores would have been non-atomic (when they span a cache-line boundary, at least). https://godbolt.org/z/hfvK4r9EY shows mainline clang using `stlr` for that store with or without optimization, so presumably it faults. – Peter Cordes Oct 12 '22 at 07:04
  • 1
    Hi @PeterCordes, could you write your last comment as an answer? (as I believe this answers the question in most detail). -- Ahh, never mind, I see you've marked it as a duplicate, that's good. – Terminus Oct 12 '22 at 08:50
  • 2
    @PeterCordes: I wondered also about why it still faults with `memory_order_relaxed`, which as you say should emit `str`, but it turns out there is a simple explanation: in the second loop which prints out the data, the load of `aa->atomic_count` is implicitly `seq_cst` and so emits `ldar`, which faults when misaligned. If you change this to `atomic_load_explicit(&(aa->atomic_count), memory_order_relaxed)` as well, then `ldr` is emitted here and it does not fault. – Nate Eldredge Oct 12 '22 at 14:22

0 Answers