Should I use a barrier while accessing statically initialized variable?

Question

In my function I have the following two lines of code:

static volatile uint64_t static_index = 0;
const uint64_t index = __sync_fetch_and_add(&static_index, 1, __ATOMIC_RELAXED);

As you can see, static_index is shared between threads, while index is per-thread. My concern is that the static initialization may be reordered with uses of this variable, but I'm not sure whether that can happen for statically (once) initialized variables.

Is __ATOMIC_RELAXED enough to avoid reordering in this case? Or maybe I should use __ATOMIC_RELEASE or even __ATOMIC_SEQ_CST here?

I appreciate any help, thank you.

Apart from the general problem, in this special case (global variable with 0) no explicit initialization happens at all; the bss is cleared (set to 0) on startup anyway, so this should be a non-issue here — Ctx, Jan 30 '20 at 12:11
Weird snippet, surely you are actually using __atomic_fetch_add()? No, you don't get a free pass because it is static. — Hans Passant, Jan 30 '20 at 12:17
Static initialization of all variables happens before your `main` function starts executing, it doesn't matter if the variable is static inside a function scope of some tiny function in the middle of nowhere. — vgru, Jan 30 '20 at 12:18

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

Your static initializer is a compile-time constant so you can (at least in practice) count on that static storage space already holding a 0 when your process starts.

(Specifically it will be in the BSS here. A non-zero constant would mean it goes in the .data section.)

I'm pretty sure it would also be safe for a non-constant initializer.

For a function-scoped static variable with a non-constant initializer, the first thread to enter the function runs the initializer. Compilers typically use a guard variable. The fast case (already initialized) involves an acquire load on that guard variable to check that the static var has already been initialized. Otherwise an atomic RMW makes sure that exactly 1 thread runs the initializer and the others wait for that.
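The guard-variable scheme described above can be sketched by hand with C11 atomics. This is only an illustration of the idea, not what any particular compiler emits: the names get_start, compute_start and init_calls and the three-state guard are hypothetical, and a real runtime would block waiting threads rather than spin.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical non-constant initializer; counts how often it runs. */
static int init_calls = 0;
static uint64_t compute_start(void) {
    init_calls++;
    return 42;
}

/* Hypothetical guard states for this sketch. */
enum { UNINIT = 0, IN_PROGRESS = 1, DONE = 2 };

uint64_t get_start(void) {
    static _Atomic int guard = UNINIT; /* constant initializer, no guard needed */
    static uint64_t value;             /* stands in for the lazily initialized static */

    /* Fast path: the acquire load pairs with the release store below. */
    if (atomic_load_explicit(&guard, memory_order_acquire) != DONE) {
        int expected = UNINIT;
        /* Slow path: an atomic RMW picks exactly one thread to run the initializer. */
        if (atomic_compare_exchange_strong_explicit(&guard, &expected, IN_PROGRESS,
                                                    memory_order_acquire,
                                                    memory_order_acquire)) {
            value = compute_start();
            /* Publish the initialized value to the other threads. */
            atomic_store_explicit(&guard, DONE, memory_order_release);
        } else {
            /* Every other thread waits until the winner has published. */
            while (atomic_load_explicit(&guard, memory_order_acquire) != DONE) {
            }
        }
    }
    return value;
}
```

Any code that runs after get_start() returns, in any thread, sees value fully initialized, which is why the later RMW needs no extra ordering of its own.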

Leaving aside implementation details (I didn't double-check what the standard says about static vars): in the thread that does the init, the initialization static volatile foo = x is clearly sequenced before the RMW on it, and is therefore guaranteed to happen before it.

In other threads, it becomes a question of whether they can reorder with static initialization. I think the answer to that must be no, otherwise you'd have data-race UB from reading or writing it without atomic builtins.

Within one thread, you can look at static foo = non_const; as making sure that foo is initialized, even if we're not the thread that does the initializing.

memory_order_release or acquire wouldn't make sense as a way to make sure static initialization is complete before the atomic RMW, even if some other thread were racing with us: those orderings control the visibility order of our operations from the POV of other threads. I'm pretty sure the language rules just require that the RMW happens after everything that static foo = bar implies (whether that's doing the init or waiting for it if necessary) because of sequenced-before ordering. Nothing else would make any sense if you consider the non-atomic case: you can't let other threads ever read an uninitialized variable.

(Note that C doesn't support non-constant initializers for static-storage variables at all, not even function-scoped ones. Only C++ supports them, for both function-scoped statics and globals.)

BTW, there's little reason to use the deprecated/legacy GNU C __sync builtins. The manual says: "They should not be used for new code which should use the '__atomic' builtins instead."

The 3rd arg for __sync builtins isn't a memory order, it's "an optional list of variables protected by the memory barrier", which GCC ignores. It's __atomic_fetch_add that takes a memory-order parameter.
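A minimal sketch of the question's snippet using the __atomic builtin, assuming GCC or Clang. The wrapper name next_index is hypothetical; volatile is omitted because the builtin itself provides the atomicity.

```c
#include <stdint.h>

/* Hypothetical wrapper: returns a unique, increasing index on each call. */
uint64_t next_index(void) {
    /* Constant initializer: the storage already holds 0 at process start. */
    static uint64_t static_index = 0;

    /* Atomic RMW; the third argument here really is a memory order. */
    return __atomic_fetch_add(&static_index, 1, __ATOMIC_RELAXED);
}
```

Relaxed ordering is enough here because callers only need each returned value to be unique, not to order other memory accesses around the increment.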

Or better for most cases, use C11 <stdatomic.h>: declare _Atomic static uint64_t static_index = 0; and modify it with atomic_fetch_add_explicit (https://en.cppreference.com/w/c/atomic/atomic_fetch_add):

atomic_fetch_add_explicit(&static_index, 1, memory_order_relaxed);

(Or if you want, idx = static_index++; but that defaults to seq_cst so would compile less efficiently for non-x86 ISAs.)
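Put together, the two C11 variants might look like the sketch below. The function names take_index_relaxed and take_index_seq_cst are hypothetical, and each function deliberately has its own counter so the two can be compared side by side.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Explicit relaxed ordering: only atomicity of the increment is requested. */
uint64_t take_index_relaxed(void) {
    static _Atomic uint64_t static_index = 0;
    return atomic_fetch_add_explicit(&static_index, 1, memory_order_relaxed);
}

/* Post-increment on an _Atomic object is also an atomic RMW,
   but it uses memory_order_seq_cst. */
uint64_t take_index_seq_cst(void) {
    static _Atomic uint64_t static_index = 0;
    return static_index++;
}
```

Both return the value held before the increment, so the first call to either function yields 0.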

You don't need volatile _Atomic, so you can drop the volatile type qualifier. Using volatile for hand-rolled relaxed atomics is generally not recommended now that C11 / C++11 are available, but if you do, then plain load/store access to volatile is kind of like _Atomic with memory_order_relaxed.

Got it clear, thank you! Should I make my `_Atomic uint64_t static_index` `static` in this case? Or `_Atomic` automatically implies `static`? — Netherwire, Jan 31 '20 at 08:33
@Netherwire: static and `_Atomic` are orthogonal; by the time I got to that part of the answer I forgot your shared var was only function-scoped `static`, not global, no other reason for omitting it :P. Fixed. The use-case for a non-static `_Atomic` in automatic storage would be if you took its address and passed a pointer to other threads. — Peter Cordes, Jan 31 '20 at 09:15
