The relevant thing isn't being a class type or not, it's compile-time evaluation of the initializer (at the current optimization level). And of course the constructor not having any side-effects, if it's non-trivial.
If it's not possible to simply put a constant value in .data
, gcc/clang use an acquire load of a guard variable to check that static locals have been initialized. If the guard variable is false, then they pick one thread to do the initializing, and have other threads wait for it if they also see a false guard variable. They've been doing this for a long time, since before C++11 required it. (e.g. as old as GCC4.1 on Godbolt, from May 2006.)
The most simple artificial example, snapshotting the arg from the first call and ignoring later args:
int foo(int a){
static int x = a;
return x;
}
Compiles for x86-64 with GCC11.3 -O3 (Godbolt), with the exact same asm generated for -std=gnu++03
mode. GCC4.1 also makes about the same asm, but doesn't keep the push/pop off the fast path (i.e. missing shrink-wrap optimization). GCC4.1 only supported AT&T syntax output, so it visually looks different unless you flip modern GCC to AT&T mode as well, but this is Intel syntax (destination on the left).
# demangled asm from g++ -O3
foo(int):
movzx eax, BYTE PTR guard variable for foo(int)::x[rip] # guard.load(acquire)
test al, al
je .L13
mov eax, DWORD PTR foo(int)::x[rip] # normal load of the static local
ret # fast path through the function is the already-initialized case
.L13: # jumps here on guard == 0, on the first call (and any that race with it)
# It would be sensible for GCC to put this code in .text.cold
push rbx
mov ebx, edi # save function arg in a call-preserved reg
mov edi, OFFSET FLAT:guard variable for foo(int)::x # address
call __cxa_guard_acquire # guard_acquire(&guard_x) presumably a normal mutex or spinlock
test eax, eax
jne .L14 # if (we won the race to do the init work) goto .L14
mov eax, DWORD PTR foo(int)::x[rip] # else it's done now by another thread
pop rbx
ret
.L14:
mov edi, OFFSET FLAT:guard variable for foo(int)::x
mov DWORD PTR foo(int)::x[rip], ebx # init static x (from a saved in RBX)
call __cxa_guard_release
mov eax, DWORD PTR foo(int)::x[rip] # missed optimization: mov eax, ebx
# This thread is the one that just initialized it, our function arg is the value.
# It's not atomic (or volatile), so another thread can't have set it, too.
pop rbx
ret
If compiling for AArch64, the load of the guard variable is ldarb w8, [x8]
, a load with acquire semantics. Other ISAs might need a plain load and then a barrier to give at least LoadLoad ordering, to make sure they load the payload x
no earlier than when they saw the guard variable being non-zero.
If the static
variable has a constant initializer, no guard is needed
int bar(int a){
static int x = 1;
return ++x + a;
}
bar(int):
mov eax, DWORD PTR bar(int)::x[rip]
add eax, 1
mov DWORD PTR bar(int)::x[rip], eax # store the updated value
add eax, edi # and add it to the function arg
ret
.section .data
bar(int)::x:
.long 1