C++, unlike some other languages, allows static data to be of any arbitrary type, not just plain old data (POD). POD is trivial to initialize (the compiler just writes the value at the appropriate address in the data segment), but other, more complex types are not.

How is initialization of non-POD types typically implemented in C++? In particular, what exactly happens when the function foo is executed for the first time? What mechanisms are used to keep track of whether str has already been initialized or not?

#include <string>
void foo() {
    static std::string str("Hello, Stack Overflow!");
}
Paul Manta
  • implementation detail. one possible variant is to move the function pointer to after the initialization. – sp2danny Jul 04 '14 at 07:22
  • @sp2danny I suspected it's an implementation detail. That's why I asked how it's "typically" implemented. :) **About moving the function pointer:** I thought about that, but it seems more likely to me that a jump instruction would be inserted at the beginning of the function, that jumps over the initialization. This way pointers to the function would still be valid, even if the function was called before or not. – Paul Manta Jul 04 '14 at 07:24

4 Answers

C++11 requires the initialization of function local static variables to be thread-safe. So at least in compilers that are compliant, there'll typically be some sort of synchronization primitive in use that'll need to be checked each time the function is entered.
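
The guarantee can be observed from the outside. The following is a minimal sketch, assuming a C++11 compiler with thread support; the type Counted and the functions get and count_constructions are hypothetical names introduced here, not part of the question:

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper type: counts how many times its constructor runs.
struct Counted {
    static std::atomic<int> constructions;
    std::string value;
    explicit Counted(const char* s) : value(s) { ++constructions; }
};
std::atomic<int> Counted::constructions{0};

// Same shape as foo() in the question, but using the counting type.
const Counted& get() {
    static Counted instance("Hello, Stack Overflow!");
    return instance;
}

// Calls get() from several threads at once and reports how many times
// the constructor has run so far.
int count_constructions(int thread_count) {
    assert(thread_count > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i)
        threads.emplace_back([] { get(); });
    for (auto& t : threads)
        t.join();
    return Counted::constructions.load();
}
```

On a conforming implementation, count_constructions(8) should return 1 no matter how the threads interleave.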

For example, here's a small program, followed by the assembly listing generated for it:

#include <string>
void foo() {
    static std::string str("Hello, Stack Overflow!");
}

int main() {}

.LC0:
    .string "Hello, Stack Overflow!"
foo():
    cmpb    $0, guard variable for foo()::str(%rip)
    je  .L14
    ret
.L14:
    pushq   %rbx
    movl    guard variable for foo()::str, %edi
    subq    $16, %rsp
    call    __cxa_guard_acquire
    testl   %eax, %eax
    jne .L15
.L1:
    addq    $16, %rsp
    popq    %rbx
    ret
.L15:
    leaq    15(%rsp), %rdx
    movl    $.LC0, %esi
    movl    foo()::str, %edi
    call    std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)
    movl    guard variable for foo()::str, %edi
    call    __cxa_guard_release
    movl    $__dso_handle, %edx
    movl    foo()::str, %esi
    movl    std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string(), %edi
    call    __cxa_atexit
    jmp .L1
    movq    %rax, %rbx
    movl    guard variable for foo()::str, %edi
    call    __cxa_guard_abort
    movq    %rbx, %rdi
    call    _Unwind_Resume
main:
    xorl    %eax, %eax
    ret

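The calls to __cxa_guard_acquire, __cxa_guard_release and __cxa_guard_abort guard the initialization of the static variable. The guard variable is checked on every entry to foo, and the constructor runs only if __cxa_guard_acquire reports that initialization is still needed. If the constructor throws, __cxa_guard_abort is called before unwinding resumes.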
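
The abort path has a visible effect: if the constructor throws, the variable counts as not yet initialized, and the next call tries again. A minimal sketch of that behaviour, assuming a hypothetical type Flaky whose constructor fails on its first run (Flaky and try_use are names made up for this illustration):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical type whose constructor fails the first time it runs.
struct Flaky {
    static int attempts;
    std::string value;
    explicit Flaky(const char* s) : value(s) {
        if (++attempts == 1)
            throw std::runtime_error("first construction fails");
    }
};
int Flaky::attempts = 0;

// Returns true if the static is initialised by the time the call returns.
bool try_use() {
    try {
        static Flaky instance("Hello, Stack Overflow!");
        // Reached only after a successful construction.
        assert(!instance.value.empty());
        return true;
    } catch (const std::runtime_error&) {
        // Initialisation was abandoned; the next call will retry it.
        return false;
    }
}
```

The first call to try_use returns false, later calls return true, and Flaky::attempts ends up at 2 because the constructor is not run again once it has succeeded.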

Praetorian

The implementation that I've seen uses a hidden boolean variable to check whether the variable is initialized. Modern compilers do this thread-safely, but IIRC some older compilers did not, and if the function was called from several threads at the same time, the constructor could be called twice.

Something along the lines of:

static bool __str_initialized = false;
static char __mem_for_str[...]; //std::string str("Hello, Stack Overflow!");

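void foo() {
    if (!__str_initialized)
    {
        lock();
        if (!__str_initialized)   // re-check: another thread may have got here first
        {
            new (__mem_for_str) std::string("Hello, Stack Overflow!");
            __str_initialized = true;   // set only once construction has finished
        }
        unlock();
    }
}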

Then, in the finalization code of the program:

if (__str_initialized)
     ((std::string&)__mem_for_str).~basic_string();   // explicit destructor call
rodrigo

It's implementation specific.

Typically, there'll be a flag (statically initialised to zero) to indicate whether it's initialised, and (in C++11, or earlier thread-safe implementations) some kind of mutex, also statically initialisable, to protect against multiple threads trying to initialise it.

The generated code would typically behave something along the lines of

static __atomic_flag_type __initialised = false;
static __mutex_type __mutex = __MUTEX_INITIALISER;

if (!__initialised) {
    __lock_type __lock(__mutex);
    if (!__initialised) {
        __initialise(str);
        __initialised = true;
    }
}
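
The pseudocode above can be turned into a compilable sketch using standard library types. This is only an approximation of what a compiler might emit, not any particular compiler's implementation; the names str_ptr, str_mutex, str_storage and get_str are illustrative, and the sketch never runs the destructor:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>
#include <string>

namespace {
// Illustrative stand-ins for compiler-generated data.
std::atomic<std::string*> str_ptr{nullptr};   // doubles as the "initialised" flag
std::mutex str_mutex;
alignas(std::string) unsigned char str_storage[sizeof(std::string)];
}

std::string& get_str() {
    // Fast path: once the pointer is published, no lock is taken.
    std::string* p = str_ptr.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(str_mutex);
        // Second check: another thread may have finished while we waited.
        p = str_ptr.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = ::new (static_cast<void*>(str_storage))
                std::string("Hello, Stack Overflow!");
            // Publish only after construction has completed.
            str_ptr.store(p, std::memory_order_release);
        }
    }
    assert(p != nullptr);
    return *p;
}
```

Every call to get_str returns a reference to the same object. The pointer is published only after the constructor returns, so a failed construction leaves it null and a later call retries.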
Mike Seymour
  • Funny to see an explanation about static initialization using other static :) – Jarod42 Jul 04 '14 at 07:44
  • @Jarod42: Indeed. That's where it's important to understand the various overloaded meanings of "static" in C++; in particular the difference between static storage duration and static initialisation. – Mike Seymour Jul 04 '14 at 07:56

You can check what your compiler does by generating an assembler listing.

MSVC2008 in debug mode generates this code (excluding exception handling prolog/epilog etc):

    mov eax, DWORD PTR ?$S1@?1??foo@@YA_NXZ@4IA
    and eax, 1
    jne SHORT $LN1@foo
    mov eax, DWORD PTR ?$S1@?1??foo@@YA_NXZ@4IA
    or  eax, 1
    mov DWORD PTR ?$S1@?1??foo@@YA_NXZ@4IA, eax
    mov DWORD PTR __$EHRec$[ebp+8], 0
    mov esi, esp
    push    OFFSET ??_C@_0BH@ENJCLPMJ@Hello?0?5Stack?5Overflow?$CB?$AA@
    mov ecx, OFFSET ?str@?1??foo@@YA_NXZ@4V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@A
    call    DWORD PTR __imp_??0?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@QAE@PBD@Z
    cmp esi, esp
    call    __RTC_CheckEsp
    push    OFFSET ??__Fstr@?1??foo@@YA_NXZ@YAXXZ   ; `foo'::`2'::`dynamic atexit destructor for 'str''
    call    _atexit
    add esp, 4
    mov DWORD PTR __$EHRec$[ebp+8], -1
$LN1@foo:

That is, there is a hidden static flag word, referenced as ?$S1@?1??foo@@YA_NXZ@4IA. The code tests bit 0 of it: if the bit is already set, it branches to the label $LN1@foo, skipping the initialization. Otherwise it ORs 1 into the flag, constructs the string at a known location, and registers a call to its destructor at program exit using atexit. The function then continues as normal. Note that the listing contains no locking around the flag.
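
The same control flow can be written out in C++ as a rough sketch. It mirrors the listing only loosely: the names flag_word, buffer, object, destroy_object and foo_str are invented for illustration, exception handling is left out just as in the excerpt, and like the listing it is not thread-safe:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <string>

namespace {
unsigned int flag_word = 0;            // stand-in for the hidden flag variable
alignas(std::string) unsigned char buffer[sizeof(std::string)];
std::string* object = nullptr;         // points into buffer once constructed
int registered_destructors = 0;        // bookkeeping for this sketch only

// Stand-in for the "dynamic atexit destructor for 'str'" function.
void destroy_object() { object->~basic_string(); }
}

std::string& foo_str() {
    if ((flag_word & 1u) == 0) {       // and eax, 1 / jne
        flag_word |= 1u;               // or eax, 1, then store the flag back
        object = ::new (static_cast<void*>(buffer))
            std::string("Hello, Stack Overflow!");
        std::atexit(destroy_object);   // call _atexit
        ++registered_destructors;
    }
    assert(object != nullptr);
    return *object;
}
```

Calling foo_str repeatedly constructs the string once and registers exactly one destructor call, which runs when the program exits normally.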

Pete