For example:

int foo()
{
    static int i = 0;
    return i++;
}

The variable i will only be initialized to 0 the first time foo is called. Does this automatically mean there's a hidden branch in there to keep the initialization from happening more than once? Or are there more clever tricks to avoid this?

Borgleader
  • Possible duplicate of [When do function-level static variables get allocated/initialized?](http://stackoverflow.com/questions/55510/when-do-function-level-static-variables-get-allocated-initialized) – Cameron May 23 '14 at 12:41
  • @cameron That question is asking when, this question is asking how. – JBentley May 23 '14 at 12:55
  • In a quick test with my compiler, this did *NOT* generate any code to initialize i at runtime; it placed the variable in a memory location whose value is loaded from the executable. Although the compiler has to act as if it initializes the value the first time the function is called, it does not actually have to do so if there are no visible side effects. For a basic type with no constructor, it can likely avoid the entire overhead. – jcoder May 23 '14 at 13:09
  • @jcoder Modern C++ with constexpr could initialize even more complex objects at compile time, if I'm not mistaken. What would effectively prevent that is initialization code that depends on run-time data, i.e. probably anything that could not be declared constexpr. – Peter - Reinstate Monica Feb 21 '19 at 11:45

2 Answers

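Yes, it must incur a branch, and it must also incur at least one atomic operation (in typical implementations, an acquire load of a guard variable) for safe concurrent initialization. The Standard requires that such variables be initialized the first time control passes through their declaration, in a concurrency-safe way.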

The implementation can only dodge this requirement if it can prove that lazy initialization is indistinguishable from some earlier initialization, such as before main() is entered. For example, a simple POD initialized from constants can be initialized early, like a file-scope global, because the difference is not observable; this saves the lazy-initialization code entirely. (When the initializer is a constant expression, the Standard in fact requires this constant initialization to happen before the block is first entered.)

Puppy
  • Interesting, so C is guaranteed to be more efficient than C++ here, because static locals can be initialized with compile-time constants only in C. – fredoverflow May 23 '14 at 13:33
  • They'd probably just use a file-scope variable instead. C isn't more efficient. – Puppy May 23 '14 at 13:38
  • @FredOverflow not at all; if a static local would be valid in C, then the C++ implementation is permitted to initialize it early (and in practice using precisely the same mechanism, i.e. the `.data` segment). – ecatmur May 23 '14 at 13:39
  • @ecatmur Yes, *permitted*, but you cannot rely on it. – fredoverflow May 23 '14 at 13:39
  • @FredOverflow: No more than you can in C. – Puppy May 23 '14 at 13:42
  • @FredOverflow that's QOI (quality of implementation) for you; but it's very easy for a C++ implementation to get right. You can't conclude that C is guaranteed to be more efficient than C++; rather, that C is guaranteed to be *at least as* efficient as C++. – ecatmur May 23 '14 at 13:45
  • That's not really true. C simply doesn't permit the slow cases. For the comparable cases where both languages permit it, they are equally fast. For the other cases, C++ is already the winner because C flat out doesn't have that feature, regardless of performance. – Puppy May 23 '17 at 14:15

Yes, there is a branch. Each time the function is entered, the code must check if the variable has already been initialized. But as will be explained below, you usually do not have to care about this branch.

Example

Check out this code:

#include <iostream>

struct Foo { Foo(){ std::cout << "FOO" << std::endl;} };
void foo(){ static Foo foo; }
int main(){ foo();}

Now, here is the first part of the assembly code that GCC 4.8 generates for the foo function:

_Z3foov:
.LFB974:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
.cfi_lsda 0x3,.LLSDA974
pushq   %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq    %rsp, %rbp
.cfi_def_cfa_register 6
pushq   %r12
pushq   %rbx
.cfi_offset 12, -24
.cfi_offset 3, -32
movl    $_ZGVZ3foovE3foo, %eax
movzbl  (%rax), %eax
testb   %al, %al
jne .L7                     <------------------- FIRST CHECK
movl    $_ZGVZ3foovE3foo, %edi
call    __cxa_guard_acquire <------------------- LOCK    
testl   %eax, %eax
setne   %al
testb   %al, %al
je  .L7                     <------------------- SECOND CHECK
movl    $0, %r12d
movl    $_ZZ3foovE3foo, %edi

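As you can see, there is a jne that skips everything once the guard byte _ZGVZ3foovE3foo is nonzero. Otherwise, a guard is acquired using __cxa_guard_acquire, followed by a je that skips the construction if another thread has already completed it. Thus, the compiler appears to be generating the well-known double-checked locking pattern here.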

Will every compiler generate a branch?

I am pretty sure the spec does NOT mandate that a branch or double-checked locking be used. It only mandates that the initialization be thread-safe: if control enters the declaration concurrently while the variable is being initialized, the other threads must wait for the initialization to complete. However, I do not see a way to perform such lazy, thread-safe initialization without a branch. Thus, even though the spec does not mandate it, the branch cannot practically be omitted on current CPU architectures.

Is the branch expensive?

You should definitely NOT care about this branch: once the object is initialized, the branch always goes the same way, so it will be predicted correctly and is almost free. Trying to avoid a static local variable for optimization purposes is very unlikely to yield any observable performance benefit.

Is there really no way around the branch?

If the initialization is not observable, for example plain initialization with constant values, it may be performed eagerly (at compile time or program startup) and the branch is omitted. If, however, it is observable, then things get pretty tricky:

The only possibility I see was stated in an answer by R. Martinho Fernandes (since deleted): the code could modify itself, i.e., simply remove the initialization code once the initialization is done. However, this idea is impractical for the following reasons:

  1. Self-modifying code is very hard to get thread-safe.
  2. Usually, executable memory is write-protected, so code is not allowed to rewrite itself.
  3. It is just not worth it, as the branch is not expensive (see above).
gexicide
  • While it's beneficial to show an assembly example, it's not a definitive answer on its own. – Bartek Banachewicz May 23 '14 at 12:47
  • I downvoted for presenting compiler-specific assembly results as evidence, instead of referring to the actual language rule. – Puppy May 23 '14 at 12:49
  • @Puppy Posting what gcc does amounts to saying "usually", which is extremely helpful in practice. – Peter - Reinstate Monica Feb 21 '19 at 11:52
  • I was using static variables excessively to avoid this branch. Now I must avoid them as they can even cause undefined behaviour, based on how they are allowed to be implemented. Just another worthless Cpp tool to help shoot yourself in the foot. – user13947194 Oct 21 '21 at 22:27
  • @user13947194 What behaviour of static locals is undefined? – Chuu Feb 07 '23 at 19:39