83

In C/C++, why are globals and static variables initialized to default values?

Why not leave them with just garbage values? Are there any special reasons for this?

Peter Mortensen
Xinus

5 Answers

90
  1. Security: leaving memory alone would leak information from other processes or the kernel.

  2. Efficiency: the values are useless until initialized to something, and it's more efficient to zero them in a block with unrolled loops. The OS can even zero freelist pages when the system is otherwise idle, rather than when some client or user is waiting for the program to start.

  3. Reproducibility: leaving the values alone would make program behavior non-repeatable, making bugs really hard to find.

  4. Elegance: it's cleaner if programs can start from 0 without having to clutter the code with default initializers.

One might then wonder why the auto storage class does start as garbage. The answer is two-fold:

  1. It doesn't, in a sense. The very first stack frame page at each level (i.e., every new page added to the stack) does receive zero values. The "garbage", or "uninitialized" values that subsequent function instances at the same stack level see are really the previous values left by other method instances of your own program and its library.

  2. There might be a quadratic (or whatever) runtime performance penalty associated with initializing auto (function locals) to anything. A function might not use any or all of a large array, say, on any given call, and it could be invoked thousands or millions of times. The initialization of statics and globals, OTOH, only needs to happen once.
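
The cost asymmetry in the second point can be sketched in C. This is a minimal illustration, not something taken from the answer itself: `copy_to_scratch`, `calls_so_far`, and the 4096-byte buffer size are hypothetical names and values chosen for the example.

```c
#include <stddef.h>
#include <string.h>

/* Static storage duration: zero-initialized once, before main() runs. */
static unsigned long call_count;

/* Hypothetical helper: copies at most 4095 bytes of `src` into a local
   scratch buffer and returns the number of bytes copied. */
size_t copy_to_scratch(const char *src)
{
    /* Automatic storage: not initialized. Zeroing all 4096 bytes on
       every call would be wasted work, because only the first n + 1
       bytes are ever written and read below. */
    char scratch[4096];

    size_t n = strlen(src);
    if (n >= sizeof scratch)
        n = sizeof scratch - 1;
    memcpy(scratch, src, n);
    scratch[n] = '\0';

    call_count++;   /* relies on the implicit initial value 0 */
    return strlen(scratch);
}

/* Reports how many times copy_to_scratch() has been called. */
unsigned long calls_so_far(void)
{
    return call_count;
}
```

Calling `copy_to_scratch("hello")` from `main` returns 5 and moves `calls_so_far()` from 0 to 1. The counter pays for its zeroing once per program run, while a zeroed `scratch` would pay on every call.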

DigitalRoss
  • 8
    I guess the asker wants to know why `static int x;` always makes `x` initialized to zero while `int x;` leaves `x` as garbage. – kennytm Jan 19 '10 at 06:19
  • Hmm, you might be right, I've revised the answer to address that. – DigitalRoss Jan 19 '10 at 06:24
  • 3
    Actually the real reason is that the original C standard was to codify existing practice rather than introduce new stuff. And pre-ANSI/ISO C did it for efficiency. – paxdiablo Jan 19 '10 at 06:37
  • "The very first stack frame does receive zero values". I'm not even sure what that means. Where does that idea come from? – Michael Burr Jan 19 '10 at 07:13
  • @michael Burr: do you really think that pages allocated for the stack are not zeroed? Really? Of course they are. The stack does start as demand paged zero-fill pages. Because any given address is used multiple times by repeated instances of functions at the same stack level, the values appear random. But they are not, and they will be generally the same "garbage" values in repeated runs if the arguments and environment are not changed. This is complicated somewhat by modern OS security concerns that change addresses of load points to fight stack-smashing attacks, but that's another story. – DigitalRoss Jan 19 '10 at 07:28
  • 4
    DigitalRoss: The "very first stack frame" is often not used by code that you've written, but by the program prologue and initialisation routines for the standard library. And not every OS zeroes stack (or heap!) memory before program start - for example MS-DOS does (did?) not. – caf Jan 19 '10 at 11:08
  • 1
    @caf, of course. But from the kernel's point of view and from a security point of view, it's still your program, and those pages really did start as zeroed pages. Sure, by the time you get to main(), maybe it doesn't look that way, but new stack levels you reach will be zeroed the first time you reach them and cross into a new page that the kernel must allocate. – DigitalRoss Jan 19 '10 at 16:08
  • @paxdiablo, I guess I agree in part. Certainly C89's (somewhat ignored) mandate was to "codify existing practice", but in this case the practice does seem to make perfect sense... – DigitalRoss Jan 19 '10 at 16:10
  • 4
    @DigitalRoss: even if you're on a platform that provides zero-filled pages for the stack (which is very common, but not necessarily universal), that doesn't mean that the locals will end up in zero-filled memory the first time a function is called. For example, when `main()` is called, there's no telling what the runtime has done to the stack before that point. Not to mention that a local variable might not even end up on the stack (it might exist solely in a register, for example). That a stack might be zero-initialized is meaningless information as far as C/C++ locals are concerned. – Michael Burr Jan 19 '10 at 16:10
  • 1
    @Michael Burr: I never said anything about "the first time a function is called", instead I was discussing the first use of new levels on the stack. I agree that this information isn't much use when programming, as you must always assume that locals can have any bit pattern. My point was to explain how things worked, and I believe everything I said was accurate... – DigitalRoss Jan 20 '10 at 07:52
  • @DigitalRoss: What is the "unrolled loop" you mentioned in the second point, on efficiency? Thanks – Destructor May 17 '15 at 06:29
  • @PravasiMeet, see https://en.wikipedia.org/wiki/Loop_unrolling -- it's something that's easier to do at the OS level where entire pages are zeroed at a time, and the code only needs to be written and optimized once. – DigitalRoss Oct 08 '15 at 18:28
  • 2
    Nope. Sorry: 1.: Wrong. Heap/stack is uninitialized and does not leak anything. 2.: Wrong and wrong: BSS init takes time and vals not useless. 3.+4.: Agreed. Second list: 1.: Wrong. Stack is not guaranteed to start with zero at all. 2. Agreed, but not an answer to the question. This is exactly the reason _not_ to initialize memory in C/C++: Runtime overhead is given higher priority than security/safety/elegance in C/C++. And for globals this runtime overhead is negligible. _This_ is the answer to the question: It comes with very little cost and a huge benefit and so it was specified this way. – Johannes Overmann Feb 08 '19 at 14:06
  • @DigitalRoss, can you extend the list with the issue of initializing multiple libraries? When an application consists of multiple libraries, the order of runtime constructors is not defined, and some libraries can call other libraries even before the latter's constructors have run (the situation is even worse with cyclic dependencies). Zero/value initialization helps to solve this issue because it happens before any constructor is called. Therefore, an uninitialized library can safely check its state, e.g. see that it is not initialized yet and perform the initialization. – Alex Apr 20 '21 at 19:41
  • Herr Overmann, OS requirements interact with language requirements and the result on all modern systems is the behavior I've outlined. (And it has been years but I seem to recall that I clarified the stack frame note twice as it's occasionally misunderstood.) Think about my answer and imagine how a system could be secure and do it any other way. Write some code if you think you can prove me wrong. – DigitalRoss Apr 22 '21 at 04:20
32

Because with the proper cooperation of the OS, zero-initializing statics and globals can be implemented with no runtime overhead.

R Samuel Klatchko
  • Also, static and global variables are initialized before our code starts executing, so it is more or less the responsibility of mainCRTStartup() to initialize them. – Bhupesh Pant Oct 02 '13 at 18:34
  • 2
    There is runtime overhead (the OS has to fill the pages with zeroes), but it's needed for security reasons anyway. – user253751 Feb 17 '20 at 18:02
  • 1
    This is wrong. Please explain how zero initializing can be done without runtime overhead. Do you consider start up as not being run time? – robsn Jun 05 '20 at 09:44
18

Section 6.7.8, "Initialization", of the C99 standard (n1256) answers this question:

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:

— if it has pointer type, it is initialized to a null pointer;

— if it has arithmetic type, it is initialized to (positive or unsigned) zero;

— if it is an aggregate, every member is initialized (recursively) according to these rules;

— if it is a union, the first named member is initialized (recursively) according to these rules.
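
A minimal C sketch of the rules quoted above, with one object per case. The type and variable names (`point`, `number`, `counter`, and so on) and the helper `static_defaults_hold` are made up for illustration. The automatic-storage case is deliberately left out, since reading an indeterminate value is not something a portable example should do.

```c
#include <stddef.h>

struct point { int x; double y; const char *label; };
union number { int i; double d; };

/* Static storage duration, no explicit initializers. */
static int          counter;   /* arithmetic type: zero          */
static double       ratio;     /* arithmetic type: positive zero */
static const char  *name;      /* pointer type: null pointer     */
static struct point origin;    /* aggregate: members, recursively */
static int          table[8];  /* aggregate: every element        */
static union number num;       /* union: first named member (i)   */

/* Returns 1 if every object above holds its default value, else 0. */
int static_defaults_hold(void)
{
    if (counter != 0 || ratio != 0.0 || name != NULL)
        return 0;
    if (origin.x != 0 || origin.y != 0.0 || origin.label != NULL)
        return 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i] != 0)
            return 0;
    return num.i == 0;
}
```

On a conforming implementation `static_defaults_hold()` returns 1 no matter when it is called, provided nothing has written to these objects in the meantime.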

Jingguo Yao
8

Think about it: in the static realm you can't always tell for sure that something is indeed initialized, or that main has started. There's also a static init and a dynamic init phase, the static one first right after the dynamic one where order matters.

If you didn't have zeroing out of statics then you would be completely unable to tell in this phase for sure if anything was initialized AT ALL and in short the C++ world would fly apart and basic things like singletons (or any sort of dynamic static init) would simply cease to work.
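
The "is it initialized yet?" check that this argument relies on can be sketched in plain C as a lazily initialized object guarded by a flag. Everything here is hypothetical (`registry`, `registry_get`, the capacity of 16), and the sketch is single-threaded only. It is not how any particular compiler implements C++ static initialization.

```c
/* Hypothetical library state; the names are illustrative only. */
struct registry {
    int ready;      /* 0 until registry_get() has set things up */
    int capacity;
};

/* Zero-initialized before any code runs, so `ready` reliably starts
   at 0 even if registry_get() is called very early during startup. */
static struct registry the_registry;

/* Returns the shared registry, setting it up on first use. */
struct registry *registry_get(void)
{
    if (!the_registry.ready) {
        the_registry.capacity = 16;   /* one-time setup */
        the_registry.ready = 1;
    }
    return &the_registry;
}
```

If `the_registry` started out with arbitrary contents, the `ready` test could be true before any setup had happened, and the first caller would see a half-built object.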

The answer with the bulletpoints is enthusiastic but a bit silly. Those could all apply to nonstatic allocation but that isn't done (well, sometimes but not usually).

  • The point about singletons at first sounds compelling, but I'm not sure it actually has any relevance: How compilers store their 'already created y/n' flags is an implementation detail, and they'd surely be free only to zero _those_ prior to startup. The rest is overly flippant without really explaining itself. Also: `the static one first right after the dynamic one` whuh? is the static one `first`? or is it `right after the dynamic one`? and what's the difference? Also, order of declaration matters in all cases where objects depend upon each other, regardless of storage duration, does it not? – underscore_d Jan 24 '17 at 20:45
2

In C, statically-allocated objects without an explicit initializer are initialized to zero (for arithmetic types) or a null pointer (for pointer types). Implementations of C typically represent zero values and null pointer values using a bit pattern consisting solely of zero-valued bits (though this is not required by the C standard). Hence, the bss section typically includes all uninitialized variables declared at file scope (i.e., outside of any function) as well as uninitialized local variables declared with the static keyword.

Source: Wikipedia
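
As a rough check of the "all-zero bit pattern" remark, the sketch below inspects the raw bytes of two uninitialized static objects. It assumes a typical platform where zero and null pointers are represented with all bits zero, which, as the quote notes, the C standard does not require. The names `values`, `slots`, `all_bytes_zero`, and `bss_objects_are_zero_bytes` are invented for the example.

```c
#include <stddef.h>

/* Uninitialized statics; typically placed in the .bss section. */
static long  values[1024];
static void *slots[16];

/* Returns 1 if every byte of the n-byte object at p is zero. */
static int all_bytes_zero(const void *p, size_t n)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++)
        if (b[i] != 0)
            return 0;
    return 1;
}

/* Returns 1 if both static objects consist solely of zero bytes. */
int bss_objects_are_zero_bytes(void)
{
    return all_bytes_zero(values, sizeof values)
        && all_bytes_zero(slots, sizeof slots);
}
```

On mainstream platforms this returns 1. An implementation with a non-zero null pointer representation could legitimately return 0 for `slots` while still satisfying the standard's initialization rules.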