When exactly is stack allocated

Question

Even in C (not just C++) you can declare variables at the start of a code block, which is enclosed in curly braces.

Example:

#include <stdio.h>
void use_stack(int cnt)
{
    if (cnt<=16) {
        int a[16];
        int i;
        a[0]=3;
        a[1]=5;
        for (i=2;i<cnt;i++) {
            a[i]=a[i-1]+a[i-2];
        }
        printf("a[%d] == %d\n",cnt-1,a[cnt-1]);
    }
    else {
        printf("cnt is too big\n");
    }
}

Now I know that variables like the array a[16] are allocated on the stack in this case.

I was wondering if the space for this array is allocated at the start of the function (first opening curly brace) or at the start of the block where it is declared (opening curly brace after if).

From examining the assembler code it seems the compiler allocates the space for a[16] directly at the entry of the function.

I actually expected that the stack would be allocated (stack pointer decreased) at the declaration of a[16] and that the stack would be de-allocated (stack pointer increased) at the end of the corresponding if code block.

But this does not happen it seems (stack for a[16] is allocated directly at function entry, even if a[16] is not used in the else branch).

Has anyone an explanation why this is the case ?

So is there any part of the C language standard, which explains this behavior, or does it have to do with things like "longjmp" or signal handling, which maybe require that the stack pointer is "constant" inside a function ?

Note: The reason why I assumed the stack would be allocated/deallocated at the start/end of the code block is, because in C++ the constructor/destructor of objects allocated on the stack will be called at the start/end of the code block. So if you examine the assembler code of a C++ program you will notice that the stack is still allocated at the function entry; just the constructor/destructor call will be done at the start/end of the code block.

I am explicitly interested why stack is not allocated/deallocated at the start/end of a code block using curly braces.

The question: At what exact moment is a local variable allocated storage? is only about a local variable allocated at the start of a function. I am surprised that stack allocation for variables allocated later inside a code block is also done at the function entry.

So far the answers have been:

Something to do with optimization
Might different for C, C++
Stack is not even mentioned in the C language specification

So: I am interested in the answer for C... (and I strongly believe that the answer will apply to C++ also, but I am not asking about C++ :-)).

Optimization: Here is an example which will directly demonstrate why I am so surprised and why I am quite sure that this is not an optimization:

#include <stdio.h>

static char *stackA;
static char *stackB;

static void calc(int c,int *array)
{
    int result;
    if (c<=0) {
        // base case c<=0:
        stackB=(char *)(&result);
        printf("stack ptr calc() = %p\n",stackB);
        if (array==NULL) {
            printf("a[0] == 1\n");
        } else {
            array[0]=1;
        }
        return;
    }

    // here: c>0
    if (array==NULL) {
        // no array allocated, so allocate it now
        int i;
        int a[2500];

        // calculate array entries recursively
        calc(c-1,a);

        // calculate current array entry a[c]
        a[c]=a[c-1]+3;

        // print full array
        for(i=0;i<=c;i++) {
            printf("a[%d] = %d\n",i,a[i]);
        }
    } else {
        // array already allocated
        calc(c-1,array);

        // calculate current array entry a[c]
        array[c]=array[c-1]+3;
    }
}

int main()
{
    int a;
    stackA=(char *)(&a);
    printf("stack ptr main() = %p\n",stackA);
    calc(9,NULL);
    printf("used stack = %d\n",(int)(stackA-stackB));
}

I am aware that this is an ugly program :-).

The function calc calculates n*3 + 1 for all 0<=n<=c in a recursive fashion.

If you look at the code for calc you notice that the array a[2500] is only declared when the input parameter array to the function is NULL.

Now this only happens in the call to calc which is done in main.

The stackA and stackB pointers are used to calculate a rough estimate how much stack is used by this program.

Now: int a[2500] should consume around 10000 bytes (4 bytes per integer, 2500 entries). So you could expect that the whole program consumes around 10000 bytes of stack + something additional (for overhead when calc is called recursively).

But: It turns out this program consumes around 100000 bytes of stack (10 times as much as expected). The reason is, that for each call of calc the array a[2500] is allocated, even if it is only used in the first call. There are 10 calls to calc (0<=c<=9) and so you consume 100000 bytes of stack.

It does not matter if you compile the program with or without optimization
GCC-4.8.4 and clang for x64, MS Visual C++ 2010, Windriver for DIAB (for PowerPC) all exhibit this behavior

Even weirder: C99 introduces Variable Length Arrays. If I replace int a[2500]; in the above code with int a[2500+c]; then the program uses less stack space (around 90000 bytes less).

Note: If I only change the call to calc in main to calc(1000,NULL); the program will crash (stack overflow == segmentation fault). If I additionally change to int a[2500+c]; the program works and uses less than 100KB stack. I still would like to see an answer, which explains why a Variable Length Array does not lead to a stack overflow whereas a fixed length array does lead to a stack overflow, in particular since this fixed length array is out of scope (except for the first invocation of calc).

So what's the reason for this behavior in C ?

I do not believe that GCC/clang both simply are not able to do better; I strongly believe there has to be a technical reason for this. Any ideas ?

Answer by Google

After more googling: I strongly believe this has something to do with "setjmp/longjmp" behavior. Google for "Variable Length Array longjmp" and see for yourself. It seems longjmp is hard to implement if you do not allocate all arrays at function entry.

It is an optimization. If you allocate space for all variables at function entry you do not need to modify stack pointer during the whole lifetime of the stackframe. In C++ the situation is a bit more complex, because constructors and destructors shall be called at right places, hovewer even there the space for variables can be "pre-allocated" at function entry. — Marian, Dec 05 '16 at 12:18
Given that C and C++ have (completely) different compilers (since they're different languages!), your answer will differ vastly depending on which language the answer targets. Please pick one. — erip, Dec 05 '16 at 12:19
The space on the stack is reserved before you need it and it's unreserved after you don't need it anymore. The rest is implementation details that can change with the mood of your compiler, compiler options, debugger support, etc. — Art, Dec 05 '16 at 12:20
And even the existence of the stack is an implementation detail. C standard only speaks about *'Storage durations of objects'*, and doesn't care how it's implemented. — user694733, Dec 05 '16 at 12:23
@erip: Actually, it is usually *not* true that C and C++ have completely different compilers. They usually share a back-end which does code generation - and it is usually the back end which handles this. Even if it were true, "compare and contrast the behaviour of C and C++" is a legitimate use for both tags on a single question. — Martin Bonner supports Monica, Dec 05 '16 at 12:28
@erip: Thanks for finding the previous question, I was looking for an answer but did not know what to look for. I picked C (that's what the example is written for). I tried this with: GCC 4.8.4 (for PowerPC + x64), Clang-3.8 (for x64), Windriver DIAB compiler (5.8). ALL OF THEM allocate the storage for for `a[16]` at the function entry. It also does not matter which optimization option I use. So I very much doubt that this is just coincidence. I believe that there has to be a reason... (Imagine I would use a[100000]). — Ingo Blackman, Dec 05 '16 at 12:31
@IngoBlackman: Coincidence? Possibly not. But not a general rule either. — too honest for this site, Dec 05 '16 at 13:40
@Olaf: So in the 2nd example I posted 90000 bytes of stack are wasted, because the compilers are too dumb ? I cannot believe that. — Ingo Blackman, Dec 05 '16 at 13:54
@IngoBlackman: Please provide a reference to the C standard where it: 1) requires a stack, 2) says the stack has a certain size, 2) enforces any specific memory alocation technique for variables (automatic, static or dynamic) or a specific memory scheme at all. Or that your environment actually reserves the physical memory at all. No modern OS will do this by default. — too honest for this site, Dec 05 '16 at 13:57
@IngoBlackman You keep repeating that. The compilers are not dumb. They are smart. They generate code relatively quickly that will perform well for the largest amount of applications. You can always write code that will trip up a compiler and make it generate suboptimal code for some particular obscure case. In your case if you reported this to a compiler maker either they would tell you to stop writing bad code or there might be some flag that makes the compiler generate slow code just for people who have written bad code. Besides. 90k of stack is nothing. — Art, Dec 05 '16 at 13:57
I wrote that this also happens when I do **not** enable optimization. In this case compilers write tons of code which is removed by optimization. To find out how much stack you need to reserve it seems compilers need to parse the whole function just to find out how much stack space you need in total. This is **way** more complicated then just allocating the stack space when you see the definition of a local variable and removing the stack space once a local variable goes out of scope. So you are telling me compilers spent this extra effort even when producing non-optimized code ? Why ? — Ingo Blackman, Dec 05 '16 at 14:37
@art: Change the above program to use `int a[25000]`. If stack would only be allocated when the corresponding `if` block is executed the whole program would take around 100k of stack. "nothing" according to you. But the way compilers behave you reserve around 1MByte of stack. Still "nothing" ? And if you change the call in `main` to `calc(99,NULL)` you now need 10MB of stack. Still "nothing" ? (note only 100k are necessary...) — Ingo Blackman, Dec 05 '16 at 14:39
@IngoBlackman Change the program to use `int a[200000000]` and it will crash. So what? Play stupid games win stupid prizes. Don't put too much stuff on the stack, especially when you're recursing. Everyone knows that. And when it comes to "This is way more complicated ...", no, it isn't. A compiler doesn't read one statement and generate the instructions needed for that one statement. Compilers haven't worked like that for the past 20-30 years. Regardless of enabling or disabling optimizations. — Art, Dec 05 '16 at 14:48
@art: But I am exactly **not** using `int a[]`. I only declare `a` when the parameter to `calc` is NULL. So why does the compiler allocate stack for an array which is **not** used ? I am trying exactly to **not** put too much stuff on the stack by using a declaration which is only valid inside a pretty short scope. But it seems the compiler allocates space for that array even when it is **out of scope** why ? — Ingo Blackman, Dec 05 '16 at 15:14
@art: If in the above program I replace `int a[2500];` by "int a[c+2500];` it uses **less** stack (about 90000 bytes less). Can you think of any logical explanation, why that should be the case ? — Ingo Blackman, Dec 05 '16 at 15:27
@IngoBlackman It uses less stack because the compiler cannot pre-determine the amount of stack space required for VLAs, so has to generate the code to allocate the the VLA on demand. — Ian Abbott, Dec 05 '16 at 15:43
@Ian: So when not using VLAs, why does the compiler allocate stack for a variable which is out of scope ? Where is the logic behind this. — Ingo Blackman, Dec 05 '16 at 16:17
@IngoBlackman Barring pathological cases such as this, it's generally more efficient to allocate a pre-calculated amount of stack space. You may think the compiler should always bend to your will, but sometimes you need to bend to the compiler's will. :) — Ian Abbott, Dec 05 '16 at 16:58
@Ian: Ok, let's assume that I replace the call in `main` with `calc(1000,NULL);`. If I use the program as written in my post it crashes. If I additionally change the program to use `int a[c+2500];` it works and consumes about 100KB of stack (not that much). And the only explanation for that is: Well the compiler might not do what you want ? — Ingo Blackman, Dec 05 '16 at 18:50
@Ingo What most compilers actually do is pre-calculate the maximum size of the stack-frame it may need to implement the statically-sized auto storage class variables (plus any other temporary storage for registers and intermediate results), and generate code to allocate that storage at the start of the function, and remove it at the end. Dynamically-sized variables such as VLAs do not fit in this scheme, so have to be created specially as required. Yes, the compiler might not do what you want. It generally assumes it will not run out of stack, so you may need to refactor your function to fit. — Ian Abbott, Dec 06 '16 at 10:19

score 5 · Answer 1 · answered Dec 05 '16 at 12:24

5

The language rules for automatic storage only guarantees that the last allocated is the first deallocated.

A compiler can implement this logical stack any way it sees fit.

If it can prove that a function isn't recursive it can even allocated the storage at program start-up.

I believe that the above applies to C as well as C++, but I'm no C expert.

Please, when you ask about the details of a programming language, limit the question to one language at a time.

answered Dec 05 '16 at 12:24

Cheers and hth. - Alf

142,714
15
209
331

I tried this with: GCC 4.8.4 (for PowerPC + x64), Clang-3.8 (for x64), Windriver DIAB compiler (5.8). ALL OF THEM allocate the storage for for a[16] at the function entry. It also does not matter which optimization option I use. I very much doubt that this is just coincidence. I believe that there has to be a reason why the compilers do it this way. Imagine I would use `a[100000]`. Why should I reserve so much stack, even if I am not going to use it (because I execute the `else` branch). – Ingo Blackman Dec 05 '16 at 12:33
@IngoBlackman I just tried this on goldbolt with gcc 4.8.4 and -O3, and the stack is not "allocated" at the start of the function - There are probably practical reasons why compilers do stuff one way or another, but as far as the standard is concern, there is no rules except the one specified by Cheers. – Holt Dec 05 '16 at 12:41
@Holt: Try out the 2nd program I posted. I compile this with gcc 4.8.4 and -O3 under Ubuntu 14.04 (x64). One of the first instructions in the `calc` function is `subq $10040, %rsp`; so the stack pointer is decreased by 10040 bytes; that's where the `int a[2500]` array is "allocated". – Ingo Blackman Dec 05 '16 at 13:35

score 1 · Answer 2 · answered Dec 05 '16 at 13:50

1

There is no technical reason for this other than choices that compiler makers made. It's less generated code and faster executing code to always reserve all the stack space we'll need at the beginning of the function. So all the compilers made the same reasonable performance tradeoff.

Try using a variable length array and you'll see that the compiler is fully capable of generating code that "allocates" stack just for a block. Something like this:

void
foo(int sz, int x)
{
        extern void bar(char *);
        if (x) {
                char a[sz];
                bar(a);
        } else {
                char a[10];
                bar(a);
        }
}

My compiler generates code that always reserves stack space for the x is false part, but the space for the true part is only reserved if x is true.

answered Dec 05 '16 at 13:50

Art

19,807
1
34
60

Interesting: I still don't understand why the `x == false` branch is allocated all the time. Why should you do that ? – Ingo Blackman Dec 05 '16 at 14:15
@IngoBlackman I suspect that's because the compiler always generates the same code for all functions. Sum up sizes of all objects used in the function, generate one instruction to reserve that much stack, spend time on more interesting problems. As you have demonstrated this optimization/laziness can sometimes generate suboptimal code. Generating the optimal code for all code in all cases is impossible because (unless I'm very confused) it would require solving the halting problem. – Art Dec 05 '16 at 14:23
After googling more: I think the real reason why compilers do that, has something to do with "setjmp/longjmp". Google for "Variable Length Array longjmp" and it seems that there might be problems with longjmp and variable length arrays. I think these problems are the real reason why it seems non variable length arrays are allocated at the entry of a function. – Ingo Blackman Dec 05 '16 at 14:27
@IngoBlackman Why would there be problems with `longjmp`? I googled that and the only thing I found was that VLAs and logjmp had problems 15 years ago on some compiler because it implemented VLAs on the heap. Which is not what happens here. – Art Dec 05 '16 at 14:42
Not sure why there should be problems with `longjmp`: My guess, because when returning to the `setjmp` you would need to know which arrays have been allocated. This might be problematic if you did not allocate all local variables a function needs at the entry of the function. Not sure though; at least I believe if you save the stack pointer setjmp should just work fine. OTOH the web sites you found mention exactly that for VLAs none of this is guaranteed; which means that you might not implement VLAs in such a way that they work together with setjmp/longjmp... – Ingo Blackman Dec 05 '16 at 15:18

eerorika · Answer 3 · 2016-12-05T13:54:53.767

0

Alf has explained the limitations and freedoms that the standard specifies (or rather, what it doesn't specify).

I'll suggest an answer to the question

why stack is not allocated/deallocated at the start/end of a code block using curly braces.

The compilers that you tested chose to do so (I'm actually just guessing, I didn't write any of them), because of better run-time performance and simplicity (and of course, because it is allowed).

Allocating 96 bytes (arbitrary example) once takes about half as much time as allocating 48 bytes twice. And third as much times as allocating 32 bytes thrice.

Consider a loop as an extreme case:

for(int i = 0; i < 1000000; i++) {
    int j;
}

If j is allocated at the start of the function, there is one allocation. If j is allocated within the loop body, then there will be a million allocations. Less allocations is better (faster).

Note: The reason why I assumed the stack would be allocated/deallocated at the start/end of the code block is, because in C++ the constructor/destructor of objects allocated on the stack will be called at the start/end of the code block.

Now you know that you were wrong to have assumed so. As written in a fine answer in the linked question, the allocation/dealocation need not coincide with the construction/destruction.

Imagine I would use a[100000]

That is approaching a very significant fraction of total stack space. You should allocate large blocks of memory like that dynamically.

edited Dec 05 '16 at 13:54

answered Dec 05 '16 at 13:10

eerorika

232,697
12
197
326

"Less is better". I agree. So why do all compilers I have checked waste 90000 bytes of stack in the 2nd example I posted ? Just because these compilers are too dumb ? I cannot believe that. – Ingo Blackman Dec 05 '16 at 13:53
@IngoBlackman No, because it uses less instructions. – eerorika Dec 05 '16 at 13:54
So producing a stack overflow (no pun intended :-)) is better than saving a few instructions, even when compiled without optimization ? I do not believe that. – Ingo Blackman Dec 05 '16 at 13:56
@IngoBlackman Proving that stack overflow will happen and that it can be avoided in general is harder problem than you probably realize. It might be possible for your example, but why would compiler writers spend a ton of time to cater to your example when it's practically impossible to solve it for all programs? – eerorika Dec 05 '16 at 14:00
I said that for unoptimized code the same happens. Why should you hazard a stack overflow just to save a few instructions even when you already have tons of non-useful instructions because you do not optimize ? – Ingo Blackman Dec 05 '16 at 14:34

score 0 · Answer 4 · answered Dec 05 '16 at 13:34

How this is done isn't regulated by any standard. The C and C++ standards don't mention the stack at all, in theory those languages could be used even on computers that don't have a stack.

On computers that do have a stack, how this is done is specified by the ABI of the given system. Often, stack space is reserved at the point when the program enters the function. But compilers are free to optimize the code so that the stack space is only reserved when a certain variable is used.

At any rate, the point where you declare the variable has no relation to when it gets allocated. In your example, int a[16] is either allocated when the function is entered, or it is allocated just before the first place where it is used. It doesn't matter if a is declared inside the if statement or at the top of the function.

In C++ however, there is the concept of constructors. If your variable is an object with a constructor, then that constructor will get executed at the point where the variable is declared. Meaning that the variable must be allocated before that happens.

So why do clang/gcc/diab, MS visual C++ all waste 90000 bytes of stack even this is completely unnecessary. I cannot believe that the answer is that **all** of these compilers are simply too stupid... — Ingo Blackman, Dec 05 '16 at 13:51

When exactly is stack allocated

4 Answers4

Linked