Inline and stack frame control

Question

The following are artificial examples. Clearly, compiler optimizations will dramatically change the final outcome. However, and I cannot stress this enough: by temporarily disabling optimizations, I intend to obtain an upper bound on stack usage. I expect that further compiler optimization will most likely improve the situation.

The discussion is centered around GCC only. I would like to have fine control over how automatic variables get released from the stack. Scoping with blocks does not ensure that memory will be released when automatic variables go out of scope. Functions, as far as I know, do ensure that.

However, when inlining, what is the case? For example:

inline __attribute__((always_inline)) void foo()
{
    uint8_t buffer1[100];
    // Stack Size Measurement A
    // Do something 
}

void bar()
{
    foo();
    uint8_t buffer2[100];
    // Stack Size Measurement B
    // Do something else
}

Can I always expect that at measurement point B, the stack will only contain buffer2 and buffer1 has been released?

Apart from function calls (which result in additional stack usage) is there any way I can have fine control over stack deallocations?

Stack memory does not really get de-allocated; the stack base pointer register gets altered to accommodate the function's stack usage. See [this example](https://godbolt.org/g/iSHXJi) — dvhh, May 22 '18 at 07:35
What does your debugger say? Use it and you will know the answer. By the way, who told you that exiting a scope does not release the memory? Just use your debugger and see how the stack pointer changes on entry to and exit from the scope — 0___________, May 22 '18 at 07:35
Also, with a modern C compiler you can declare your local variables anywhere, but they are reserved at the start of the function, and not as dynamically as expected. — dvhh, May 22 '18 at 07:36
@dvhh I will rephrase, I know that the pointer will get adjusted. Nevertheless. I wrote "released" meaning that stack pointer will change. I never discussed memory allocation (as in the dynamic case). — Juan Leni, May 22 '18 at 07:40
@PeterJ_01 there are many discussions about that. e.g. https://stackoverflow.com/questions/2759371/in-c-do-braces-act-as-a-stack-frame It really depends on the compiler implementation. — Juan Leni, May 22 '18 at 07:42
It's not *guaranteed* that's for sure, although I would presume it would be the case, at least when optimizations are enabled. You need to check how your actual code compiles. And note that you don't even need an inline function, just adding a separate block inside the function will create an inner scope for that variable. So if the second buffer is declared after the inner scope, the compiler is free to reuse the space, and will probably do so. — vgru, May 22 '18 at 07:57
You will never have any guarantees of stack allocation while writing in C, optimizer or no optimizer. The only way to get full control over this is to declare and push/pop the variables in assembler, then have the C code reference those variables (through `extern` or similar). — Lundin, May 22 '18 at 09:53
An optimizing compiler that optimizes well would (assuming you actually *use* that space somewhere) **not** release `buffer1`, but rather *reuse* it for `buffer2`, which is handy, as they both require the same amount of space. — tofro, May 22 '18 at 21:50

Basile Starynkevitch · Accepted Answer · 2018-05-22T08:51:00.190

I would like to have fine control over how automatic variables get released from the stack.

Lots of confusion here. The optimizing compiler could store some automatic variables only in registers, without using any slot in the call frame. The C language specification (n1570) does not require any call stack.

And a given register, or slot in the call frame, can be reused for different purposes (e.g. different automatic variables in different parts of the function). Register allocation is a significant role of compilers.

Can I always expect that at measurement point B, the stack will only contain buffer2 and buffer1 has been released?

Certainly not. The compiler could prove that at some later point in your code the space for buffer1 is no longer needed, and so reuse that space for other purposes.

is there any way I can have fine control over stack deallocations?

No, there is not. The call stack is an implementation detail, and might not be used (or might be "abused", from your point of view) by the compiler and the generated code.

As a silly example, if buffer1 is not used in foo, the compiler might not allocate space for it. And some clever compilers might allocate just 8 bytes for it, if they can prove that only the first 8 bytes of buffer1 are used.

More seriously, in some cases, GCC is able to do tail-call optimizations.

You should be interested in invoking GCC with -fstack-reuse=all, -Os, -Wstack-usage=256, -fstack-usage, and other options.
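As a rough sketch of how these options might be tried out, assume a hypothetical file scratch.c containing two block-scoped buffers (the file name and the function names `fill_and_sum` and `checksum_two_phases` are invented for illustration). With -fstack-usage, GCC writes a .su file giving the frame size of each function, and -Wstack-usage=256 warns about any function whose frame may exceed 256 bytes. The figures reported will differ between GCC versions, targets and optimization levels, so none are shown here.

```c
/* scratch.c: hypothetical file name, used only for illustration.
 *
 * Possible invocations (reported sizes vary by GCC version and target):
 *   gcc -c -O0 -fstack-usage -Wstack-usage=256 scratch.c
 *   gcc -c -Os -fstack-reuse=all -fstack-usage -Wstack-usage=256 scratch.c
 * Each run should leave a scratch.su file with one line per function.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill buf with `seed` and return the sum of its bytes. */
static unsigned fill_and_sum(uint8_t *buf, size_t len, uint8_t seed)
{
    unsigned sum = 0;
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        buf[i] = seed;
        sum += buf[i];
    }
    return sum;
}

unsigned checksum_two_phases(void)
{
    unsigned total = 0;
    {
        uint8_t buffer1[100];   /* lifetime ends at the closing brace */
        total += fill_and_sum(buffer1, sizeof buffer1, 1);
    }
    {
        uint8_t buffer2[100];   /* may or may not share buffer1's slot */
        total += fill_and_sum(buffer2, sizeof buffer2, 2);
    }
    return total;
}
```

Comparing the scratch.su entries for `checksum_two_phases` across the two invocations is one way to see whether the two buffers ended up sharing a slot on a given toolchain.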

Of course, the concrete stack usage depends upon the optimization levels. You might also inspect the generated assembler code, e.g. with -S -O2 -fverbose-asm

For example, the following code e.c:

int f(int x, int y) {
    int t[100];
    t[0] = x;
    t[1] = y;
    return t[0]+t[1];
}

when compiled with GCC 8.1 on Linux/Debian/x86-64 using gcc -S -fverbose-asm -O2 e.c gives in e.s:

        .text
        .p2align 4,,15
        .globl  f
        .type   f, @function
f:
.LFB0:
        .cfi_startproc
# e.c:5:      return t[0]+t[1];
        leal    (%rdi,%rsi), %eax       #, tmp90
# e.c:6: }
        ret     
        .cfi_endproc
.LFE0:
        .size   f, .-f

and you see that the stack frame is not grown by 100*4 bytes. And this is still the case with:

int f(int x, int y, int n) {
    int t[n];
    t[0] = x;
    t[1] = y;
    return t[0]+t[1];
}

which actually generates the same machine code as above. And if instead of the + above I'm calling some inline int add(int u, int v) { return u+v; } the generated code is not changing.
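For reference, the variant with the helper might look like the sketch below. Two details are additions that the text above does not have: `add` is declared `static inline` so that the file still links if GCC chooses not to inline it (for example at -O0), and an `assert` guards the array size, which changes the generated code unless the file is compiled with -DNDEBUG.

```c
#include <assert.h>

/* `static inline` instead of plain `inline`, so no external
 * definition is needed when the call is not inlined. */
static inline int add(int u, int v) { return u + v; }

int f(int x, int y, int n)
{
    assert(n >= 2);   /* added guard: t[0] and t[1] must both exist */
    int t[n];         /* variable-length array */
    t[0] = x;
    t[1] = y;
    return add(t[0], t[1]);
}
```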

Be aware of the as-if rule, and of the tricky notion of undefined behavior (if n was 1 above, it is UB).

I don't think there is any confusion here. It is clear that optimizations are applicable. But overall, the stack will be used and I want to have better control, particularly over VLAs. — Juan Leni, May 22 '18 at 08:11
The compiler could prove that a given VLA is useless, and never allocate space for it — Basile Starynkevitch, May 22 '18 at 08:12
I cannot just rely on optimizations on a tight embedded system with limited memory. Anyway, just saying "the compiler will take care of it" is not providing much information. — Juan Leni, May 22 '18 at 08:14
I don't see why you are talking about "never allocating space" for VLAs. I need some VLAs to go out of scope. Anyway, thanks for your contribution. — Juan Leni, May 22 '18 at 08:17
But indeed, the compiler will take care of it, as my improved answer shows. And you could have surprises. What you need to use is something like `-Wstack-usage=256` (perhaps with `-Werror`) and `-fstack-usage` — Basile Starynkevitch, May 22 '18 at 08:42

score 4 · Answer 2 · answered May 22 '18 at 08:04

Can I always expect that at measurement B, the stack will only contain buffer2 and buffer1 has been released?

No. It's going to depend on GCC version, target, optimization level, options.

Apart from function calls (which result in additional stack usage) is there any way I can have fine control over stack deallocations?

Your requirement is so specific that I guess you will likely have to write the code yourself in assembler.

gzh · Answer 3 · 2018-05-22T08:42:17.960

mov BYTE PTR [rbp-20], 1 and mov BYTE PTR [rbp-10], 2 only show offsets relative to rbp within the stack frame. At run time, both variants have the same peak stack usage.

There are two differences between inlining and not inlining: 1) With a function call, buffer1 is released on exit from foo(). With inlining, buffer1 is kept until exit from bar(), which means the peak stack usage lasts longer. 2) A function call adds some overhead, such as saving stack frame information, compared with inlining.
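One way to ask GCC for each mode explicitly is with the `noinline` and `always_inline` function attributes. The sketch below is only an illustration: all function names are hypothetical, the buffer size of 100 is taken from the question, and the comments describe the typical outcome rather than a guarantee, since the compiler remains free to lay out or reuse stack slots as it sees fit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* noinline asks GCC to keep a real call, so buffer1 sits in a frame
 * of its own that is popped when the function returns. */
static __attribute__((noinline)) unsigned phase_one(void)
{
    uint8_t buffer1[100];
    memset(buffer1, 1, sizeof buffer1);
    return buffer1[0] + buffer1[99];
}

/* always_inline merges this body, and buffer1b, into the caller's frame. */
static inline __attribute__((always_inline)) unsigned phase_one_inlined(void)
{
    uint8_t buffer1b[100];
    memset(buffer1b, 1, sizeof buffer1b);
    return buffer1b[0] + buffer1b[99];
}

unsigned bar_with_call(void)
{
    unsigned r = phase_one();   /* callee frame is typically popped here */
    uint8_t buffer2[100];
    memset(buffer2, 2, sizeof buffer2);
    assert(r == 2);             /* sanity check on phase one's result */
    return r + buffer2[0];
}

unsigned bar_with_inline(void)
{
    unsigned r = phase_one_inlined();   /* both buffers share one frame */
    uint8_t buffer2[100];
    memset(buffer2, 2, sizeof buffer2);
    assert(r == 2);
    return r + buffer2[0];
}
```

Compiling this with -fstack-usage and comparing the entries for `bar_with_call` and `bar_with_inline` would show how a particular toolchain actually sizes the two frames.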

edited May 22 '18 at 08:42

answered May 22 '18 at 08:06

In this case, the peak stack is at least 20. In the other case it was at least 10. – Juan Leni May 22 '18 at 08:12
@purpletentacle, at peak stack usage with a function call, buffer1 and buffer2 are both kept on the stack, but in different stack frames, while with inlining, buffer1 and buffer2 are kept in the same stack frame. So they have the same peak stack usage at run time. – gzh May 22 '18 at 08:29
Correct. However, at point B, without inlining, buffer1 should be gone, and any subsequent call will only have 10 and not 20 there. – Juan Leni May 22 '18 at 08:31
@purpletentacle, If you want to have more stack space in the subsequent processing after point B, function call mode is better than inline mode. – gzh May 22 '18 at 08:39