I would like to have fine control over how automatic variables get released from the stack.
Lots of confusion here. An optimizing compiler could keep some automatic variables only in registers, without using any slot in the call frame. The C language specification (n1570) does not require any call stack.
And a given register, or slot in the call frame, can be reused for different purposes (e.g. different automatic variables in different parts of the function). Register allocation is a significant job of compilers.
Can I always expect that at measurement point B, the stack will only contain buffer2 and buffer1 has been released?
Certainly not. The compiler could prove that, at some later point in your code, the space for buffer1 is no longer needed, and so reuse that space for other purposes. It is equally free to keep that space reserved until the function returns.
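To make this concrete, here is a minimal sketch. The function name scoped_buffers and the 64-byte sizes are hypothetical, not taken from the question. Each buffer lives in its own block, so its lifetime ends at the closing brace, but the C standard says nothing about when (or whether) its stack space is given back: a compiler may overlap the two slots, keep both for the whole call, or allocate neither.

```c
#include <string.h>

/* Hypothetical example: the names and sizes are assumptions for illustration. */
int scoped_buffers(void)
{
    int total = 0;
    {
        char buffer1[64];                    /* lifetime ends at the closing brace */
        memset(buffer1, 1, sizeof buffer1);
        total += buffer1[10];
    }
    /* "Measurement point": buffer1 is out of scope here, but whether its bytes
       were released, reused, or never allocated at all is up to the compiler. */
    {
        char buffer2[64];                    /* may or may not share buffer1's slot */
        memset(buffer2, 2, sizeof buffer2);
        total += buffer2[10];
    }
    return total;
}
```

The observable result is the same whatever layout the compiler picks, which is exactly why it is allowed to pick any of them.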
Is there any way I can have fine control over stack deallocations?
No, there is not. The call stack is an implementation detail, and might not be used (or be "abused" in your point of view) by the compiler and the generated code.
As a silly example, if buffer1 is not used in foo, the compiler might not allocate any space for it. And a clever compiler might allocate only 8 bytes for it, if it can prove that only the first 8 bytes of buffer1 are used.
More seriously, in some cases, GCC is able to do tail-call optimizations.
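As a rough illustration of what a tail call looks like, consider the sketch below. The helper sum_to is hypothetical. Because the recursive call is the last thing the function does, an optimizing GCC (typically at -O2) may replace it with a jump, so the stack does not grow with n; without optimization each call would usually get its own frame. Neither outcome is guaranteed by the language.

```c
/* Hypothetical helper: adds 1..n to acc. The recursive call is in tail
   position, so an optimizing compiler may compile it as a loop. */
unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);   /* tail call: nothing remains to do afterwards */
}
```

Calling sum_to(10, 0) yields 55 either way; only the stack usage differs.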
You might be interested in invoking GCC with -fstack-reuse=all, -Os, -Wstack-usage=256, -fstack-usage, and other options.
Of course, the concrete stack usage depends on the optimization level. You might also inspect the generated assembler code, e.g. with -S -O2 -fverbose-asm.
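A small way to try the stack-usage options is sketched below. The function format_id and the file name su.c are assumptions made for this illustration. Compiling it with something like gcc -O2 -c -fstack-usage -Wstack-usage=256 su.c would normally be expected to produce a su.su file reporting the frame size of each function, and a warning if GCC concludes that the frame exceeds 256 bytes. The exact numbers depend on the GCC version, the target, and the optimization level.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical function whose local array is larger than the 256-byte
   limit given to -Wstack-usage in the options above. */
size_t format_id(int id)
{
    char buf[1024];
    /* buf is passed to library functions, so it generally needs real storage. */
    snprintf(buf, sizeof buf, "id-%d", id);
    return strlen(buf);
}
```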
For example, the following code e.c:
int f(int x, int y) {
    int t[100];
    t[0] = x;
    t[1] = y;
    return t[0]+t[1];
}
when compiled with GCC 8.1 on Linux/Debian/x86-64 using gcc -S -fverbose-asm -O2 e.c gives in e.s:
    .text
    .p2align 4,,15
    .globl f
    .type f, @function
f:
.LFB0:
    .cfi_startproc
# e.c:5: return t[0]+t[1];
    leal (%rdi,%rsi), %eax #, tmp90
# e.c:6: }
    ret
    .cfi_endproc
.LFE0:
    .size f, .-f
and you see that the stack frame is not grown by 100*4 bytes. And this is still the case with:
int f(int x, int y, int n) {
    int t[n];
    t[0] = x;
    t[1] = y;
    return t[0]+t[1];
}
which actually generates the same machine code as above. And if, instead of the + above, I call some inline int add(int u, int v) { return u+v; }, the generated code does not change.
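For reference, the variant with the helper function could be written as in the sketch below. One assumption is made here: the helper is declared static inline rather than plain inline, so that the file still links when the compiler decides not to inline the call (for example at -O0).

```c
/* Helper from the text; "static" is an addition (assumption) so that the
   translation unit links even when the call is not inlined. */
static inline int add(int u, int v) { return u + v; }

int f(int x, int y, int n)
{
    int t[n];                /* VLA: n must be at least 2 for the accesses below */
    t[0] = x;
    t[1] = y;
    return add(t[0], t[1]);
}
```

Once add is inlined, the optimizer sees the same x + y as before, so nothing forces it to materialize t on the stack.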
Be aware of the as-if rule, and of the tricky notion of undefined behavior (if n were 1 above, accessing t[1] would be UB).
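If n comes from outside, one defensive option is to reject the sizes for which the array accesses would be invalid. The following is only a sketch: the name f_checked, the extra result parameter, and the -1 error convention are all hypothetical.

```c
/* Hypothetical guarded variant: refuses any n for which t[1] would not exist. */
int f_checked(int x, int y, int n, int *result)
{
    if (n < 2)
        return -1;           /* t[1] would be out of bounds; n <= 0 is invalid for a VLA */
    int t[n];
    t[0] = x;
    t[1] = y;
    *result = t[0] + t[1];
    return 0;
}
```

A caller would check the return value before reading result. A very large n could still exhaust the stack, so a real program would probably also want an upper bound.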