
I tested the following code:

#include <iostream>
using namespace std;

void f1() {
  int x = 1;
  cout << "f1  : " << &x << endl;
}

void f2() {
  int x = 2;
  cout << "f2  : " << &x << endl;
}

void f3() {
  {
    int x = 3;
    cout << "f3_1: " << &x << endl;
  }
  {
    int x = 4;
    cout << "f3_2: " << &x << endl;
  }
}

int main() {
  f1();
  f2();
  f3();
}

In a release build, the output is:

f1  : 00FAF780
f2  : 00FAF780 
f3_1: 00FAF780
f3_2: 00FAF780  <-- as I expected

but in a debug build:

f1  : 012FF908
f2  : 012FF908
f3_1: 012FF908
f3_2: 012FF8FC  <-- what??

I thought the rule was that the stack pointer moves back when a block ends, so that the stack memory can be reused.
Is this principle just a matter of optimization?

  • [In Visual Studio C++, what are the memory allocation representations?](https://stackoverflow.com/questions/127386/in-visual-studio-c-what-are-the-memory-allocation-representations) You can view the memory at address 012FF908 when you break at `cout << "f3_2: "`. – S.M. Aug 13 '20 at 05:04
  • Thank you, @S.M. but what I wanna know is **why** x(f3_2)'s address is **different** in debug build. – jonadarling Aug 13 '20 at 05:12
  • Merely asking for the address of a local variable really ruins your compiler's day. If you didn't need that, it could optimize more aggressively. In this case your `int x = 4` can be thrown out; it doesn't do anything. No address required! In other cases the compiler might use a register, but it can't do this if you need to take the address, so the variable is stuck in memory, which is a lot slower. – tadman Aug 13 '20 at 05:25
  • Thank you, @tadman. I know x's value is useless and the compiler will remove it; I just initialized x for code readers. And I know that taking the address interferes with compiler optimization, so the program can't utilize registers effectively. What I want to know is why x's address in f3_2 is different from what I expected in a debug build. – jonadarling Aug 13 '20 at 06:09

3 Answers


A debug build is usually very unoptimized on purpose. It might be that each variable, regardless of scope, is given its own spot on the stack so that when your code crashes, the debugger can show you the state of each one. This wouldn't be possible if they all shared an address.
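
If you want to compare the two behaviors yourself, these are the usual optimization switches for the compilers discussed here (a sketch; a real Visual Studio Debug/Release configuration sets more flags than shown):

// MSVC (cl.exe):
//   cl /Od /Zi main.cpp   -> debug-style: no optimization; each variable
//                            tends to get its own stack slot
//   cl /O2 main.cpp       -> release-style: slots whose lifetimes never
//                            overlap may be shared
// gcc / clang:
//   g++ -O0 -g main.cpp   -> unoptimized: distinct slots are likely
//   g++ -O2 main.cpp      -> optimized: reuse (or outright removal) is likely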

tadman
  • Thank you, @tadman. Can I understand that address values are intentionally different for easy debugging? – jonadarling Aug 13 '20 at 06:14
  • That's likely the case. That way you can "watch" these values and the debugger can distinguish between the two variables. – tadman Aug 13 '20 at 06:21

The result depends on the compiler that you are using.

I tried this online compiler and got the same addresses:

https://www.onlinegdb.com/online_c++_compiler

Dinesh
  • Thank you, @Dinesh. I tested the code in both VS2019 and g++. Only when I turn on the optimization option do I get the expected result. The online compiler you tried gives me the result I expected with its default settings. It confuses me. T_T – jonadarling Aug 13 '20 at 06:28
  • I think it's compiler-dependent. – Dinesh Aug 14 '20 at 07:06

I thought the rule was that the stack pointer moves back when a block ends, so that the stack memory can be reused.

When it comes to optimizations, you should not think only in terms of stack and heap. Those are certainly an important part when optimizing, but the specification does not say anything about them; it only talks about lifetime, storage duration, and behavior. Stack and heap are just one way to implement those. So a compiler/optimizer is free to do whatever it wants as long as it fulfills the requirements of the specification.
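
To make that concrete, here is a minimal sketch of the storage durations the specification does talk about; note that none of these terms require a "stack" or a "heap" to exist:

#include <iostream>

int g = 0;                 // static storage duration: lives for the whole program

void f() {
  int a = 0;               // automatic storage duration: ends when the block ends
  static int s = 0;        // also static storage duration, just with block scope
  int* d = new int(0);     // dynamic storage duration: lives until delete
  std::cout << &a << ' ' << &s << ' ' << d << '\n';
  delete d;                // ends the dynamic object's lifetime
}

int main() { f(); }

Where each of these objects physically lives is entirely up to the implementation, as long as the observable lifetimes are respected.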

For POD objects - ones that don't have any special behavior for construction or destruction - the compiler could optimize those objects (and their members) away entirely and work with their values directly. As @tadman already said in the comments, asking for the address can break many of the possible optimizations, because you explicitly tell the compiler that you need to know something about the object's location.
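
A minimal sketch of that difference (use() is a hypothetical helper; imagine it is defined in another translation unit so the optimizer cannot look inside it):

// With optimizations on, a compiler may reduce this whole function to
// "return 4;" because x never needs a real memory location:
int without_address() {
  int x = 4;
  return x;
}

// Defined here only so the sketch compiles standalone; pretend it is opaque:
void use(int* p) { *p += 1; }

// Here the address of x escapes into use(), so the compiler has to give x a
// real location in memory for at least as long as use() might access it:
int with_address() {
  int x = 4;
  use(&x);
  return x;
}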

It also heavily depends on the compiler, version, compiler flags, and architecture you compile for (ARM, x64, Haswell, Sandy Bridge, …) and on the surrounding code, because the compiler makes assumptions about which generated code might perform best, e.g. code that allows the pipeline and branch predictor to do their best work.

If you, e.g., use printf instead of std::cout, the output could be:

f1  : 0x7ffc7781f56c
f2  : 0x7ffc7781f56c
f3_1: 0x7ffc7781f54c
f3_2: 0x7ffc7781f54c
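
For reference, a sketch of what that printf variant of f1 could look like (the other functions would follow the same pattern):

#include <cstdio>

void f1() {
  int x = 1;
  std::printf("f1  : %p\n", static_cast<void*>(&x));  // %p expects a void*
}

int main() { f1(); }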

Or if you place all of the code you show in one function:

void f1() {
  {
    int x = 1;
    cout << "f1  : " << &x << endl;
  }
  {
    int x = 2;
    cout << "f2  : " << &x << endl;
  }
  {
    int x = 3;
    cout << "f3_1: " << &x << endl;
  }
  {
    int x = 4;
    cout << "f3_2: " << &x << endl;
  }
}

The result for the same compiler could be:

f1  : 0x7ffc652aac34
f2  : 0x7ffc652aac34
f3_1: 0x7ffc652aac34
f3_2: 0x7ffc652aac34

So the stack as a visual representation, in terms of lifetime, is a way to visualize what is happening, but it by no means describes what actually happens in terms of memory utilization on the hardware stack. The compiler can reserve a certain amount of stack memory and place the values in it in whatever order seems best. This often matches the visual representation of the stack, but it does not need to.
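
As a purely hypothetical illustration (the offsets are invented), a debug build might lay out f3's frame like the first sketch below, while an optimized build folds the two slots into one because the two lifetimes never overlap:

// Hypothetical debug-build frame for f3:
//   [frame - 0x08]  x of the first block  (f3_1)
//   [frame - 0x14]  x of the second block (f3_2)  <- its own slot
//
// Hypothetical optimized frame for f3:
//   [frame - 0x08]  x of both blocks              <- the slot is reused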

With optimizations turned off, the compiler will often use a distinct location on the stack for each variable of a function. But this is not a guarantee either. If the objects on the stack are too large, the compiler might stop doing that even in an unoptimized debug build:


#include <iostream>
using namespace std;

struct Foo {
   int x1;
   int x2;
   int x3;
   int x4;
   int x5;
   int x6;
   int x7;
   int x8;
   int x9;
   int x10;
};

void f1() {
  Foo x;
  cout << "f1  : " << &x << endl;
}

void f2() {
  Foo x;
  cout << "f2  : " << &x << endl;
}

void f3() {
  {
    Foo x;
    cout << "f3_1: " << &x << endl;
  }
  {
    Foo x;
    cout << "f3_2: " << &x << endl;
  }
}

int main() {
  f1();
  f2();
  f3();
}

This would result in the same memory addresses for gcc x86-64, but in different addresses for clang x86-64.

t.niese
  • Thank you, @t.niese. After your answer, I reconsulted my C++ book. The book says `...In a typical implementation, automatic objects are allocated on the stack...`. So I'm sorry for using the word '_rule_'. Yes, it's an **implementation**, not a _rule_. Because it is only an implementation, it can change internally at any time. But that alone cannot rationally explain why it depends on the build mode. Fortunately, @tadman's answer was helpful. :) – jonadarling Aug 13 '20 at 07:56
  • @jonadarling Even for an optimized build the addresses could be different. The results you showed can actually also happen (depending on the compiler) with optimizations turned on. It heavily depends on what other code is involved. The compiler might do inlining, adding some other variables to the stack at an earlier point that are not reflected in your source code. And the compiler is also free to move code around if doing so has no side effects and introduces benefits, like reducing the likelihood of cache misses or improving branch prediction and pipeline handling. – t.niese Aug 13 '20 at 08:20
  • @jonadarling Compilers are built to work on the code constructs that typically exist in real-world code. Constructs like yours that don't make much sense could therefore also lead to "strange" outcomes when optimizations are turned on. – t.niese Aug 13 '20 at 08:23
  • I tested your code on both VS2019 and gcc. I got the same result as you on gcc only! With the optimization option on, different compilers gave different results. For extreme optimization toward a specific goal (it could be performance, code size, etc.), I thought that even different compilers should produce similar results under the same conditions. If each compiler optimizes this very simple code very differently, I think it's hard to expect effective optimization from some compilers. – jonadarling Aug 13 '20 at 11:08
  • @jonadarling This example code of yours is by no means real-world code, so you can't draw any conclusions about whether one compiler does a better job than another based on it. Besides, an optimization that works well for one use-case might work worse for others. It also heavily depends on the hardware constellation, and even (no kidding) on the directory from which you launch your application, because that influences the env variables and therefore the memory layout. So you can always expect certain differences in individual parts of the application. – t.niese Aug 13 '20 at 11:21
  • Oh, I'm sorry. I'm not trying to jump to conclusions from only a few special cases. I thought that if the approaches to optimizing the simple code of examples, rather than the vast code of the real world, differ this much, then it is hard to expect an optimization effect from certain compilers, at least for this small and special case. – jonadarling Aug 13 '20 at 12:14
  • And... I couldn't properly make sense of the other half of your comment. I searched for '_hardware constellation_' but didn't find anything that let me guess the meaning. I think it's too much to consider environment variables and memory layouts in this simple example when all environments are the same. Again, setting aside the many variables of reality, I would like to discuss only the behavior of this small, simple, everything-controlled example. – jonadarling Aug 13 '20 at 12:15
  • @jonadarling `I thought that if the approach to optimization of simple code in examples […] is different, the optimization effect of certain compilers is difficult to expect only for this small and special case.` well with `cout << "f1 : " << &x << endl;` you explicitly prevent a huge amount of optimizations, as printing the address of an object is not a typical use-case. So with that you test an edge case of the compiler and not a typical one. – t.niese Aug 13 '20 at 12:23
  • @jonadarling `hardware constellation`: the combination of CPU, memory, motherboard, … . `[…]it's too much to consider environment variables and memory layouts[…]` You normally compile code so that the built binary can be copied to and run on different PCs, so this has to be considered by the compiler. – t.niese Aug 13 '20 at 12:38
  • @jonadarling `[…]I would like to discuss only the consequences of this small, simple, everything-controlled example.[…]` Modern CPUs execute code in a pipeline (a bunch of instructions are loaded and executed in a block, and branch prediction is performed), and code is optimized with that in mind. For the code you showed, there is not much a compiler can do to optimize, and not much it could do wrong. – t.niese Aug 13 '20 at 12:39
  • Yes, I know I interrupted optimization by printing the address. This example is a very special case, so I think we can only discuss it productively by accepting the intentional address request. I purposely printed the address of the variable. Under all the same conditions, some compilers showed the same address and another compiler showed different addresses. Even the same compiler showed different results depending on the optimization options. I want to discuss this. – jonadarling Aug 13 '20 at 12:57
  • `... for the code you showed, there is not much a compiler can do to optimize, and not much it could do wrong.` Yes. That's exactly what I was wondering. Compilers showed significantly different results even though there was not much to optimize. I'm so sorry if I seem to bother you persistently. Nevertheless, I'm getting a lot of help from your comments. Thank you. – jonadarling Aug 13 '20 at 12:57
  • @jonadarling Compilers have different strategies for how to do optimizations and how to prepare them. Based on those strategies they start to prepare/optimize the AST and the layout on the stack. If there is not much to do, like in your example, they can't do any further meaningful optimization and stick with this initial setup, which obviously then has some differences from other compilers or settings. It has neither significant downsides nor benefits compared to the other, complex code you normally have in an application, so the developers of the compilers don't need to bother about that. – t.niese Aug 13 '20 at 13:20
  • I think there must be reasons and methods behind all optimization techniques and their strategies. Meaningless optimizations may not, or should not, exist. What I can observe now is that, depending on the optimization options of each compiler, there is clearly a _consistent change_ in the layout of the stack memory. There must be a valid reason for the compiler to compile so consistently. – jonadarling Aug 13 '20 at 21:13