Assembly: Why is there empty memory on the stack?

Question

I used an online compiler to write a simple C++ program:

int main()
{
    int a = 4;
    int&& b = 2;
}

and the main function part of the assembly code compiled by GCC 11.2 is shown below:

main:

push    rbp
mov     rbp, rsp
mov     DWORD PTR [rbp-4], 4
mov     eax, 2
mov     DWORD PTR [rbp-20], eax
lea     rax, [rbp-20]
mov     QWORD PTR [rbp-16], rax
mov     eax, 0
pop     rbp
ret

I notice that when initializing 'a', the instruction simply moves an immediate operand directly to memory, while for the r-value reference 'b', it first stores the immediate value into register eax, then moves it to memory. There is also unused memory between [rbp-8] and [rbp-4]. I think that whatever the immediate value is, it exists, so it has to be somewhere, or else the hardware simply uses signals to initialize it (my guess). I want to know more about the underlying logic.

So my questions are:

Why does the initialization differ?
Why is there an unused 4-byte gap on the stack?

Once you turn on optimizations the assembly will look quite different. I would not read too much into assembly resulting from a debug build — 463035818_is_not_an_ai, Mar 31 '22 at 13:41
It looks like you are inspecting unoptimized assembly. It is normal that you will see sub-optimal results which are not really representative of what a final application will do. C++ is meant to be compiled with optimizations on. With optimizations, the entire function is greatly improved to just returning 0 : https://godbolt.org/z/Y9z46e8hj — François Andrieux, Mar 31 '22 at 13:41
Initialization is a C++ concept. C++ doesn't have registers. Registers exist only as an implementation of a C++ program. They don't belong together, question 2 is a categorical mistake. — Passer By, Mar 31 '22 at 13:43
Yes, programs will be optimized whenever they are built in release mode, but what I want to see is the underlying process at the machine level, because we write code intuitively with a logic in our head. If I write a program containing a lot of functions, call them one by one, and then build with optimization, it will end up returning some value and omitting the implementations, so it's impossible to figure out what the program is trying to achieve. So for learning it only makes sense to use debug mode. — Used To Love, Mar 31 '22 at 14:23
Are you sure that `mov eax, 3` corresponds to the posted C++ code, ` int&& b = 2;` ? I would expect to see something like `int&& b = 3;`. — zkoza, Mar 31 '22 at 14:27

score 4 · Accepted Answer · answered Mar 31 '22 at 15:08

Let me address the second question first.

Note that there are actually three objects defined in this function: the int variable a, the reference b (implemented as a pointer), and the unnamed temporary int with a value of 2 that b points to. In unoptimized compilation, each of these objects needs to be stored at some unique location on the stack, and the compiler allocates stack space naively, processing the variables one by one and assigning each one space below the previous. It evidently chooses to handle them in the following order:

The variable a, an int needing 4 bytes. It goes in the first available stack slot, at [rbp-4].
The reference b, stored as a pointer needing 8 bytes. You might think it would go at [rbp-12], but the x86-64 ABI requires that pointers be naturally aligned on 8-byte boundaries. So the compiler moves down another 4 bytes to achieve this alignment, putting b at [rbp-16]. The 4 bytes at [rbp-8] are unused so far.
The temporary int, also needing 4 bytes. The compiler puts it right below the previously placed variable, at [rbp-20]. True, there was space at [rbp-8] that could have been used instead, which would be more efficient; but since you told the compiler not to optimize, it doesn't perform this optimization. It would if you used one of the -O flags.
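The placement rule described above can be sketched in a few lines of C++. This is only a model of the observed layout, not GCC's actual algorithm: the helper name place_below is hypothetical, offsets are distances below rbp, and the sizes and alignments assume the x86-64 System V ABI (4-byte int, 8-byte pointer, rbp 16-byte aligned after the prologue).

```cpp
#include <cstddef>

// Hypothetical model of a naive stack allocator: put the new object directly
// below the previous one, then move it further down until its distance from
// rbp is a multiple of its alignment.
constexpr std::size_t place_below(std::size_t prev_offset,
                                  std::size_t size,
                                  std::size_t align) {
    return (prev_offset + size + align - 1) / align * align;
}

// Order seen in the question's assembly: a, then b, then the temporary.
constexpr std::size_t off_a   = place_below(0,     4, 4);  // int a
constexpr std::size_t off_b   = place_below(off_a, 8, 8);  // b, stored as a pointer
constexpr std::size_t off_tmp = place_below(off_b, 4, 4);  // unnamed temporary int

static_assert(off_a   == 4,  "a lands at [rbp-4]");
static_assert(off_b   == 16, "b lands at [rbp-16], leaving a 4-byte gap");
static_assert(off_tmp == 20, "the temporary lands at [rbp-20]");

// A different order (pointer first) would need only 16 bytes and leave no gap.
constexpr std::size_t alt_b   = place_below(0,     8, 8);
constexpr std::size_t alt_a   = place_below(alt_b, 4, 4);
constexpr std::size_t alt_tmp = place_below(alt_a, 4, 4);

static_assert(alt_tmp == 16, "reordered layout fits in 16 bytes");
```

The static_assert lines reproduce the three offsets from the question, and the reordered variant shows that the gap comes from the processing order rather than from any requirement of the ABI.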

As to why a is initialized with an immediate store to memory, whereas the temporary is initialized via a register: to really answer this, you'd have to read the details of the GCC source code, and frankly I don't think you'll find that there is anything very interesting behind it. Presumably there are different code paths in the compiler for creating and initializing named variables versus temporaries, and the code for temporaries may happen to be written as two steps.

It may be that for convenience, the programmer chose to create an extra object in the intermediate representation (GIMPLE or RTL), perhaps because it simplifies the compiler code in handling more general cases. They wouldn't take any trouble to avoid this, because they know that later optimization passes will clean it up. But if you have optimization turned off, this doesn't happen and you get actual instructions emitted for this unnecessary transfer.

zkoza · Answer 2 · 2022-03-31T22:06:22.813

1

In

 int a = 4;

you declare a (typically) 4-byte variable and ask the compiler to fill it with the bit representation of 4. In

int&& b = 2;

you declare a reference ("r-value reference") to, well, to what? To a literal? Is it possible? In C++ references are typically translated, on the assembly level, into pointers. So one can expect that b will be "a pointer in disguise", that is, without the * and -> semantics. But it will likely occupy 64 bits on a 64-bit machine. Now, pointers must point to some memory stored in RAM, not in registers, cache(s) etc. So the compiler most likely creates a temporary (unnamed) integer, initializes it with 2, and then binds its address to b. I write "most likely" because I doubt the standard standardizes this in such great detail. What we know for sure is that there is an extra unnamed variable involved in the initialization of b in int&& b = 2;.
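As a rough illustration of the "pointer in disguise" idea, the sketch below writes the same logic once with an rvalue reference and once with an explicit named int plus a const pointer. The function and struct names are made up for this example, and the size check reflects what mainstream compilers do in practice, not something the standard guarantees.

```cpp
#include <cassert>

// Hypothetical structs used only to compare sizes.
struct HoldsRef { int& r; };
struct HoldsPtr { int* p; };

// Holds on mainstream compilers (e.g. GCC on x86-64); the standard does not
// require a reference member to be stored as a pointer.
static_assert(sizeof(HoldsRef) == sizeof(HoldsPtr),
              "reference member occupies pointer-sized storage here");

// Reference form: b binds to an unnamed temporary int whose lifetime is
// extended to match b.
inline int via_reference() {
    int&& b = 2;
    b += 1;        // the temporary is a real, modifiable object
    return b;
}

// Pointer form: the temporary is spelled out as a named variable.
inline int via_pointer() {
    int tmp = 2;
    int* const b = &tmp;
    *b += 1;
    return *b;
}

// Small self-check: both forms observe the same value.
inline void check_equivalence() {
    assert(via_reference() == via_pointer());
}
```

In an unoptimized build one would expect both functions to reserve stack space for the int and for the pointer-sized b, though the exact instructions may differ slightly between the two forms.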

As for the assembler, I have too little knowledge of it to dare explain anything to you. I guess, however, that the concept of a temporary variable and a pointer behind the && reference solves all your problems here.

edited Mar 31 '22 at 22:06

answered Mar 31 '22 at 14:24

zkoza


Thanks! Exactly, I have been watching his videos in recent days and have begun to learn assembly language; it's a nice journey digging into the assembly level of code. – Used To Love Mar 31 '22 at 15:39
Sure! As soon as I reach 15 reputation. – Used To Love Mar 31 '22 at 15:55
Yes, GCC for x86-64 only knows about two ABIs, x86-64 System V, and Windows x64. They both have 4-byte `int`. And (except for the x32 variant of x86-64 SysV) they both have 8-byte pointers. – Peter Cordes Mar 31 '22 at 19:58
I think you copied the asm twice from your equivalent that takes a pointer to a named var; that does compile the way you show, with a `mov mem, 2` instead of the question's `mov eax, 2` / `mov mem, eax` to materialize the anonymous `2` int. https://godbolt.org/z/cKjssd1de shows G++11.2 matches the question's asm for the question's source, and matches your asm for `const int* const p = &tmp;`. They're equivalent but not identical asm, because this is a debug build so the compiler didn't try much to optimize. (https://godbolt.org/z/d6368T7T7 shows -O3 passing a pointer to a non-inline func.) – Peter Cordes Mar 31 '22 at 20:00
@PeterCordes Yes, I don't know how could it happen: I must have copied and pasted the same code twice. I'll edit the answer. – zkoza Mar 31 '22 at 22:00

The standard specifies that `int&& b = 2;` creates a _temporary object_ (not a variable) of type `int` initialized to `2` living as long as `b` lives and that `b` is bound to this reference. Of course it does not specify that references shall be implemented as pointers or that objects or references need to occupy stack memory. – user17732522 Mar 31 '22 at 22:15
@user17732522: Right, that's why we have to compile in debug-mode to get asm like this. That introduces the [extra behaviour of debug-mode](https://stackoverflow.com/questions/53366394/why-does-clang-produce-inefficient-asm-with-o0-for-this-simple-floating-point) gcc/clang/msvc/icc/etc. where every C++ object that has an address lives in memory and is in sync between statements, not optimized away or into registers. (Also applies to references as an implementation detail, even though you can't take the address of a reference, they do take space in a struct.) – Peter Cordes Apr 01 '22 at 00:34
And in practice references *are* implemented like pointers by mainstream compilers. Anyway, with the alternate C/C++ version using a pointer that compiles to equivalent and almost-same asm removed, this answer just has a similar explanation to Nate's but with less precision (e.g. temporary *variable* instead of anonymous *object*.) And doesn't explain the gap that GCC could have closed in the red-zone while maintaining `alignof(T)` for each object by putting them in a different order. – Peter Cordes Apr 01 '22 at 00:41
So unfortunately I felt I should remove my upvote. I liked that the pointer version showed that GCC used the same stack layout when you aren't using rvalue references, which was IMO the most useful part of this answer. – Peter Cordes Apr 01 '22 at 00:41
