Real-world compilers aren't that dumb, even when you tell them not to optimize (like `clang -O0`). They evaluate constant expressions at compile time to a single integer, because that's easier than carrying around the logic of all those operators throughout the work of transforming the program into assembly or machine code.
For example, even MSVC (Godbolt) compiles `return 52 + -10` to `mov eax, 42` / `ret`, and that's a compiler whose debug builds will sometimes do insane things like compiling `if(true);else` to materializing a `1` in a register and comparing or testing it against itself, instead of optimizing away the `else` side entirely like some other compilers, or at least using an unconditional `jmp`.
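As a complete function (the name `forty_two` is made up for illustration):

```c
int forty_two(void) {
    return 52 + -10;   // folded at compile time: even a debug build of
                       // MSVC emits just   mov eax, 42   then   ret
}
```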
Compile-time eval is sometimes required in languages like C: for example, `static int arr[10 - 1];` is legal, and the size of a static array has to be a compile-time constant. Since the compiler needs to be able to do that anyway, it makes sense to just always do it when it's simple, even without optimization enabled.
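A minimal sketch of that rule (identifiers are made up):

```c
static int arr[10 - 1];    // legal: 10 - 1 is an integer constant expression,
                           // so the compiler has to fold it even at -O0

int n = 9;
// static int bad[n];      // error: the size of an array with static storage
                           // duration must be a compile-time constant
```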
Part of the goal of `gcc -O0` is to compile fast (not most simply / literally / naively), without caring about efficiency of the generated code. It's still faster to eval that integer expression soon after parsing than to carry it around and later generate machine code for it.
But if you have a truly naive compiler that chooses to be that inefficient:
Like Alexander commented, it's probably like C, where `-32` isn't a single constant: it's the unary operator `-` applied to the constant `32`. (Fun fact: that's why `INT_MIN` isn't defined as `-2147483648`; the literal 2147483648 doesn't fit in `int`, so that expression would have type `long` or `long long`.) It chooses not to do constant-propagation at compile time to get a negative integer constant.
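A hedged illustration; the exact macro body is implementation-specific, but this pattern is common in real `<limits.h>` headers:

```c
#include <limits.h>

// 2147483648 doesn't fit in a 32-bit int, so as a literal it gets a wider
// type, and unary minus then applies to that wider type. To keep the type
// as int, implementations define INT_MIN as an expression such as:
//   #define INT_MIN (-2147483647 - 1)
_Static_assert(INT_MIN == -2147483647 - 1,
               "assumes a two's-complement target with 32-bit int");
```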
And this dumb compiler can't negate in a register as the first part of a larger expression. There's no reason to expect it to use stack space: since it's already working on the right-hand operand of `+` first (unlike in the earlier example), `mov $10, %eax` / `neg %rax` / `add $52, %rax` is what you could reasonably expect.
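As a full routine, that register-only approach might look like this (a sketch of what a still-simple compiler *could* emit, not actual output from any particular one):

```c
int forty_two(void) {
    return 52 + -10;
}
/* Plausible register-only asm (AT&T syntax), evaluating the right-hand
 * operand first and never touching the stack:
 *     mov  $10, %eax
 *     neg  %rax
 *     add  $52, %rax
 *     ret
 */
```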
You quote some material from the book explaining that it works by inventing a temporary to hold the unary `-` result, which makes sense internally.
But then it treats that temporary like a variable that existed in the source and needs to have a memory address, like real-world C compilers do at `-O0` for variables not declared `register`. (See Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?) (`register` doesn't do anything for efficiency except at `-O0`; that's why ISO C++17 removed it from the language.)
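A small hedged demo of that `register` effect at `-O0` (function name made up; actual codegen varies by compiler):

```c
int demo(void) {
    register int r = -10;  // at -O0, compilers can keep r in a register,
                           // avoiding a store/reload around each use
    int m = -10;           // m gets a stack slot and is written to memory,
                           // then reloaded, for every access at -O0
    return r + m + 62;     // 42
}
```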
Since it's a local var, it's stored on the stack. And the compiler sets up RBP as a frame pointer when anything uses the stack, and reserves stack space rather than assuming the x86-64 System V ABI's red zone is available (the 128 bytes below RSP that are safe from asynchronous clobbers), even though the lifetime of this variable is contained within the expression.
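Putting the whole description together, the asm shape would be something like this (a sketch of the pattern described above, not verified output from the book's compiler):

```c
int forty_two(void) {
    return 52 + -10;
}
/* Sketch of the naive spill-everything codegen (AT&T syntax):
 *     pushq %rbp              # set up RBP as a frame pointer
 *     movq  %rsp, %rbp
 *     subq  $16, %rsp         # reserve stack space; the red zone goes unused
 *     movl  $10, %eax
 *     negl  %eax              # compute the unary - temporary
 *     movl  %eax, -4(%rbp)    # ...and spill it to its stack slot
 *     movl  -4(%rbp), %eax    # reload it as an operand of +
 *     addl  $52, %eax
 *     movq  %rbp, %rsp        # standard epilogue
 *     popq  %rbp
 *     ret
 */
```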
I guess a more complex expression could include function calls, like `-10 + foo(3)`, so giving every temporary a stack slot makes sense if the compiler is aiming for maximum simplicity in this part, at the expense of leaving more work for other also-simple parts of the compiler; i.e. it doesn't look for the optimization of keeping the temporary in a register when there aren't any function calls. Having a compiler this simple means the internal data structures and the generated asm will be larger (and less efficient) for the same program.
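For instance, with a call in the expression (declaration of `foo` assumed):

```c
int foo(int);

int g(void) {
    // The temporary holding -10 has to survive the call to foo(3), which is
    // free to clobber all call-clobbered registers. A stack slot for every
    // temporary handles this with zero special-case logic, at the cost of
    // useless stores/reloads in expressions with no calls at all.
    return -10 + foo(3);
}
```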