The assembly of “b++”

Question

In C, what assembly does "b++" compile to? I have seen two cases:

1) one instruction

     addl    $0x1,-4(%rbp)

2) three instructions

        movl    -4(%rbp), %eax
        leal    1(%rax), %edx
        movl    %edx, -4(%rbp)

Are these two situations caused by the compiler?

my code:

int main()
{
    int ret = 0;
    int i = 2;

    ret = i++;
    ret = ++i;
    return ret;
}

The .s file (++i uses the addl instruction, i++ uses the three-instruction sequence):

        .file   "main.c"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $0, -8(%rbp)   //ret
        movl    $2, -4(%rbp)   //i
        movl    -4(%rbp), %eax
        leal    1(%rax), %edx
        movl    %edx, -4(%rbp)
        movl    %eax, -8(%rbp)
        addl    $1, -4(%rbp)
        movl    -4(%rbp), %eax
        movl    %eax, -8(%rbp)
        movl    -8(%rbp), %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (Ubuntu 5.3.1-14ubuntu2) 5.3.1 20160413"
        .section        .note.GNU-stack,"",@progbits

The C and C++ language specifications make no mention at all of assembler, so there are no guarantees. Different compilers can do different things in many different cases, as long as the observed behaviour matches The Standard. So you would have to explore what happens in any specific case you are interested in. (https://godbolt.org/ is a very interesting tool to help you do that.) — BoBTFish, Jul 18 '18 at 11:23
For completeness, can you describe in the question how the assembly was generated (which compiler, which flags?) — Olivier Sohn, Jul 18 '18 at 11:24
Try https://godbolt.org/ to see all the different things that can happen under the hood on different compilers — doctorlove, Jul 18 '18 at 11:24
As always, these answers are highly dependent on the compiler version and optimization settings. I'm afraid there's no way this question can really be answered other than "yep, those are both valid - compilers are funny things". — Jonathon Reinhart, Jul 18 '18 at 11:25
Just so you know, the two options you posted are not equivalent. — Ajay Brahmakshatriya, Jul 18 '18 at 11:28
The second version is wrong. It correctly increases the value stored in `[rbp-4]`, but then it stores the address of `[rbp-4]` (truncated to 32 bits) into `[rbp-4]` which doesn't make sense. — interjay, Jul 18 '18 at 11:47
the listed `.s` file doesn't contain what you told us in 2). — geza, Jul 19 '18 at 06:41
I hope that you know that you've completely misled us by showing different code in 2) previously. How did you come up with the original 2)? — geza, Jul 19 '18 at 07:37

score 2 · Answer 1 · answered Jul 18 '18 at 11:37

The ISO standard does not mandate at all what happens under the covers. It specifies a "virtual machine" that acts in a certain way given the C instructions you provide to it.

So, if your C compiler is implemented as a C-to-Dartmouth-Basic converter, b++ is just as likely to lead to 10 let b = b + 1 as anything else :-)

If you're compiling to common assembler code, then you're likely to see a difference depending on whether you use the result, specifically b++; as opposed to a = b++ since the result of the former can be safely thrown away.
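A minimal sketch of that distinction, using two hypothetical helpers (`bump_discard` and `bump_keep_old` are names made up for illustration, not from the question). In the first the value of the expression is thrown away, so only the side effect matters; in the second the old value has to be kept somewhere:

```c
/* Hypothetical helpers for illustration only. */

/* Result discarded: only the side effect on *b is observable. */
void bump_discard(int *b)
{
    (*b)++;
}

/* Result used: the value before the increment must be preserved. */
int bump_keep_old(int *b)
{
    int a = (*b)++;
    return a;
}
```

What a given compiler emits for each still depends on the compiler, target and optimisation level, so it is worth comparing them on https://godbolt.org/ rather than assuming a particular instruction sequence.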

You're also likely to see massive differences based on optimisation level.

Bottom line: short of specifying all the things that can affect the output (including but not limited to compiler, target platform, and optimisation levels), there is no single answer.

Peter Cordes · Answer 2 · 2018-07-19T13:16:48.737

The first one is the output for ++i as part of ret = ++i. It doesn't need to keep the old value around, because it's doing ++i and then ret = i. Incrementing in memory and then reloading that is a really stupid and inefficient way to compile that, but you compiled with optimization disabled so gcc isn't even trying to make good asm output.

The 2nd one is the output for i++ as part of ret = i++. It needs to keep the old value of i around, so it loads into a register and uses lea to calculate i+1 in a different register. It could have just stored to ret and then incremented the register before storing back to i, but I guess with optimizations disabled gcc doesn't notice that.
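As a rough C-level sketch of the two sequences described above (the function names and the register-named temporaries are illustrative assumptions, not something gcc produces), the only real difference is whether the old value has to survive the increment:

```c
/* Illustrative only: each C line mirrors one instruction from the
   gcc -O0 listing in the question. */

/* ret = i++ : the old value is kept in one temporary while i + 1
   is built in another. */
int post_increment_steps(int *i)
{
    int eax = *i;       /* movl -4(%rbp), %eax */
    int edx = eax + 1;  /* leal 1(%rax), %edx  */
    *i = edx;           /* movl %edx, -4(%rbp) */
    return eax;         /* movl %eax, -8(%rbp)  (ret = old value) */
}

/* ret = ++i : increment in memory, then reload the new value. */
int pre_increment_steps(int *i)
{
    *i += 1;            /* addl $1, -4(%rbp)   */
    return *i;          /* movl -4(%rbp), %eax; movl %eax, -8(%rbp) */
}
```

Starting from i = 2 as in the question, the first call yields the old value and the second yields the value after both increments.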

Previous answer to the previous vague question without source, and with bogus code:

The asm for a tiny expression like b++ totally depends on the surrounding code in the rest of the function (or with optimization disabled, at least the rest of the statement) and whether it's a global or local, and whether it's declared volatile.

And of course compiler optimization options have a massive impact; with optimization disabled, gcc makes a separate block of asm for every C statement so you can use the GDB jump command to go to a different source line and have the code still produce the same behaviour you'd expect from the C abstract machine. Obviously this highly constrains code-gen: nothing is kept in registers across statements. This is good for source-level debugging, but sucks to read by hand because of all the noise of store/reload.

For the choice of inc vs. add, see INC instruction vs ADD 1: Does it matter? clang -O3 -mtune=bdver2 uses inc for memory-destination increments, but with generic tuning or any Intel P6 or Sandybridge-family CPU it uses add $1, (mem) for better micro-fusion.

See How to remove "noise" from GCC/clang assembly output?, especially the link to Matt Godbolt's CppCon2017 talk about looking at and making sense of compiler asm output.

The 2nd version in your original question looks like mostly un-optimized compiler output for this weird source:

 // inside some function
 int b;

                   // leaq  -4(%rbp), %rax   // rax = &b
 b++;              // incl   (%rax)
 b = (int)&b;      // mov    %eax, -4(%rbp)

(The question has since been edited to different code; it looks like the original was mis-typed by hand, mixing an opcode from one line with an operand from another. I reproduce it here so all the comments about it being weird still make sense. For the updated code, see the first half of my answer: it depends on surrounding code and having optimization disabled. Using ret = b++ needs the old value of b, not the incremented value, hence different asm.)

If that's not what your source does, then you must have left out some intervening instructions or something. Or else the compiler is re-using that stack slot for something else.

I'm curious what compiler you got that from, because gcc and clang typically don't like to use results they just computed. I'd have expected incl -4(%rbp).

Also that doesn't explain mov %eax, -4(%rbp). The compiler already used the address in %rax for inc, so why would a compiler revert to a 1-byte-longer RBP-relative addressing mode instead of mov %eax, (%rax)? Referencing fewer different registers that haven't been recently written is a good thing for Intel P6-family CPUs (up to Nehalem), to reduce register-read stalls. (Otherwise irrelevant.)

Using RBP as a frame pointer (and doing increments in memory instead of keeping simple variables in registers) looks like un-optimized code. But it can't be from gcc -O0, because it computes the address before the increment, and those have to be from two separate C statements.

b++ = &b; isn't valid because b++ isn't an lvalue. Well actually the comma operator lets you do b++, b = &b; in one statement, but gcc -O0 still evaluates it in order, rather than computing the address early.
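A sketch of that single-statement form, written as a hypothetical helper that takes a pointer so it can be called from elsewhere (`inc_then_store_address` is an invented name, and the cast through `intptr_t` is an addition to make the pointer-to-int conversion explicit):

```c
#include <stdint.h>

/* Hypothetical helper: one statement that increments *b and then
   overwrites it with its own address, truncated to int. */
void inc_then_store_address(int *b)
{
    /* The comma operator sequences the increment before the assignment. */
    (*b)++, *b = (int)(intptr_t)b;
}
```

The increment is dead as far as the final value goes, which is why an optimizing compiler would normally drop it unless the object is volatile.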

Of course with optimization enabled, b would have to be volatile to explain incrementing in memory right before overwriting it.
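A minimal sketch of the volatile point, assuming two hypothetical globals (`plain_b` and `volatile_b` are names invented here). Both functions return the same value, but only the volatile version obliges the compiler to actually perform the increment in memory before the overwrite:

```c
/* Hypothetical globals for illustration. */
int plain_b;
volatile int volatile_b;

/* The increment is a dead store here; an optimizer may remove it. */
int bump_then_overwrite_plain(void)
{
    plain_b++;
    plain_b = 7;
    return plain_b;
}

/* Every access to a volatile object must be performed, so the
   load, increment, store and final store all have to be emitted. */
int bump_then_overwrite_volatile(void)
{
    volatile_b++;
    volatile_b = 7;
    return volatile_b;
}
```

Comparing the two at -O2 on the Godbolt compiler explorer should show the difference, though the exact instructions depend on the compiler and version.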

clang is similar, but actually does compute that address early. For b++; b = &b;, notice that clang6.0 -O0 does an LEA and keeps RAX around across the increment. I guess clang's code-gen doesn't support consistent debugging with GDB's jump the way gcc does.

    leaq    -4(%rbp), %rax
    movl    -4(%rbp), %ecx
    addl    $1, %ecx
    movl    %ecx, -4(%rbp)
    movl    %eax, %ecx          # copy the LEA result
    movl    %ecx, -4(%rbp)

I wasn't able to get gcc or clang to emit the sequence of instructions you show in the question with unoptimized or optimized + volatile, on the Godbolt compiler explorer. I didn't try ICC or MSVC, though. (Although unless that's disassembly, it can't be MSVC because it doesn't have an option to emit AT&T syntax.)

I have added a comment to my question, please take another look. — Jams.Liu, Jul 19 '18 at 03:01
@Jams.Liu: there's no `leaq -4(%rbp), %rax` in the actual asm output from gcc5.3 -O0 that you showed. It's only using LEA to copy-and-increment a register to implement `ret = b++` (where it needs to keep the value before increment), and then only because you disabled optimization so the code is bloated and terrible. Looking at `-O0` asm is a good way to learn the *wrong* / slow way to do things in asm. It's not intended to run fast. — Peter Cordes, Jul 19 '18 at 03:16
@Jams.Liu: your updated question finally makes sense. I added a new section at the top of my answer that answers it. — Peter Cordes, Jul 19 '18 at 13:17

score 1 · Answer 3 · answered Jul 18 '18 at 11:26


Any good compiler will optimise b++ to ++b if the result of the expression is discarded. You see this particularly in increments in for loops.

That's what is happening in your "one instruction" case.
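The for-loop case can be sketched like this (`sum_post` and `sum_pre` are hypothetical names). Since the value of the increment expression is discarded in both loops, the two functions have identical observable behaviour, and a compiler is free to generate the same code for them:

```c
/* Hypothetical helpers: identical except for i++ versus ++i. */

int sum_post(int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)   /* value of i++ is discarded */
        s += i;
    return s;
}

int sum_pre(int n)
{
    int s = 0;
    for (int i = 0; i < n; ++i)   /* value of ++i is discarded */
        s += i;
    return s;
}
```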


Bathsheba


score 1 · Answer 4 · answered Jul 19 '18 at 05:56

It's not typically instructive to look at un-optimized compiler output, since values (variables) will usually be updated using a load-modify-store paradigm. This might be useful initially when getting to grips with assembly, but it's not the output to expect from an optimizing compiler that maintains values, pointers, etc., in registers for frequent use. (see: locality of reference)

/* un-optimized logic: */

int i = 2;
ret = i++; /* assign ret <- i, and post-increment i (ret = i; i++ (i = 3)) */
ret = ++i; /* pre-increment i, and assign ret <- i  (++i (i = 4); ret = i) */

i.e., any modern, optimising compiler can easily determine that the final value of ret is (4).
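The same logic wrapped in a function makes the claim easy to check (`final_ret` is a hypothetical name; the body is the question's main with the name changed):

```c
/* The question's code, moved into a hypothetical function so the
   result can be inspected by a caller. */
int final_ret(void)
{
    int ret = 0;
    int i = 2;

    ret = i++;  /* ret = 2, i = 3 */
    ret = ++i;  /* i = 4, ret = 4 */
    return ret;
}
```

Because every input is a compile-time constant, an optimizer can fold the whole body down to returning the constant.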

Removing all the extraneous directives, etc., gcc-7.3.0 on OS X gives me:

_main:  /* Darwin x86-64 ABI adds leading underscores to symbols... */
        movl    $4, %eax
        ret

Apple's native clang and the MacPorts clang-6.0 set up a basic stack frame, but still optimise the ret arithmetic away:

_main:
        pushq   %rbp
        movq    %rsp, %rbp
        movl    $4, %eax
        popq    %rbp
        retq

Note that the Mach-O (OS X) ABI is very similar to the ELF ABI for user-space code. Just try compiling with at least -O2 to get a feel for 'real' (production) code.
