I have this code in C:

int main(void)
{
    int a = 1 + 2;
    return 0;
}

When I run objdump -x86-asm-syntax=intel -d a.out on the binary compiled with GCC 9.3.0_1 at -O0, I get:

0000000100000f9e _main:
100000f9e: 55                           push    rbp
100000f9f: 48 89 e5                     mov rbp, rsp
100000fa2: c7 45 fc 03 00 00 00         mov dword ptr [rbp - 4], 3
100000fa9: b8 00 00 00 00               mov eax, 0
100000fae: 5d                           pop rbp
100000faf: c3                           ret

and with -O1 flag:

0000000100000fc2 _main:
100000fc2: b8 00 00 00 00               mov eax, 0
100000fc7: c3                           ret

which removes the unused variable a and the stack management altogether.

However, when I use Apple clang version 11.0.3 with -O0 and -O1, I get

0000000100000fa0 _main:
100000fa0: 55                           push    rbp
100000fa1: 48 89 e5                     mov rbp, rsp
100000fa4: 31 c0                        xor eax, eax
100000fa6: c7 45 fc 00 00 00 00         mov dword ptr [rbp - 4], 0
100000fad: c7 45 f8 03 00 00 00         mov dword ptr [rbp - 8], 3
100000fb4: 5d                           pop rbp
100000fb5: c3                           ret

and

0000000100000fb0 _main:
100000fb0: 55                           push    rbp
100000fb1: 48 89 e5                     mov rbp, rsp
100000fb4: 31 c0                        xor eax, eax
100000fb6: 5d                           pop rbp
100000fb7: c3                           ret

respectively. The stack management is never stripped out as it is with GCC. Why does (Apple) Clang keep the unnecessary push and pop?


This may or may not be a separate question, but with the following code:

int main(void)
{
    // return 0;
}

GCC generates the same assembly with or without the return 0;. However, Clang at -O0 emits this extra instruction

100000fa6: c7 45 fc 00 00 00 00         mov dword ptr [rbp - 4], 0

when there is return 0;.

Why does Clang keep these (probably) redundant instructions?

Peter Cordes
Jay Lee
  • The `-O` settings are just abbreviations for a set of particular optimizations. One of them for gcc is `-fomit-frame-pointer`. Apparently it's not enabled in clang `-O1` setting. You can try adding that by hand. Then again, maybe apple calling convention mandates a frame pointer. – Jester May 15 '20 at 23:20
  • @Jester You are absolutely right! I got the frame pointer removed with the `-fomit-frame-pointer` flag with clang. However, I still have different ASM with and without `return 0` in clang: the additional `mov dword ptr [rsp - 4], 0`... I thought `return 0` was implicit if I don't write it out, but it seems that clang treats codes with/without the return statement differently. – Jay Lee May 15 '20 at 23:28
  • You can't really complain about optimizations made unless you ask for the beans like `-O3`. The others are just "less optimized" but have no specific standard definition. – tadman May 15 '20 at 23:37
  • Turn on LTO/PGO and let the compiler completely remove the code for you by inlining it or hiding it away in a cold section. Then you don't need to worry about a push and pop and it can save the return and call as a bonus. :) – Michael Dorgan May 15 '20 at 23:40
  • The return 0 store only happens at `-O0`, right? You told the compiler to optimize as little as possible (compile fast not well) so you're seeing internal implementation details. [Unoptimized clang++ code generates unneeded "movl $0, -4(%rbp)" in a trivial main()](https://stackoverflow.com/q/49300094) / [Why is 0 moved to stack when using return value?](https://stackoverflow.com/q/31149806) – Peter Cordes May 16 '20 at 00:08
  • @old_timer: In C++ and C99, `main` has an implicit `return 0;` at the bottom. This would be UB for any other function name, and clang `-O0` does notice that and emit `ud2` (an illegal instruction) instead of an epilogue / `ret` in C++ mode, or a warning in C mode. https://godbolt.org/z/9LDjGo. clang defaults to C99 or C11 in C mode, not C89 where main isn't special that way. That's also why `//` comments are allowed. – Peter Cordes May 16 '20 at 01:00
  • @old_timer Thank you for the suggestion! :) I'm just starting to learn assembly, so I guess I'll have to tinker with more C codes and learn from textbooks! – Jay Lee May 16 '20 at 01:12
  • @old_timer: Agreed, I don't recommend looking at `main`; it's special in various ways. [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116). And yes, gcc and clang assume users will use at least `-O2` for "release" builds. `-O3` is more conservative than it used to be (gcc -O3 doesn't include `-funroll-loops`), and/or auto-vectorization is more important than it used to be on modern CPUs with good SIMD. Another huge gcc/clang difference is that `clang -O2` enables auto-vectorization, but gcc only does that at `-O3`. – Peter Cordes May 16 '20 at 01:27
  • @old_timer: That point about writing functions that take args and return a value (not a compile time constant) is one I explained with some examples in [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116). `-O2 -fno-tree-vectorize` can be a good choice for a beginner. – Peter Cordes May 16 '20 at 01:41
  • @old_timer PeterCordes Thank you all for the invaluable comments! I'm learning riscv at school, so I should stick with it for now.. – Jay Lee May 16 '20 at 01:46
  • there are a couple of risc-v open cores that I have messed with online. Probably many others and expect more. Have some sifive boards and another one where they clearly purchased a risc-v core, ripped a cortex-m out and replaced it with the risc-v core. Risc-v is very MIPS inspired but it also has some other features implemented different from others. – old_timer May 16 '20 at 05:31

1 Answer

I suspect you were trying to see the addition happen.

int main(void)
{
    int a = 1 + 2;
    return 0;
}

but with optimization, say -O2, your dead code went away:

00000000 <main>:
   0:   2000        movs    r0, #0
   2:   4770        bx  lr

The variable a is local. It never leaves the function, and it does not rely on anything outside the function (globals, input variables, return values from called functions, etc.). So it has no functional purpose: it is dead code that doesn't do anything, so an optimizer is free to remove it, and did.

So I assume you then tried no or less optimization and saw that the output was too verbose.

00000000 <main>:
   0:   cf 93           push    r28
   2:   df 93           push    r29
   4:   00 d0           rcall   .+0         ; 0x6 <main+0x6>
   6:   cd b7           in  r28, 0x3d   ; 61
   8:   de b7           in  r29, 0x3e   ; 62
   a:   83 e0           ldi r24, 0x03   ; 3
   c:   90 e0           ldi r25, 0x00   ; 0
   e:   9a 83           std Y+2, r25    ; 0x02
  10:   89 83           std Y+1, r24    ; 0x01
  12:   80 e0           ldi r24, 0x00   ; 0
  14:   90 e0           ldi r25, 0x00   ; 0
  16:   0f 90           pop r0
  18:   0f 90           pop r0
  1a:   df 91           pop r29
  1c:   cf 91           pop r28
  1e:   08 95           ret
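
To make the dead-code point concrete, here is a minimal sketch (the function names dead_store and live_value are mine, not from the question). In the first function the result of 1 + 2 never influences anything outside the function, so an optimizer may drop it. In the second it is returned, but since both operands are constants a compiler will typically fold the sum to 3 at compile time, so you still should not expect to see an add instruction.

```c
/* Hypothetical helper names, for illustration only. */

/* 'a' never influences anything outside the function, so an
 * optimizing compiler is free to remove it entirely. */
int dead_store(void)
{
    int a = 1 + 2;
    (void)a;        /* only silences "unused variable" warnings */
    return 0;
}

/* 'a' is now observable through the return value, but 1 + 2 is a
 * constant expression, so it is typically evaluated at compile time. */
int live_value(void)
{
    int a = 1 + 2;
    return a;
}
```

Disassembling both at -O0 and at -O1 or higher should show the difference between "kept because you asked for no optimization" and "kept because the value is actually needed".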

If you want to see the addition happen instead, first off don't use main(): it has baggage, and the baggage varies among toolchains. So try something else:

unsigned int fun ( unsigned int a, unsigned int b )
{
    return(a+b);
}

Now the addition relies on external inputs, so the compiler cannot optimize any of this away.

00000000 <_fun>:
   0:   1d80 0002       mov 2(sp), r0
   4:   6d80 0004       add 4(sp), r0
   8:   0087            rts pc

If we want to figure out which one is a and which one is b, then:

unsigned int fun ( unsigned int a, unsigned int b )
{
    return(a+(b<<1));
}

00000000 <_fun>:
   0:   1d80 0004       mov 4(sp), r0
   4:   0cc0            asl r0
   6:   6d80 0002       add 2(sp), r0
   a:   0087            rts pc

Want to see an immediate value?

unsigned int fun ( unsigned int a )
{
    return(a+0x321);
}

00000000 <fun>:
   0:   8b 44 24 04             mov    eax,DWORD PTR [esp+0x4]
   4:   05 21 03 00 00          add    eax,0x321
   9:   c3                      ret 

From this you can figure out what the compiler's calling convention is (where the arguments and the return value live), etc.
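
In the same spirit, a few more probe functions can help when exploring a new target. This is only a sketch; the names sub_probe, load_probe and wide_probe are made up for illustration. Subtraction is not commutative, so the operand order in the output tells you which register or stack slot holds a and which holds b. The pointer version shows what a load looks like, and the 64-bit return shows where a value wider than a register comes back on smaller targets.

```c
/* Hypothetical probe functions; compile with optimization and disassemble. */

/* a - b is not commutative, so the operand order reveals which
 * argument arrives where. */
unsigned int sub_probe(unsigned int a, unsigned int b)
{
    return a - b;
}

/* Dereferencing a pointer argument shows the target's load
 * instruction and addressing mode. */
unsigned int load_probe(const unsigned int *p)
{
    return *p;
}

/* unsigned long long is at least 64 bits; on a 16- or 32-bit target
 * the result has to be split up, which shows how wide values return. */
unsigned long long wide_probe(unsigned int a)
{
    return (unsigned long long)a + 1ULL;
}
```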

But you will hit some limits trying to get the compiler to do things for you to learn asm. Likewise, you can easily take the code generated by these compilations (using -save-temps or -S, or disassemble and type it in; I prefer the latter), but you can only get so far on your operating system with high-level/C-callable functions. Eventually you will want to do something bare-metal (on a simulator at first) to get maximum freedom, and to try instructions you can't normally try, or that are hard to use, or that you don't quite understand yet, within the confines of an operating system in a function call. (Please do not use inline assembly until down the road, or never; use real assembly, and ideally the assembler, not the compiler, to assemble it. Down the road, then try those things.)


One of the compilers was built for, or defaults to, using a stack frame, so you need to tell that compiler to omit it: -fomit-frame-pointer. Note that one or both of these can be built to default to not having a frame pointer.

../gcc-$GCCVER/configure --target=$TARGET --prefix=$PREFIX --without-headers --with-newlib  --with-gnu-as --with-gnu-ld --enable-languages='c' --enable-frame-pointer=no

(Don't assume gcc or clang/llvm has a "standard" build: both are customizable, and the binary you downloaded reflects someone's opinion of the standard build.)
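
A hedged sketch of how to check this on your own toolchain: compile the same small function twice, once with -fomit-frame-pointer and once with -fno-omit-frame-pointer, and diff the assembly. Both flags are accepted by gcc and clang; the file name probe.c, the output names and the function add_one are placeholders I made up. What the prologue looks like depends on the target and on how your compiler was built.

```c
/* probe.c (hypothetical file name)
 *
 * Compare the generated assembly with and without a frame pointer:
 *   cc -O1 -fomit-frame-pointer    -S probe.c -o omit.s
 *   cc -O1 -fno-omit-frame-pointer -S probe.c -o keep.s
 *   diff omit.s keep.s
 */
unsigned int add_one(unsigned int a)
{
    return a + 1u;
}
```

If the two outputs are identical, the frame pointer was already being omitted (or kept) regardless of the flag for that function on that target.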

You are using main(), which has the return 0 or not thing, and it can/will carry other baggage, depending on the compiler and settings. Using something other than main gives you the freedom to pick your inputs and outputs without a warning that you didn't conform to the short list of choices for main().

For gcc, -O0 is ideally no optimization, although sometimes you see some. -O3 is max, give me all you got. -O2 is historically where folks live, if for no other reason than "I did it because everyone else is doing it". -O1 is no man's land for gnu: it has some items not in -O0, but not a lot of the good ones in -O2, so it depends heavily on your code whether or not you landed in one of the optimizations associated with -O1. These numbered optimization levels, if your compiler even has a -O option, are just pre-defined lists: 0 means this list, 1 means that list, and so on.
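
One small way to see whether a -O level was requested at all, sketched under the assumption that you are using gcc or clang: both define the predefined macro __OPTIMIZE__ when optimizing and leave it undefined at -O0. The function name built_with_optimization is hypothetical. For gcc specifically, gcc -Q -O1 --help=optimizers prints which individual optimizations a given level enables.

```c
/* Returns 1 if the compiler was asked to optimize, 0 otherwise.
 * Relies on __OPTIMIZE__, a predefined macro in gcc and clang;
 * other compilers may not define it (assumption: gcc or clang). */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```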

There is no reason to expect any two compilers, or the same compiler with different options, to produce the same code from the same sources. If two competing compilers were able to do that most if not all of the time, something very fishy would be going on. Likewise there is no reason to expect the list of optimizations each compiler supports, what each optimization does, etc., to match, much less the -O1 lists to match between them.

There is no reason to assume that any two compilers or versions conform to the same calling convention for the same target. It is much more common now for the processor vendor to create a recommended calling convention and for the competing compilers to conform to it, because why not, everyone else is doing it, or even better, whew, I don't have to figure one out myself, and if this one fails I can blame them.

There are a lot of implementation-defined areas in C in particular, less so in C++ but still... So your expectations of what comes out when comparing compilers to each other may differ for this reason as well. Just because one compiler implements some code in some way doesn't mean that is how the language works; sometimes it is how that compiler's author(s) interpreted the language spec, or where they had wiggle room.

Even with full optimizations enabled, everything that compiler has to offer, there is no reason to assume that a compiler can outperform a human. It's an algorithm with limits, programmed by a human; it cannot outperform us. With experience it is not hard to examine the output of a compiler, sometimes for simple functions but more often for larger ones, and find missed optimizations, or other things that could have been done "better" for some opinion of "better". And sometimes you find the compiler just left something in that you think it should have removed, and sometimes you are right.

There is education, as shown above, in using a compiler to start to learn assembly language. Even with decades of experience and dabbling with dozens of assembly languages/instruction sets, if there is a debugged compiler available I will very often start learning a new instruction set by disassembling simple functions, then look the instructions up, then start to get a feel from what I find there for how to use them.

Very often starting with this one first:

unsigned int fun ( unsigned int a )
{
    return(a+5);
}

or

unsigned int fun ( unsigned int a, unsigned int b )
{
    return(a+b);
}

And going from there. Likewise, when writing a disassembler or a simulator for fun to learn the instruction set, I often rely on an existing assembler, since the documentation for a processor is often lacking. The first assembler and compiler for a processor are very often done with direct access to the silicon folks, and those that follow can use the existing tools as well as the documentation to figure things out.

So you are on a good path to start learning assembly language. I have strong opinions on which ones to start with, or not, to improve the experience and chances of success, but I have been in too many battles on Stack Overflow this week, so I'll let that go. You can see that I chose an array of instruction sets in this answer, and even if you don't know them you can probably figure out what the code is doing. "Standard" installs of llvm provide the ability to output assembly language for several instruction sets from the same source code. The gnu approach is that you pick the target (family) when you compile the toolchain, and that compiled toolchain is limited to that target/family, but you can easily install several gnu toolchains on your computer at the same time, be they variations on defaults/settings for the same target or different targets. A number of these are apt-gettable without having to learn to build the tools: arm, avr, msp430, x86 and perhaps some others.

I cannot speak to why it does not return zero from main when you didn't actually have any return code. See the comments by others and read up on the spec for the language (or ask that as a separate question, or see if it was already answered).

Now, you said Apple clang; I'm not sure what that reference was to. I know that Apple has put a lot of work into llvm in general. Or maybe you are on a Mac or in an Apple-supplied/suggested development environment. But check Wikipedia and others: clang had a lot of corporate help, not just Apple's. If you are on an Apple computer then the apt-gettable route isn't going to make sense, but there are still lots of pre-built gnu (and llvm) based toolchains you can download and install rather than attempting to build the toolchain from sources (which isn't difficult, BTW).

halfer
old_timer
  • Thank you for this comprehensive answer! I have a small question though, is the first ASM snippet in ARM architecture? – Jay Lee May 16 '20 at 03:24
  • Ok, I just reached "You can see that I chose an array of instruction sets in this answer.", and now I get that you deliberately chose different asm's! :) – Jay Lee May 16 '20 at 03:48
  • @JayLee: yes, `bx lr` is uniquely ARM. The next block, showing the same code with optimization disabled, is actually also compiled for a different architecture with no mention of which ISA is which, not exactly the clearest way to illustrate what changes when you change compiler options for the same source. That might be 16-bit MSP430 for the blocks with instructions like `incd r4`. It uses destination-last asm syntax, as we can see from `add 4(sp), r0` which is obviously adding a stack arg to the return-value register. – Peter Cordes May 16 '20 at 04:19
  • I changed the second one, it is not arm that is true but this one is actually readable the other wasnt. The msp430 has more registers and doesnt need to use the stack, msp430 would be add r14,r15 ; ret .after the last edit: thumb, avr, pdp11, pdp11, i386. yes like the middle two I should have probably done the first two the same...all gcc, had I had more to say I would have used more. – old_timer May 16 '20 at 05:24