
Given a simple function

int add(int a, int b) {
    return a + b;
}

Compile it with clang -O3 -c -o test.o test.c. The compiler version is

Apple clang version 11.0.3 (clang-1103.0.32.62)
Target: x86_64-apple-darwin19.5.0

The disassembly of the object file shows

test.o: file format Mach-O 64-bit x86-64
Disassembly of section __TEXT,__text:

0000000000000000 _add:
       0: 55                            pushq   %rbp
       1: 48 89 e5                      movq    %rsp, %rbp
       4: 8d 04 37                      leal    (%rdi,%rsi), %eax
       7: 5d                            popq    %rbp
       8: c3                            retq

Obviously, the pushq, movq and popq instructions do nothing but waste CPU time: they set up and tear down a frame pointer that this function never needs.

Compiling the same piece of code on Linux with clang version 7.0.1-8 (tags/RELEASE_701/final), Target: x86_64-pc-linux-gnu, yields the fully optimized instructions below.

test.o:     file format elf64-x86-64
Disassembly of section .text:

0000000000000000 <add>:
   0:   8d 04 37                lea    (%rdi,%rsi,1),%eax
   3:   c3                      retq

Is there anything wrong with Apple Clang?

A related question is here: Apple clang -O1 not optimizing enough? But the answer there does not address my question.

    It's setting up a stack frame. This is apparently the default on OSX, probably because it makes debugging easier. See https://books.google.com/books?id=xYI7DwAAQBAJ&lpg=PT366&ots=ZGq1nfQsSm&dq=clang%20osx%20stack%20frame&pg=PT366#v=onepage&q=clang%20osx%20stack%20frame&f=false. Try `-fomit-frame-pointer` to make it go away. – Nate Eldredge Jun 08 '20 at 13:39
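Following the comment's suggestion, the difference can be checked by recompiling with `-fomit-frame-pointer` and disassembling again; a minimal sketch of that workflow, assuming `cc` and `objdump` are available (file names are placeholders):

```shell
# Write the test function to a file.
cat > test.c <<'EOF'
int add(int a, int b) {
    return a + b;
}
EOF

# Compile with the frame pointer explicitly omitted.
cc -O3 -fomit-frame-pointer -c -o test.o test.c

# Disassemble; the pushq/movq/popq prologue and epilogue
# should no longer appear around the lea/ret pair.
objdump -d test.o
```

On Apple clang this should produce the same two-instruction body as the Linux output above, confirming that the extra instructions come from the frame-pointer default rather than a missed optimization.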
