Given a simple function:
int add(int a, int b) {
    return a + b;
}
Compile it with clang -O3 -c -o test.o test.c. The compiler version is:
Apple clang version 11.0.3 (clang-1103.0.32.62)
Target: x86_64-apple-darwin19.5.0
The disassembly of the object file shows
test.o: file format Mach-O 64-bit x86-64
Disassembly of section __TEXT,__text:
0000000000000000 _add:
0: 55 pushq %rbp
1: 48 89 e5 movq %rsp, %rbp
4: 8d 04 37 leal (%rdi,%rsi), %eax
7: 5d popq %rbp
8: c3 retq
Obviously, the pushq, movq and popq instructions do nothing but waste CPU time.
Compiling the same code on Linux with
clang version 7.0.1-8 (tags/RELEASE_701/final)
Target: x86_64-pc-linux-gnu
yields the fully optimized instructions below:
test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <add>:
0: 8d 04 37 lea (%rdi,%rsi,1),%eax
3: c3 retq
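The only difference between the two disassemblies is the pushq %rbp / movq %rsp, %rbp / popq %rbp sequence, which is the usual frame-pointer setup and teardown. For anyone who wants to reproduce the comparison, here is a minimal sketch of the test file with the build commands as comments. The -fomit-frame-pointer run is an experiment I am assuming is relevant, not something verified here, and objdump -d is an assumed way of producing the listings.

```c
/* test.c: the function from the question, unchanged.
 *
 * Build exactly as in the question:
 *     clang -O3 -c -o test.o test.c
 *
 * Experiment (an assumption, not verified here): ask clang to
 * drop the frame pointer and compare the output:
 *     clang -O3 -fomit-frame-pointer -c -o test.o test.c
 *
 * Disassemble either object file (assumed command; the exact
 * tool used for the listings is not stated):
 *     objdump -d test.o
 */
int add(int a, int b) {
    return a + b;
}
```

If the second build on macOS produces the same two instructions as the Linux build, the extra instructions come from a platform default about frame pointers rather than from a missed optimization.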
Is there anything wrong with Apple Clang?
A related question is "Apple clang -O1 not optimizing enough?", but the answer there does not address my question.