Does gcc optimize code such as:
if(cond)
func1()
func2()
if(cond)
func3()
If cond is constant? Also, what are good resources for learning more about C optimizations? (So I don't have to ask more stupid questions like this)
Does gcc optimize code such as:
if(cond)
func1()
func2()
if(cond)
func3()
If cond is constant? Also, what are good resources for learning more about C optimizations? (So I don't have to ask more stupid questions like this)
Yes compilers will certainly optimize this if they prove that it is allowed and if they find it worthwhile.
I tried on recent version of gcc
, icc
and clang
and they all optimized the above code to, effectively:
if (cond) {
func1();
func2();
func3();
} else {
func2();
}
That is, they duplicated the call to func2()
to enable this optimization. Here's a typical example of the compiled code, from gcc:
cmp edi, 33
je .L7
xor eax, eax
jmp func2
.L7:
sub rsp, 8
xor eax, eax
call func1
xor eax, eax
call func2
xor eax, eax
add rsp, 8
jmp func3
Of course, such an optimization isn't guaranteed - a compiler may decide it isn't worth it, and it may depend on compiler-specific heuristics and compile options. For example, when using -Os
(as opposed to -O2
which I used above), gcc
no longer re-arranges it and instead compiles it more or less as you've written it, with two cmp
instructions:
cmp edi, 33
jne .L5
xor eax, eax
call func1
.L5:
xor eax, eax
call func2
cmp edi, 33
jne .L4
xor eax, eax
jmp func3
.L4:
ret
On the other hand, both clang
and icc
continue to compile it with a single comparison while duplicating the call
.
You can play with all of this on godbolt.
Generally yes. Common sub expressions are calculated at function scope, so if we have if( x * x / y > z + z) appearing twice, x, y and z unchanging, the expression will only be calculated once. However the compiler needs to know, for example that the address of the variables has not been taken and passed to a subroutine somewhere.
You can check what a compiler is doing by looking at its assembly code. I've created a little C program to work with, making sure it has some variability so the optimizer doesn't just constant fold everything.
#include <stdio.h>
void func1(const char input) {
printf("func1: %c\n", input);
}
void func2(const char input) {
printf("func2: %c\n", input);
}
void func3(const char input) {
printf("func3: %c\n", input);
}
int main() {
char input = (char)getchar();
if(input == 'Y') {
func1(input);
}
func2(input);
if(input == 'Y') {
func3(input);
}
}
You can generate assembly with -S
.
gcc -S test.c -o test.asm
Note that this is without optimizations. You can find the comparison without having to read much assembly, look for 89 which is the ASCII decimal representation of Y
.
LCFI10:
subq $16, %rsp
call _getchar
movb %al, -1(%rbp)
cmpb $89, -1(%rbp)
jne L5
movsbl -1(%rbp), %eax
movl %eax, %edi
call _func1
L5:
movsbl -1(%rbp), %eax
movl %eax, %edi
call _func2
cmpb $89, -1(%rbp)
jne L6
movsbl -1(%rbp), %eax
movl %eax, %edi
call _func3
L6:
movl $0, %eax
leave
That's a fairly rote translation of the two if blocks. You can see two cmpb $89, -1(%rbp)
calls indicating the comparison is being done twice.
Now with optimizations: gcc -S -O3 test.c -o test.asm
LCFI0:
call _getchar
cmpb $89, %al
je L10
leaq lC1(%rip), %rdi
movsbl %al, %esi
xorl %eax, %eax
call _printf
L7:
xorl %eax, %eax
addq $8, %rsp
LCFI1:
ret
L10:
LCFI2:
leaq lC0(%rip), %rdi
movl $89, %esi
xorl %eax, %eax
call _printf
movl $89, %esi
xorl %eax, %eax
leaq lC1(%rip), %rdi
call _printf
movl $89, %esi
xorl %eax, %eax
leaq lC2(%rip), %rdi
call _printf
jmp L7
Now there's only one cmpb $89, %al
, but there's also multiple movl $89, %esi
. lC0, lC1, and lC2 are the strings "func1: %c\n"
, "func2: %c\n"
and "func3: %c\n"
cooresponding to the three functions.
For comparison, here's what clang does.
Ltmp12:
.cfi_offset %rbx, -24
callq _getchar
movsbl %al, %ebx
movzbl %bl, %eax
cmpl $89, %eax
jne LBB3_2
## BB#1:
leaq L_.str(%rip), %rdi
xorl %eax, %eax
movl %ebx, %esi
callq _printf
leaq L_.str.1(%rip), %rdi
xorl %eax, %eax
movl %ebx, %esi
callq _printf
leaq L_.str.2(%rip), %rdi
jmp LBB3_3
LBB3_2:
leaq L_.str.1(%rip), %rdi
As with gcc there's now only one comparison. And, similarly, L_.str, L_.str.1, and L.str.2 are the strings in func1, func2, and func3 respectively.
Both have, essentially, changed the code to be this.
if( input == 'Y' ) {
printf("func1: %c\n", input);
printf("func2: %c\n", input);
printf("func3: %c\n", input);
}
else {
printf("func2: %c\n", input);
}