I am trying to learn more about assembly and which optimizations compilers can and cannot do.
I have a small test program that raises a few questions.
See it in action here: https://godbolt.org/z/pRztTT, or check the code and assembly below.
```c
#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[])
{
    for (int j = 0; j < 100; j++) {
        if (argc == 2 && argv[1][0] == '5') {
            printf("yes\n");
        }
        else {
            printf("no\n");
        }
    }
    return 0;
}
```
The x86-64 assembly (Intel syntax) produced by GCC 10.1 with -O3:
```asm
.LC0:
        .string "no"
.LC1:
        .string "yes"
main:
        push    rbp
        mov     rbp, rsi
        push    rbx
        mov     ebx, 100
        sub     rsp, 8
        cmp     edi, 2
        je      .L2
        jmp     .L3
.L5:
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        sub     ebx, 1
        je      .L4
.L2:
        mov     rax, QWORD PTR [rbp+8]
        cmp     BYTE PTR [rax], 53
        jne     .L5
        mov     edi, OFFSET FLAT:.LC1
        call    puts
        sub     ebx, 1
        jne     .L2
.L4:
        add     rsp, 8
        xor     eax, eax
        pop     rbx
        pop     rbp
        ret
.L3:
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        sub     ebx, 1
        je      .L4
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        sub     ebx, 1
        jne     .L3
        jmp     .L4
```
It seems that GCC tests `argc == 2` once, before the loop, and then produces two versions of the loop: one that still checks `argv[1][0] == '5'` on every iteration but no longer checks `argc == 2`, and one with no condition at all.
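To make the two loops easier to compare, here is a source-level C sketch of what the -O3 output above appears to do. It is my reading of the assembly, not code GCC actually emits; the function name `loop_as_compiled` and the returned "yes" count are hypothetical additions so the behaviour can be checked against the original.

```c
#include <stdio.h>

/* Hypothetical source-level reading of the -O3 output (not GCC's
   internal representation). The argc == 2 test is hoisted out of the
   loop, but argv[1][0] is still read on every iteration.
   Returns how many times "yes" was printed, only for comparison. */
int loop_as_compiled(int argc, char *argv[])
{
    int yes = 0;
    int j = 100;                      /* ebx counts down from 100 */
    if (argc == 2) {                  /* cmp edi, 2 / je .L2 */
        do {                          /* .L2 and .L5 */
            /* argv[1] and argv[1][0] are reloaded each time:
               mov rax, [rbp+8] / cmp BYTE PTR [rax], 53 */
            if (argv[1][0] == '5') {
                puts("yes");          /* printf("yes\n") became puts */
                yes++;
            } else {
                puts("no");
            }
        } while (--j != 0);
    } else {
        do {                          /* .L3: no condition, body twice */
            puts("no");
            if (--j == 0)             /* sub ebx, 1 / je .L4 */
                break;
            puts("no");
        } while (--j != 0);           /* sub ebx, 1 / jne .L3 */
    }
    return yes;
}
```

Calling `loop_as_compiled(argc, argv)` from `main` should print the same 100 lines as the original program for any input.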
My questions:
- What prevents GCC from hoisting the full condition out of the loop? It is similar to this question, but here nothing in the code could obtain a pointer into argv.
- In the loop without any condition (.L3 in the assembly), why is the loop body duplicated? Is it to reduce the number of jumps while still fitting in some sort of cache?
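For comparison, this is a hand-written sketch of the fully unswitched loop the first question asks about, with the whole condition evaluated once before the loop. Whether GCC may do this on its own is exactly the open question; the hand-hoisted version is equivalent to the original only if nothing called inside the loop (here `puts`) writes to the string `argv[1]` points to. The name `loop_fully_unswitched` is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical hand-hoisted version (not compiler output). The full
   condition is read once into a local whose address is never taken,
   so no call inside the loop can change it. This matches the original
   only as long as puts() never modifies argv[1][0].
   Returns how many times "yes" was printed, only for comparison. */
int loop_fully_unswitched(int argc, char *argv[])
{
    const bool is_five = (argc == 2 && argv[1][0] == '5');
    int yes = 0;
    if (is_five) {
        for (int j = 0; j < 100; j++) {
            puts("yes");
            yes++;
        }
    } else {
        for (int j = 0; j < 100; j++)
            puts("no");
    }
    return yes;
}
```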