I wrote this swap function in x86-64 assembly for Linux:
swap: ; written by me
    mov al, byte [rdi]   ; al = *a
    xchg byte [rsi], al  ; exchange al with *b, so al now holds the old *b
    mov byte [rdi], al   ; *a = old *b
    ret
Out of curiosity, I also compiled the following C code with GCC at -O3:
void swap(char *a, char *b) {
    char temp = *a;
    *a = *b;
    *b = temp;
}
I then ran objconv on the compiled output to get the following assembly:
swap: ; written by GCC -O3
    endbr64                 ; 0000 _ F3: 0F 1E. FA
    movzx eax, byte [rdi]   ; 0004 _ 0F B6. 07
    movzx edx, byte [rsi]   ; 0007 _ 0F B6. 16
    mov byte [rdi], dl      ; 000A _ 88. 17
    mov byte [rsi], al      ; 000C _ 88. 06
    ret                     ; 000E _ C3
GCC's version also makes sense and does the job, but my code is shorter: three instructions plus ret, versus four instructions plus endbr64 and ret. Is this a missed optimization? If so, why does GCC miss it?
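In case anyone wants to confirm that both versions behave the same, here is a minimal sketch of the checks I would expect either one to pass. The helpers `check_swap`, `check_self_swap`, and `run_swap_checks` are hypothetical names I made up for illustration; only `swap` itself comes from the code above. To test the hand-written version, I assume you would drop the C body of `swap` and link against the assembled object instead.

```c
#include <assert.h>

/* The C function from the question, unchanged. */
void swap(char *a, char *b) {
    char temp = *a;
    *a = *b;
    *b = temp;
}

/* Hypothetical helper: returns 1 if two distinct bytes end up exchanged. */
int check_swap(char first, char second) {
    char a = first, b = second;
    swap(&a, &b);
    return a == second && b == first;
}

/* Hypothetical helper: swapping a byte with itself must leave it unchanged. */
int check_self_swap(char value) {
    char a = value;
    swap(&a, &a);
    return a == value;
}

/* Hypothetical driver: aborts via assert if either property fails. */
void run_swap_checks(void) {
    assert(check_swap('x', 'y'));
    assert(check_self_swap('z'));
}
```

Calling `run_swap_checks()` from a `main` of your own should return silently if the implementation is correct. It only checks functional equivalence and says nothing about which version is faster.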