I ran into a strange problem; see the code below:
#include <stdio.h>

int main() {
    unsigned char a = 10;
    unsigned int b = 10;
    long long x = -a;
    long long y = -b;
    printf("x = %lld, y = %lld\n", x, y);
    return 0;
}
Output:
x = -10, y = 4294967286
As you can see, when I assign -b to y (a long long), I get an unexpected result, while the value of x is what I expected.
After reading the asm code, I found that when the negated unsigned char is assigned to a long long, the compiler generates a cdqe instruction (see line +25) to sign-extend eax into rax, while it does not do the same when the negated unsigned int is assigned to a long long (see line +36, which is mov eax,eax instead).
I know the reason we get the value 4294967286 is that the high 32 bits of rax are all zero: rax = 0x00000000fffffff6.
So my question is: why does the compiler omit the cdqe instruction in the unsigned int case?
Dump of assembler code for function main:
0x000000000040052d <+0>: push rbp
0x000000000040052e <+1>: mov rbp,rsp
0x0000000000400531 <+4>: sub rsp,0x20
=> 0x0000000000400535 <+8>: mov BYTE PTR [rbp-0x1],0xa
0x0000000000400539 <+12>: mov DWORD PTR [rbp-0x8],0xa
0x0000000000400540 <+19>: movzx eax,BYTE PTR [rbp-0x1]
0x0000000000400544 <+23>: neg eax
0x0000000000400546 <+25>: cdqe
0x0000000000400548 <+27>: mov QWORD PTR [rbp-0x10],rax
0x000000000040054c <+31>: mov eax,DWORD PTR [rbp-0x8]
0x000000000040054f <+34>: neg eax
0x0000000000400551 <+36>: mov eax,eax
0x0000000000400553 <+38>: mov QWORD PTR [rbp-0x18],rax
0x0000000000400557 <+42>: mov rdx,QWORD PTR [rbp-0x18]
0x000000000040055b <+46>: mov rax,QWORD PTR [rbp-0x10]
0x000000000040055f <+50>: mov rsi,rax
0x0000000000400562 <+53>: mov edi,0x400610
0x0000000000400567 <+58>: mov eax,0x0
0x000000000040056c <+63>: call 0x400410 <printf@plt>
0x0000000000400571 <+68>: mov eax,0x0
0x0000000000400576 <+73>: leave
0x0000000000400577 <+74>: ret
Env:
OS: CentOS Linux release 7.8.2003 (Core), Linux 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 GNU/Linux
Gcc: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)