
I happened to run into a strange problem; see the code below:

#include <stdio.h>

int main() {
    unsigned char a = 10;
    unsigned int b = 10;

    long long x = -a;
    long long y = -b;

    printf("x = %lld, y = %lld\n", x, y);
    return 0;
}

Output:

x = -10, y = 4294967286

As you can see, when I assign -b to y (long long), it gives the wrong result, but the value of x is as expected.

After reading the asm code, I found that when assigning a negated unsigned char to a long long, the compiler generates a cdqe instruction (see offset +25) to sign-extend eax into rax, while it doesn't do the same thing when converting unsigned int to long long.

I know the reason we get the value 4294967286 is that the high 32 bits of rax are all zero: rax = 0x00000000fffffff6.

So my question is: why does the compiler omit the cdqe instruction in the unsigned int case?

Dump of assembler code for function main:
   0x000000000040052d <+0>:     push   rbp
   0x000000000040052e <+1>:     mov    rbp,rsp
   0x0000000000400531 <+4>:     sub    rsp,0x20
=> 0x0000000000400535 <+8>:     mov    BYTE PTR [rbp-0x1],0xa
   0x0000000000400539 <+12>:    mov    DWORD PTR [rbp-0x8],0xa
   0x0000000000400540 <+19>:    movzx  eax,BYTE PTR [rbp-0x1]
   0x0000000000400544 <+23>:    neg    eax
   0x0000000000400546 <+25>:    cdqe   
   0x0000000000400548 <+27>:    mov    QWORD PTR [rbp-0x10],rax
   0x000000000040054c <+31>:    mov    eax,DWORD PTR [rbp-0x8]
   0x000000000040054f <+34>:    neg    eax
   0x0000000000400551 <+36>:    mov    eax,eax
   0x0000000000400553 <+38>:    mov    QWORD PTR [rbp-0x18],rax
   0x0000000000400557 <+42>:    mov    rdx,QWORD PTR [rbp-0x18]
   0x000000000040055b <+46>:    mov    rax,QWORD PTR [rbp-0x10]
   0x000000000040055f <+50>:    mov    rsi,rax
   0x0000000000400562 <+53>:    mov    edi,0x400610
   0x0000000000400567 <+58>:    mov    eax,0x0
   0x000000000040056c <+63>:    call   0x400410 <printf@plt>
   0x0000000000400571 <+68>:    mov    eax,0x0
   0x0000000000400576 <+73>:    leave  
   0x0000000000400577 <+74>:    ret    

Env:

OS: CentOS Linux release 7.8.2003 (Core), Linux 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 GNU/Linux
Gcc: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
leen
  • 4294967286 is -10 as unsigned int. It's doing what you told it to do. – stark Jun 29 '21 at 10:22
  • Is it because it yields the result as two's complement? – Sorenp Jun 29 '21 at 10:26
  • [Implicit type promotion rules](https://stackoverflow.com/questions/46073295/implicit-type-promotion-rules) – Lundin Jun 29 '21 at 10:33
  • @Lundin Wow ... That's a pretty thorough answer. :-) – Ted Lyngmo Jun 29 '21 at 10:37
  • @Sorenp unsigned arithmetic conversions don't depend on 2's complement. The standard says `Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.` This is the mathematical value. So you get the mathematical result -10 + UINT_MAX + 1 regardless of signedness format used on the system. – Lundin Jun 29 '21 at 10:38
  • @Lundin Thank you! And you have dropped some nice links today ! Good for the evening read :) – Sorenp Jun 29 '21 at 10:40
  • `a` goes to `int` then negated as `int` (becomes negative as int) then goes from `int` (signed) to `long long`, preserving the sign; whereas `b` is `unsigned int` (so cannot be converted to `int`), then is negated as `unsigned int` then converted from `unsigned int` to `long long`, which is a zero extending conversion. – Erik Eidt Jun 29 '21 at 10:42
  • @Lundin thanks for the nice link, it brings me more info about type promotion. – leen Jun 30 '21 at 01:38

1 Answer


This has to do with implicit integer conversions.

  • The unsigned char a is first promoted to int and then negated. Result: -10.

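  • The unsigned int b is not promoted (unsigned int already has the same rank as int), so -b negates the unsigned int and you get UINT_MAX - 10 + 1, i.e. 4294967286 with a 32-bit unsigned int. Converting that non-negative value to long long doesn't change it.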

If you cast to the target type first, -(long long)b, you'll get -10 there too.

Further reading: Implicit conversions

Ted Lyngmo
  • Thanks for the brief explanation. It seems I misunderstood type promotion before. I thought all the right-hand operands would first be converted to the type of the left-hand operand, and then the arithmetic and assignment would be performed. – leen Jun 30 '21 at 01:33
  • @leen You're welcome! The implicit conversion is a bit messy imo – Ted Lyngmo Jun 30 '21 at 06:34