I bumped into this while writing a program that prints the constituent byte values of UTF-8 characters.

This is the program that I wrote to test the various ~0 operations:

#include <stdio.h>

int main()
{
    printf("%x\n", (char)~0); // ffffffff
    printf("%x\n", (unsigned char)~0); // ff
    printf("%d\n", sizeof(char) == sizeof(unsigned char)); // 1
    printf("%d\n", sizeof(char) == sizeof(unsigned int)); // 0
    printf("%d\n", (char)~0 == (unsigned int)~0); // 1
}

I'm struggling to understand why char would produce an int-sized value, when unsigned char produces a char-sized value.

Vlad from Moscow
Marcus Harrison
  • `%x` expects an `unsigned int`. So when you pass `-1`, it gets converted to the largest `unsigned int` (on a 2's comp machine). I don't know if that's standard, or just what happens here. Using `%hhx` would do the right thing. But using an unsigned type would make more sense. – ikegami Feb 25 '22 at 18:12
  • If `char` is signed, `(char)~0` is probably converted to `(char)-1`. By the *default argument promotions*, that `(char)-1` is converted to `(int)-1`. – Ian Abbott Feb 25 '22 at 18:13
  • You cannot send a `char` through to `printf()`. It is automagically converted to `int` in the process of calling the function. When `char` is signed (such as in your implementation), `(char)~0` is a negative value. When a negative value is re-interpreted as `unsigned int` (when `printf()` processes the `"%x"`) it has a bunch of binary `1`s at the most significant bits. – pmg Feb 25 '22 at 18:13
  • More accurate version of my earlier comment: `%x` expects an `unsigned int`. So the `-1` you pass (as an `int` thanks to integer promotion) gets interpreted as an `unsigned int`, giving the largest `unsigned int` on a 2's comp machine. Using `%hhx` would do the right thing. But using an unsigned type (e.g. `unsigned char`) would make more sense. – ikegami Feb 25 '22 at 18:19
  • @EricPostpischil `~0` would produce `(int)-1` (assuming 2's complement) so would be within the range of a signed `char`. – Ian Abbott Feb 25 '22 at 18:30
  • @IanAbbott: Ah, right. – Eric Postpischil Feb 25 '22 at 18:33

3 Answers

When you pass a value of a type smaller than int to a variadic function like printf, it gets promoted to type int.

In the first case, you're passing a char with value -1 whose representation (assuming 2's complement) is 0xff. This is promoted to an int with value -1 and representation 0xffffffff, so this is what is printed.

In the second case, you're passing an unsigned char with value 255 whose representation is 0xff. This is promoted to an int with value 255 and representation 0x000000ff, so this is what is printed (without the leading zeros).

dbush
  • When explained like this it makes total sense, it's an arithmetic promotion, not bitwise. I hadn't considered that at all. The signed char -1 is converted to signed int -1 and treated as an unsigned int for printing. – Marcus Harrison Feb 25 '22 at 18:18

They do not produce values of different widths. They produce values with different numbers of set bits in them.

In your C implementation, it appears int is 32 bits and char is signed. I will use these in this answer, but readers should note the C standard allows other choices.

I will use hexadecimal to denote the bits that represent values.

In (char)~0, 0 is an int. ~0 then has bits FFFFFFFF. In a 32-bit two’s complement int, this represents −1. (char) converts this to a char.

At this point, we have a char with value −1, represented with bits FF. When that is passed as an argument to printf, it is automatically converted to an int. Since its value is −1, it is converted to an int with value −1. The bits representing that int are FFFFFFFF. You ask printf to format this with %x. Technically, that is a mistake; %x is for unsigned int, but your printf implementation formats the bits FFFFFFFF as if they were an unsigned int, producing output of “ffffffff”.

In (unsigned char)~0, ~0 again has value −1 represented with bits FFFFFFFF, but now the cast is to unsigned char. Conversion to an unsigned integer type wraps modulo M, where M is one more than the maximum value of the type, so 256 for an eight-bit unsigned char. Mathematically, the conversion is −1 + 1•256 = 255, which is the starting value plus the multiple of 256 needed to bring the value into the range of unsigned char. The result is 255. Practically, it is implemented by taking the low eight bits, so FFFFFFFF becomes FF. However, in unsigned char, the bits FF represent 255 instead of −1.

Now we have an unsigned char with value 255, represented with bits FF. Passing that to printf results in automatic conversion to an int. Since its unsigned char value is 255, the result of conversion to int is 255. When you ask printf to format this with %x (which is a mistake as above), printf formats it as if the bits were an unsigned int, producing output of “ff”.

Eric Postpischil

In both of these calls

printf("%x\n", (char)~0); // ffffffff
printf("%x\n", (unsigned char)~0); // ff

the expressions (char)~0 and (unsigned char)~0 are converted to the type int due to the integer promotions.

On the system used, the type char behaves like the type signed char, so the sign bit of the expression (char)~0 is propagated when it is promoted to the type int.

On the other hand, before the integer promotions the expression (unsigned char)~0 has the type unsigned char because of the cast to the unsigned type, so no sign bit is propagated when the expression is promoted to the type int.

Note that the conversion specifier x expects an argument of the type unsigned int. So the first call of printf should be written like

printf("%x\n", ( unsigned int )(char)~0);
Vlad from Moscow