
I am surprised by C++'s behavior when applying bit-wise not to an unsigned char.

Take the binary value 01010101b, which is 0x55, or 85. Applying bit-wise not on an eight bit representation should yield 10101010b, which is 0xAA, or 170.

However, I cannot reproduce the above in C++. The following simple assertion fails.

assert(static_cast<unsigned char>(0xAAu) == ~static_cast<unsigned char>(0x55u));

I printed the values of 0x55, 0xAA, and ~0x55 (as uchar) with the following code, which reveals that bit-wise not does not do what I expect it to do.

std::cout << "--> 0x55: " << 0x55u << ", 0xAA: " << 0xAAu << ", ~0x55: "
     << static_cast<unsigned>(~static_cast<unsigned char>(0x55u)) << std::endl;

--> 0x55: 85, 0xAA: 170, ~0x55: 4294967210

The number that is printed for ~0x55 is equal to 11111111111111111111111110101010b, which is the 32-bit bit-wise not of 0x55. So, the ~ operator is operating on 32-bit integers even if I explicitly cast the input to an unsigned char. Why is that?

I applied another test to see what type the ~ operator returns. And it turns out to be int on an unsigned char input:

template <class T>
struct Print;

// inside main()
Print<decltype(~static_cast<unsigned char>(0x55u))> dummy;

This yields the following compiler error, which indicates that the result is of type int.

error: implicit instantiation of undefined template 'Print<int>'
    Print<decltype(~static_cast<unsigned char>(0x55u))> dummy;

What am I doing wrong? Or, how do I get C++ to produce 0xAA from ~0x55?

Full code is here

Shafik Yaghmour
Lemming

3 Answers


Integral promotions are performed on the operand of ~. We can see this in the draft C++ standard, section 5.3.1 Unary operators, which says (emphasis mine):

The operand of ~ shall have integral or unscoped enumeration type; the result is the one’s complement of its operand. Integral promotions are performed. The type of the result is the type of the promoted operand [...]

and the integral promotions are covered in section 4.5 Integral promotions and say:

A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type;

For completeness, to see that unsigned char rank is less than the rank of int we can go to section 4.13 Integer conversion rank which says:

The rank of a signed integer type shall be greater than the rank of any signed integer type with a smaller size.

and:

The rank of char shall equal the rank of signed char and unsigned char.

One solution would be to assign the result to an unsigned char. This is safe, since conversion to an unsigned type is well defined and you don't have to worry about signed integer overflow.

As Ben Voigt points out, it would be compliant to have a system where sizeof (int) == 1 and CHAR_BIT >= 32. On such a system int could not represent all the values of unsigned char, and therefore the promotion would be to unsigned int. We do not know of any systems on which this actually occurs.

Shafik Yaghmour
  • +1: Though it is likely obvious, it is worth mentioning that `unsigned char` is indeed (a) of lower conversion rank than `int`, and (b) fully representable as `int` (all `unsigned char` values can be represented as `int`). Thus the promotion is sound. – WhozCraig Sep 29 '14 at 13:53
  • I looked further into it. The same thing is happening for the other binary operations `&`, `|`, etc. So, it seems that the moral of the story is that one should treat values within expressions at least as integers. – Lemming Sep 29 '14 at 14:40
  • @Lemming in general, C and C++ work with int and larger sized data types for math and bitwise operations; see [Why must a short be converted to an int before arithmetic operations in C and C++?](http://stackoverflow.com/q/24371868/1708801) – Shafik Yaghmour Sep 29 '14 at 14:54
  • Thanks! IIRC the exact size of `int` is not strictly defined. Which means that even the assertion `~0x55555555u==0xAAAAAAAAu` is actually ill-defined, correct? Because `int` could be 64-bit long, in which case there would be a bunch of `F`s before the `A`s. – Lemming Sep 29 '14 at 15:36
  • @Lemming correct, we only know the [minimum ranges](http://stackoverflow.com/q/589575/1708801). – Shafik Yaghmour Sep 29 '14 at 17:26
  • There's just one flaw in this answer -- it claims that `unsigned char` rank is less than `int` which is usually but not necessarily true. It's perfectly legal to use C++ on a system where `sizeof (int) == 1` and `CHAR_BIT >= 32` in which case `unsigned char` promotes to `unsigned int`, not `int`. I must say that I have never seen such a system. – Ben Voigt Apr 10 '15 at 14:22
  • @BenVoigt ok, let me adjust that – Shafik Yaghmour Apr 10 '15 at 14:31

The answer about integral promotion is correct.

You can get the desired results by casting and NOTting in the right order:

assert(static_cast<unsigned char>(0xAAu) == static_cast<unsigned char>(~0x55u));
StilesCrisis

You can kind of "truncate" the leading 1's by assigning the result of ~0x55 to an unsigned char:

#include <iostream>

int main()
{
    unsigned char input = 0x55;
    unsigned char output = ~input;

    std::cout << "input: " << (int)input << " output: " << (int)output << std::endl;

    return 0;
}

Result:

input: 85 output: 170
TobiMcNamobi
  • It's not necessary to do the `|` operation, simply assigning to an `unsigned char` variable should do it: `output = ~input`. – Mark Ransom Sep 29 '14 at 13:58