
Given the following simple C++ code:

#include <stdio.h>

int main() {
    char c1 = 130;
    unsigned char c2 = 130;

    printf("1: %+u\n", c1);
    printf("2: %+u\n", c2);
    printf("3: %+d\n", c1);
    printf("4: %+d\n", c2);
    ...
    return 0;
}

the output is:

1: 4294967170
2: 130
3: -126
4: +130

Can someone please explain the results on lines 1 and 3?

I'm using the Linux gcc compiler with all default settings.

learnvst
Daros
  • This is called overflow. – L. F. Jun 03 '19 at 09:36
  • Can you provide a [mre]? (i.e., adding the `#include`s and `main`?) – L. F. Jun 03 '19 at 09:41
  • @L.F. Not by the standard ... in which "overflow" refers to an arithmetic operation producing a result out of bounds for its type – M.M Jun 03 '19 at 09:52
  • The concept of overflow is explained here: [C++ integer overflow](https://stackoverflow.com/questions/29235436/c-integer-overflow). A `char` is represented in 8 bits – Butiri Dan Jun 03 '19 at 09:52
  • Looks more like C code to me. – n. m. could be an AI Jun 03 '19 at 10:16
  • The [printf](https://en.cppreference.com/w/cpp/io/c/fprintf) function reads its arguments via [va_arg](https://en.cppreference.com/w/cpp/utility/variadic/va_arg) according to the format specified in the first argument string, i.e. `%u` means "read an `unsigned int`". Passing an argument whose promoted type doesn't match the specifier leaves the result undefined. Try `printf("1: %+u\n", (unsigned int)c1);` and `printf("3: %+d\n", (int)c1);`. Also, the range of `signed char` is -128...+127 – Victor Gubin Jun 04 '19 at 10:45
  • It's C++ as long as it's compiled as C++. – Lightness Races in Orbit Jun 05 '19 at 11:45

3 Answers


A char is 8 bits. This means it can represent 2^8 = 256 unique values. An unsigned char represents 0 to 255, and a signed char represents -128 to 127 (it could in principle represent absolutely anything, but this is the typical platform implementation). Thus, assigning 130 to a char is out of range by 3 (the maximum is 127), and the value wraps around to -126 when it is interpreted as a signed char.

The compiler sees 130 as an integer and makes an implicit conversion from int to char. On most platforms an int is 32 bits and the sign bit is the MSB. The value 130 easily fits into the first 8 bits, but the compiler then has to chop off 24 bits to squeeze it into a char. When this happens, and you've told the compiler you want a signed char, the MSB of the remaining 8 bits represents -128. Uh oh! You now have 1000 0010 in memory, which when interpreted as a signed char is -128 + 2 = -126. My linter on my platform screams about this:

[screenshot: angry linter warning]

I make that important point about interpretation because in memory, both values are identical. You can confirm this by casting the value in the printf statements, i.e., printf("3: %+d\n", (unsigned char)c1);, and you'll see 130 again.
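A minimal sketch of that check (assuming, as on the asker's platform, an 8-bit two's-complement char in which 130 wraps to -126):

#include <stdio.h>

int main() {
    char c1 = 130;           // implementation-defined: wraps to -126 here
    unsigned char c2 = 130;

    // both objects hold the same byte, 1000 0010
    printf("c1 as signed:   %d\n", c1);                 // -126
    printf("c1 as unsigned: %d\n", (unsigned char)c1);  // 130
    printf("c2:             %d\n", c2);                 // 130
    return 0;
}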

The reason you see the large value in your first printf statement is that the signed char, which has already wrapped to -126, is being read as an unsigned int. The machine interprets the char as -126 first, and then converts it to unsigned int, which cannot represent that negative value, so the value wraps around modulo 2^32 and you end up 126 below 2^32.

2^32 - 126 = 4294967170 ... bingo

In printf statement 2, all the machine has to do is add 24 zero bits to reach 32 bits and then interpret the value as an int. In statement 1, you've told it that you have a signed value, so it first turns that into a 32-bit -126, and then interprets that negative integer as an unsigned integer. Again, it flips how it interprets the most significant bit. There are 2 steps, sketched in code after this list:

  1. The signed char is promoted to a signed int, because you want to work with ints. The char (is probably copied and) has 24 bits added; because we're looking at a signed value, those bits are copies of the sign bit (sign extension), so the memory here looks quite different.
  2. The new signed int memory is interpreted as unsigned, so the machine looks at the MSB and weights it as 2^31 instead of the -2^31 it had during the promotion.
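Here is a sketch of those two steps with explicit conversions (assuming a 32-bit two's-complement int; the variable names are just for illustration):

#include <stdio.h>

int main() {
    char c1 = 130;                      // holds -126 on this platform

    // Step 1: promotion to int sign-extends, giving 0xFFFFFF82, i.e. -126.
    int promoted = c1;

    // Step 2: that int value is converted to unsigned int,
    // which is -126 modulo 2^32, i.e. 4294967170.
    unsigned int reinterpreted = (unsigned int)promoted;

    printf("promoted:      %d\n", promoted);
    printf("reinterpreted: %u\n", reinterpreted);
    return 0;
}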

An interesting bit of trivia: you can suppress the clang-tidy linter warning if you write char c1 = 130u;, but you still get the same garbage by the logic above (the implicit conversion throws away the upper 24 bits, and the sign bit of the unsigned literal was zero anyhow). I have submitted an LLVM clang-tidy missing-functionality report based on exploring this question (issue 42137 if you really want to follow it).

learnvst
  • It would be nice if you mention that this is *undefined behavior*. – L. F. Jun 03 '19 at 09:49
  • @L.F. it isn't UB (unless you mean incorrect format specifiers) – M.M Jun 03 '19 at 09:52
  • @learnvst OK, can you explain how the wrap is made? And what about the value on line #1, 4294967170? – Daros Jun 03 '19 at 09:55
  • @Daros - sorry, cleaned that answer up a little further – learnvst Jun 03 '19 at 10:11
  • @M.M After some research, I found out that `char(130)` is implementation-defined behavior instead of UB. Take my comment to mean the incorrect format specifiers ;-) – L. F. Jun 03 '19 at 10:26
  • @learnvst So in case #3 (-126) the value -126 is produced by applying two's complement to 130? – Daros Jun 03 '19 at 15:28
  • @Daros. It is not applying any form of algorithm. You are just telling the machine to interpret those 8 bits in memory as a signed value; therefore, it takes the most significant bit and interprets that as -128 and adds 2, rather than interpreting it as 128 and adding 2 – learnvst Jun 04 '19 at 12:15
  • @Daros - I also appended that to the answer itself – learnvst Jun 04 '19 at 12:19
  • @learnvst In your answer (first sentence of the third paragraph), shouldn't "...you are casting a signed char to a signed int..." rather be "casting a signed char to an UNSIGNED int", as the "u" specifier says? – Daros Jun 05 '19 at 08:40

(This answer assumes that, on your machine, char ranges from -128 to 127, that unsigned char ranges from 0 to 255, and that unsigned int ranges from 0 to 4294967295, which happens to be the case.)

char c1 = 130;

Here, 130 is outside the range of numbers representable by char. The value of c1 is implementation-defined. In your case, the number happens to "wrap around," initializing c1 to static_cast<char>(-126).

In

printf("1: %+u\n", c1);

c1 is promoted to int, resulting in -126. Then, it is interpreted by the %u specifier as unsigned int. This is undefined behavior. This time the resulting number happens to be the unique number representable by unsigned int that is congruent to -126 modulo 4294967296, which is 4294967170.
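A small sketch of that congruence, using the well-defined int-to-unsigned-int conversion rather than the mismatched specifier:

#include <stdio.h>

int main() {
    // int -> unsigned int conversion is defined as the value modulo 2^32,
    // so -126 converts to 4294967296 - 126 = 4294967170.
    unsigned int u = static_cast<unsigned int>(-126);
    printf("%u\n", u);                 // 4294967170
    printf("%d\n", u == 4294967170u);  // 1 (true)
    return 0;
}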

In

printf("3: %+d\n", c1);

The int value -126 is interpreted by the %d specifier as int directly, and outputs -126 as expected (?).

L. F.

In cases 1, 2 the format specifier doesn't match the type of the argument, so the behaviour of the program is undefined (on most systems). On most systems char and unsigned char are smaller than int, so they promote to int when passed as variadic arguments. int doesn't match the format specifier %u which requires unsigned int.

On exotic systems (which your target is not) where unsigned char is as large as int, it will be promoted to unsigned int instead, in which case 4 would have UB since it requires an int.
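A sketch of cases 1 and 2 with the arguments converted so they match %u (assuming the usual system where int is wider than char):

#include <stdio.h>

int main() {
    char c1 = 130;           // implementation-defined value, -126 here
    unsigned char c2 = 130;

    // converting explicitly gives %u the unsigned int it requires
    printf("1: %u\n", (unsigned int)c1);   // 4294967170
    printf("2: %u\n", (unsigned int)c2);   // 130
    return 0;
}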


The explanation for 3 depends a lot on implementation-defined details. The result depends on whether char is signed or not, and on its representable range.

If 130 were a representable value of char, such as when it is an unsigned type, then 130 would be the correct output. That appears not to be the case, so we can assume that char is a signed type on the target system.

Initialising a signed integer with an unrepresentable value (such as char with 130 in this case) results in an implementation defined value.

On systems with 2's complement representation for signed numbers - which is the ubiquitous representation these days - the implementation-defined value is typically the representable value that is congruent with the unrepresentable value modulo the number of representable values. -126 is congruent with 130 modulo 256 and is a representable value of char.
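As a tiny sketch of that congruence (256 being the number of values an 8-bit char can represent):

#include <stdio.h>

int main() {
    // 130 and -126 differ by exactly one multiple of 256,
    // so they are congruent modulo 256.
    printf("%d\n", 130 - 256);             // -126
    printf("%d\n", (130 - (-126)) % 256);  // 0, i.e. congruent
    return 0;
}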

eerorika