How can an unsigned char, for example, take values from -128 to +127? From what I understand the most significant bit is used to represent the sign of the number, and the remaining bits of a char are used to represent the magnitude of the number. Now, the largest possible magnitude for 7 bits is 127, so shouldn't the range be from -127 to +127? How can -128 be a result?

Secondly, what is the bit-level logic behind the following behavior?

#include <stdio.h>

int main(void)
{
    signed char x = 127;   /* SCHAR_MAX on a typical 8-bit char */
    x += 1;                /* 128 does not fit back into signed char */
    printf("%i\n", x);
    return 0;
}

Output:

-128

As one can see, x becomes -128, but why? What is the arithmetic behind this behavior?

alk
Abhishek Sagar
  • Look up One's Complement and Two's Complement for an explanation of how integer types are represented in memory. – DeiDei Dec 17 '16 at 21:07
  • You need to read about two's complement for binary numbers. No need to ask here. – artm Dec 17 '16 at 21:08
  • Here are two previous highly rated questions. [The first](http://stackoverflow.com/questions/4337217/difference-between-signed-unsigned-char) and [The second](http://stackoverflow.com/questions/75191/what-is-an-unsigned-char). – Weather Vane Dec 17 '16 at 21:14
  • "As per my understanding, most significant bit is used to represent the sign of the number." --> C specifies 3 ways to represent signed integers, one of which is two's complement (the most common). In this case, the most significant bit represents `-128`. – chux - Reinstate Monica Dec 17 '16 at 21:16
  • Please note that wrapping a `signed` integer value is "implementation defined". Wrapping an `unsigned` integer value *is* defined by the standard. – Weather Vane Dec 17 '16 at 21:17
  • The C standard now allows three different representations of signed integers. Sign-magnitude, as you described, is one of them, but not a very common one. The most commonly used is called two's complement, which has a greater range on the negative side. – user3528438 Dec 17 '16 at 21:19
  • @WeatherVane: Signed integer overflow isn't implementation-defined, it's undefined behavior. – Keith Thompson Dec 17 '16 at 21:21
  • @KeithThompson thank you. – Weather Vane Dec 17 '16 at 21:22
  • Bit level logic: decimal `127` is binary `01111111`. Adding `1` to that gives binary `10000000`. A positive integer with bit 7 clear has become a negative integer (2's complement) with bit 7 set, decimal value `-128`. But as said, in C for signed integers that is undefined behaviour, because this is not the only way to represent signed integers. – Weather Vane Dec 17 '16 at 21:33
  • http://stackoverflow.com/a/41008130/7076153 – Stargateur Dec 17 '16 at 21:54
  • The code invokes undefined behaviour. While the explanation in the other comments shows what **might** happen, there is no guarantee. Actually anything can happen, your computer could jump out of the window or you could see nasal daemons. Researching why specific code on a specific implementation for a specific platform shows a particular result is only of academic interest and can change anytime. – too honest for this site Dec 17 '16 at 21:55
  • @WeatherVane: Signed integer overflow is **not** implementation defined, but invokes UB. – too honest for this site Dec 17 '16 at 21:56
  • @Olaf I thanked KeithThompson for pointing that out a long time ago, but thank you for reinforcing the point. Perhaps I will now never forget. – Weather Vane Dec 17 '16 at 22:00
  • @WeatherVane: My apologies. I honestly seem to have skipped his comment. – too honest for this site Dec 17 '16 at 22:04
  • @Olaf: Signed integer overflow in arithmetic operations is UB. Signed integer overflow in conversions to signed types is implementation defined. Since `char` is a small type (i.e. it is promoted to `int` for arithmetic), it is important to carefully observe what is really happening: arithmetic overflow or conversion overflow. – AnT stands with Russia Dec 18 '16 at 05:26
  • On platforms where `signed char` is narrower than `int` the above `x += 1` expression does not invoke undefined behavior, it invokes implementation-defined behavior. Since `x` is narrower than `int` it gets promoted and the expression stands for `x = (signed char) ((int) x + 1)`. The overflow occurs during conversion, not during addition. The behavior is implementation defined. – AnT stands with Russia Dec 18 '16 at 05:28
  • @Abhishek Sagar: Why does the question mention *unsigned* types while nothing in it actually has anything to do with unsigned types? – AnT stands with Russia Dec 18 '16 at 05:30
  • @AnT: Not sure what you mean with "overflow in conversions". The conversion per se is implementation defined; typically there is no arithmetic involved which can overflow. Values are truncated and the result is the reverse of the (standard defined) conversion signed to unsigned. And the size of `char` in relation to `int` is not fixed by the standard. Both can use the exact same representation (including width). See some 16/24/32 bit DSPs. Btw. the conversion for arithmetic is done regardless of whether `char` is smaller than `int`. – too honest for this site Dec 18 '16 at 05:48
  • @Olaf: By "overflow in conversions" I refer to situations when the original value does not fit into the target type. What "reverse of conversion signed to unsigned" you are talking about is not clear to me. This question does not involve unsigned types in any way, shape or form. – AnT stands with Russia Dec 18 '16 at 05:51
  • "the original value does not fit into the target type" - That's not overflow; signed integer overflow occurs only for arithmetic operations and always invokes UB. Whether there is truncation is part of the implementation-defined behaviour. Basically an implementation can e.g. fold the wider type with xor. It is just that on most architectures this costs more code/time, and would make the compiler more complicated. IOW: it just does not make sense. – too honest for this site Dec 18 '16 at 05:52
  • @Olaf: Firstly, if I referred to it as "overflow", then it is overflow. I hereby officially declare it *overflow* just to prevent any pointless and unnecessarily nitpicky discussions. Secondly I'm talking about the OP's situation specifically, where it is already perfectly clear that their `signed char` is narrower than their `int`. – AnT stands with Russia Dec 18 '16 at 05:57
  • @AnT: Sir, sorry, sir! I didn't know **you** define the standard. Just to mention - if you allow me - it was not me to jump onto that train. – too honest for this site Dec 18 '16 at 07:31

2 Answers


This works based on something called Two's Complement. The idea here is that, given some binary number, its Two's Complement is its One's Complement (flip all bits) plus one. We can see a simple example: let's find the Two's Complement of 13, which we can write as 0b01101. 01101 (flip) -> 10010 (+1) -> 10011

Now, although if we interpreted that as a binary number as usual we would read 19 in decimal, we must know that the number is written in Two's Complement in order to reverse the procedure and arrive back at the previous number, 13. So from this we can see that we have represented things such that +13 = 01101 and -13 = 10011; notice that the positive number starts with a 0 and its negative counterpart with a 1. This is constant when using this representation: positive numbers will always begin with a 0, and negative ones with a 1. Something else worth noting is that I prefixed a 0 to my original representation of 13; that is needed in order to correctly represent its Two's Complement. You can try going through the same example without doing that and verify that it is necessary.
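
For instance, a tiny C program along the following lines reproduces the flip-and-add-one steps; the 5-bit mask is just a choice made here to match the example, not something required by the technique:

#include <stdio.h>

int main(void)
{
    unsigned int value   = 13;                    /* 0b01101 */
    unsigned int mask    = 0x1F;                  /* keep only 5 bits, matching the example */
    unsigned int flipped = ~value & mask;         /* One's Complement: 0b10010 (18) */
    unsigned int twos    = (flipped + 1) & mask;  /* Two's Complement:  0b10011 (19) */

    printf("flipped = %u, two's complement = %u\n", flipped, twos);
    return 0;
}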

Now, let's take a look at a few values represented like this,

╔══════╦════════════════╦════════════════════════╗
║ Bits ║ Unsigned Value ║ Two's Complement Value ║
╠══════╬════════════════╬════════════════════════╣
║ 011  ║ 3              ║ 3                      ║
╠══════╬════════════════╬════════════════════════╣
║ 010  ║ 2              ║ 2                      ║
╠══════╬════════════════╬════════════════════════╣
║ 001  ║ 1              ║ 1                      ║
╠══════╬════════════════╬════════════════════════╣
║ 000  ║ 0              ║ 0                      ║
╠══════╬════════════════╬════════════════════════╣
║ 111  ║ 7              ║ -1                     ║
╠══════╬════════════════╬════════════════════════╣
║ 110  ║ 6              ║ -2                     ║
╠══════╬════════════════╬════════════════════════╣
║ 101  ║ 5              ║ -3                     ║
╠══════╬════════════════╬════════════════════════╣
║ 100  ║ 4              ║ -4                     ║
╚══════╩════════════════╩════════════════════════╝

As you can see, it works just as we had previously intended; however, you can now begin to understand how the "bug" you found happens. The upper limit for this 3-bit representation in Two's Complement is the decimal value 3. Let's see how we would reach -4 by simply adding 1 to it: 3 = 0b011, therefore 3 + 1 = 0b100, which as you can see from the table maps to -4 (as opposed to 4) in Two's Complement. Your case is this exact problem, but with more bits. Signed representation like this is circular, so overflowing past the top yields the bottom value. Let's look at your case:

127 = 0b01111111
127 + 1 = 0b10000000

As you can see, it starts with a 1, therefore it is negative (!), and if you decode the Two's Complement you will see it represents -128 (the lower bound is always one larger in magnitude than the upper bound).
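
If you want to check that decoding in code, a quick sketch of my own (assuming the usual 8-bit char) is to flip the bits and add one to recover the magnitude:

#include <stdio.h>

int main(void)
{
    unsigned char bits = 0x80;                              /* 0b10000000 */
    unsigned int magnitude = ((unsigned char) ~bits) + 1u;  /* flip -> 0x7F (127), add 1 -> 128 */

    printf("-%u\n", magnitude);                             /* prints -128 */
    return 0;
}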

It's a given that not all hardware implements things in the same way. Intel, AMD, ARM and, as far as I know, all major architectures for general-purpose CPUs use Two's Complement in their ALUs, but there is hardware that uses other techniques for representing signed integers, which is fundamentally why the behavior you described is undefined. Another interesting thing to notice is that IEEE's standard for floating-point arithmetic uses a separate sign bit and a biased exponent rather than Two's Complement.

Finally, since we are talking about C here, do note that undefined behavior can be exploited by the compiler for optimization; one great example of such optimizations is described in this blog post.
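
To give a flavor of what that means, here is a minimal sketch (an illustration of the general idea, not taken from that post): because signed overflow is undefined, an optimizing compiler is allowed to assume it never happens and may fold the whole check below to "always true".

#include <limits.h>
#include <stdio.h>

/* A compiler may legally rewrite this as "return 1;", because it is
   allowed to assume that x + 1 never overflows for a signed int. */
int will_not_overflow(int x)
{
    return x + 1 > x;
}

int main(void)
{
    printf("%d\n", will_not_overflow(INT_MAX));  /* often prints 1 when optimizations are on */
    return 0;
}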

Bernardo Meurer
  • The choice of two's-complement arithmetic is a chip-maker's decision, not usually a programming language decision. That is, the CPU normally determines whether arithmetic is implemented using two's-complement (most common, by far) or one's-complement or sign-magnitude arithmetic. A programming language could emulate any of these modes of arithmetic, but only at a cost. – Jonathan Leffler Dec 17 '16 at 23:17
  • @JonathanLeffler Thanks for the correction, I have made some fixes to my conclusion paragraph, let me know what you think. – Bernardo Meurer Dec 18 '16 at 00:28
  • 1
    The behavior is not undefined. This answer is wrong. No overflow happens. See this correct answer: http://stackoverflow.com/a/41205803/4082723 – 2501 Dec 18 '16 at 08:41
  • @JonathanLeffler: The natural way to perform binary addition, subtraction, and multiplication is to use two's-complement format. Performing those operations in any other format requires taking two's-complement hardware and adding other stuff to it. It's easier for operators to make sense of blinkenlights displays in sign-magnitude format than in other formats, and it's electrically easier to convert ones-complement numbers to sign-magnitude than to convert two's-complement numbers. If one has a machine with 16 registers that have attached blinkenlights, ... – supercat Dec 27 '16 at 22:43
  • ...having a single circuit to perform ones'-complement math and sixteen circuits to show a ones'-complement register as sign-magnitude may be much cheaper than having sixteen circuits to display a two's-complement register as sign-magnitude. Outside of thing like blinkenlight displays, however, sign-magnitude and ones'-complement are inferior to two's-complement in pretty much every way imaginable. – supercat Dec 27 '16 at 22:45

In C the behavior of the += operator is defined through an equivalent combination of the = and + operators. E.g. your x += 1 by definition stands for x = x + 1. Since x has the narrow type signed char, it is promoted to int before any arithmetic begins. This means that the subexpression x + 1 is evaluated in the domain of type int. After that, the result (of type int) is converted back to signed char and stored back into x.

Thus, in your case your x += 1 is actually equivalent to

x = (signed char) ((int) x + 1);

The (int) x + 1 subexpression does not overflow. It successfully produces the value 128 of type int. However, this value does not fit into the range of the signed char type, which leads to implementation-defined behavior (or an implementation-defined signal being raised) when the value is converted back to signed char. On your platform this implementation-defined conversion produces the value -128 of type signed char.
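
To make the two steps visible, the same expression can be sketched with the promotion and the conversion written out separately; the printed values are what a typical two's complement platform produces:

#include <stdio.h>

int main(void)
{
    signed char x = 127;

    int promoted = (int) x + 1;     /* arithmetic happens in int: 128, no overflow */
    x = (signed char) promoted;     /* implementation-defined conversion back to signed char */

    printf("%d %d\n", promoted, x); /* typically prints: 128 -128 */
    return 0;
}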

AnT stands with Russia
  • 1
    It is really disappointing that a correct answer has +0, while an answer that doesn't touch the actual issue and doesn't answer the question has +6. – 2501 Dec 18 '16 at 08:34