
I read a bit in the C spec that unsigned variables (in particular unsigned short int) perform a so-called wrap-around on integer overflow, although I couldn't find anything about signed variables, except that I'm left with undefined behavior.

My professor told me that their values also get wrapped around (maybe he just meant GCC). I thought the bits just get truncated and the bits I'm left with give me some weird value!

What is wrap-around, and how is it different from just truncating bits?

double-beep
orustammanapov
  • Your professor is wrong unless his statement that values wrap around was qualified to specific C implementations, possibly with specific flags regarding optimization or semantics. – Eric Postpischil Nov 07 '13 at 17:31
  • @EricPostpischil What do you mean by a specific C implementation, and what happens in general? – orustammanapov Nov 07 '13 at 17:47
  • The C standard does not define what happens when integer overflow occurs. However, a C implementation may define what happens. C implementations are permitted to go beyond the C standard. So it is possible, for example, that GCC could issue a statement saying “When integer overflow occurs and the `-fSomething` switch is used when compiling, then the result is wrapped in two’s complement.” Or, if GCC does not state this, then Fred Doe can check out the GCC sources, modify them as desired, and issue a new Fred-specific version of GCC that behaves in a particular way. – Eric Postpischil Nov 07 '13 at 18:03

4 Answers


Signed integer variables do not have wrap-around behavior in the C language. Signed integer overflow during arithmetic computations produces undefined behavior. Note, by the way, that the GCC compiler you mentioned is known for implementing strict overflow semantics in its optimizations, meaning that it takes advantage of the freedom provided by such undefined behavior situations: the GCC compiler assumes that signed integer values never wrap around. That means GCC actually happens to be one of the compilers on which you cannot rely on wrap-around behavior of signed integer types.

For example, the GCC compiler can assume that for a variable int i the following condition

if (i > 0 && i + 1 > 0)

is equivalent to a mere

if (i > 0)

This is exactly what strict overflow semantics means.
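
To see the effect, consider a complete sketch (the function name below is just for illustration); built with optimization, e.g. gcc -O2, the compiler is allowed to fold the whole test down to i > 0:

#include <limits.h>
#include <stdio.h>

/* With strict overflow semantics the compiler may assume i + 1 never
   overflows, so for positive i the second comparison is treated as
   always true. */
static int positive_after_increment(int i)
{
    return i > 0 && i + 1 > 0;
}

int main(void)
{
    /* Passing INT_MAX makes i + 1 overflow, which is undefined behavior:
       the call may well return 1 even though a wrapped result would be
       negative. */
    printf("%d\n", positive_after_increment(INT_MAX));
    return 0;
}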

Unsigned integer types implement modulo arithmetic. The modulus is equal to 2^N, where N is the number of bits in the value representation of the type. For this reason unsigned integer types do indeed appear to wrap around on overflow.
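
For instance, a minimal sketch with unsigned int, whose arithmetic the standard defines to be modulo 2^N:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int u = UINT_MAX;   /* largest value, i.e. 2^N - 1 */
    u = u + 1u;                  /* well-defined: 2^N reduced modulo 2^N is 0 */
    printf("%u\n", u);           /* prints 0 on any conforming implementation */
    return 0;
}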

However, the C language never performs arithmetic computations in domains smaller than that of int/unsigned int. Type unsigned short int that you mention in your question will typically be promoted to type int in expressions before any computations begin (assuming that the range of unsigned short fits into the range of int). This means that 1) the computations with unsigned short int will be performed in the domain of int, with overflow happening when int overflows, and 2) overflow during such computations will lead to undefined behavior, not to wrap-around behavior.

For example, this code produces a wrap-around

unsigned i = USHRT_MAX;
i *= INT_MAX; /* <- unsigned arithmetic, overflows, wraps around */

while this code

unsigned short i = USHRT_MAX;
i *= INT_MAX; /* <- signed arithmetic, overflows, produces undefined behavior */

leads to undefined behavior.

If no int overflow happens and the result is converted back to unsigned short int type, it is again reduced modulo 2^N, which will appear as if the value has wrapped around.
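
A short sketch of that last case (assuming, as is typical, a 16-bit unsigned short and a wider int):

#include <stdio.h>

int main(void)
{
    unsigned short s = 65535;     /* assume unsigned short is 16 bits */
    int wide = s + 1;             /* promoted to int: no overflow, wide == 65536 */
    s = (unsigned short)(s + 1);  /* converted back: 65536 reduced modulo 65536 is 0 */
    printf("%d %hu\n", wide, s);  /* typically prints: 65536 0 */
    return 0;
}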

AnT stands with Russia
  • What does "unsigned i" stand for? You've explained it very well; could you show it with this example: `short int y = 511, z = 512; y*=z;` – orustammanapov Nov 07 '13 at 20:57
  • @orustammanapov: `unsigned i` stands for `unsigned int i`, just like `long i` stands for `long int i`. `int` is implied in such contexts. As for your example, `y*=z` is interpreted as `y = (short) ((int) y * (int) z)`, which boils down to `y = (short) 261632`. Note that in this case there's no overflow during arithmetic evaluations (which fit into `int` perfectly well), but there is overflow during conversion back to `short`. The behavior in this case is *implementation-dependent*. – AnT stands with Russia Nov 07 '13 at 21:05
  • I wonder how much real-world advantage there is to making integer overflow unconstrained UB rather than saying that compilers may include extra precision on computations involving integer types, may arbitrarily rewrite any or all excess-precision bits any time they don't all agree with the sign bit, and may at their leisure store such extra precision in non-volatile integer variables? Such rules would allow a compiler to regard `i+1 > i` as being unconditionally true, but would not allow compilers to negate causality. How much advantage comes from going beyond that? – supercat May 14 '15 at 19:45
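
To illustrate the `y *= z` case discussed in the comments above, here is a minimal sketch, assuming a 16-bit short and a 32-bit int:

#include <stdio.h>

int main(void)
{
    short y = 511, z = 512;   /* the values from the comment above */

    /* y *= z is evaluated as y = (short)((int)y * (int)z).
       511 * 512 == 261632 fits in int, so the multiplication itself does
       not overflow; the overflow happens in the conversion back to short,
       and that conversion is implementation-defined. */
    y *= z;

    printf("%hd\n", y);       /* commonly prints -512 on two's complement targets */
    return 0;
}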

Imagine you have a data type that's only 3 bits wide. This allows you to represent 8 distinct values, from 0 through 7. If you add 1 to 7, you will "wrap around" back to 0, because you don't have enough bits to represent the value 8 (1000).

This behavior is well-defined for unsigned types. It is not well-defined for signed types, because there are multiple methods for representing signed values, and the result of an overflow will be interpreted differently based on that method.

Sign-magnitude: the uppermost bit represents the sign; 0 for positive, 1 for negative. If my type is three bits wide again, then I can represent signed values as follows:

000  =  0
001  =  1
010  =  2
011  =  3
100  = -0
101  = -1
110  = -2
111  = -3

Since one bit is taken up for the sign, I only have two bits to encode a value from 0 to 3. If I add 1 to 3, I'll overflow with -0 as the result. Yes, there are two representations for 0, one positive and one negative. You won't encounter sign-magnitude representation all that often.

One's-complement: the negative value is the bitwise-inverse of the positive value. Again, using the three-bit type:

000  =  0
001  =  1
010  =  2
011  =  3
100  = -3
101  = -2
110  = -1 
111  = -0

I have three bits to encode my values, but the range is [-3, 3]. If I add 1 to 3, I'll overflow with -3 as the result. This is different from the sign-magnitude result above. Again, there are two encodings for 0 using this method.

Two's-complement: the negative value is the bitwise inverse of the positive value, plus 1. In the three-bit system:

000  =  0
001  =  1
010  =  2
011  =  3
100  = -4
101  = -3
110  = -2
111  = -1

If I add 1 to 3, I'll overflow with -4 as a result, which is different from the previous two methods. Note that we have a slightly larger range of values [-4, 3] and only one representation for 0.
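
The three interpretations can be compared with a small sketch (the helper functions are purely illustrative) that decodes every 3-bit pattern under each scheme:

#include <stdio.h>

/* Decode a 3-bit pattern (0..7) under each representation.
   Illustrative helpers only; note that sign-magnitude and one's
   complement each have a negative zero, which simply prints as +0. */
static int sign_magnitude(unsigned b)  { int m = b & 3; return (b & 4) ? -m : m; }
static int ones_complement(unsigned b) { return (b & 4) ? -(int)(~b & 3) : (int)b; }
static int twos_complement(unsigned b) { return (b & 4) ? (int)b - 8 : (int)b; }

int main(void)
{
    for (unsigned b = 0; b < 8; ++b)
        printf("%u%u%u  sign-magnitude: %+d  one's: %+d  two's: %+d\n",
               (b >> 2) & 1, (b >> 1) & 1, b & 1,
               sign_magnitude(b), ones_complement(b), twos_complement(b));
    return 0;
}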

Two's complement is probably the most common method of representing signed values, but it's not the only one, hence the C standard can't make any guarantees about what will happen when you overflow a signed integer type. So it leaves the behavior undefined, so that the compiler doesn't have to deal with interpreting multiple representations.

stkent
John Bode
  • The different signed representations were the original reason for leaving signed arithmetic overflow undefined. Now, compilers have started taking advantage of undefined behavior for optimization, so it is not safe to rely on signed overflow producing two's complement results even if you know that you target a two's complement architecture. See e.g. http://www.airs.com/blog/archives/120 – Pascal Cuoq Nov 07 '13 at 17:58

The undefined behavior comes from early portability issues when signed integer types could be represented either as sign & magnitude, one's complement or two's complement.

Nowadays, virtually all architectures represent signed integers in two's complement, which does wrap around at the hardware level. But be careful: since your compiler is entitled to assume you won't invoke undefined behavior, you might encounter weird bugs when optimization is on.
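
For example (a sketch; the exact outcome depends on the compiler and optimization level), this loop relies on signed wrap-around to terminate, so an optimizer that assumes overflow never happens is free to turn it into an infinite loop:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* The programmer expects i to wrap to a negative value and end the loop,
       but i++ overflows INT_MAX, which is undefined behavior.  With
       optimization on, the compiler may treat "i > 0" as always true. */
    for (int i = INT_MAX - 2; i > 0; i++)
        printf("%d\n", i);
    return 0;
}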

diapir
  • Does it mean that I get a wrap-around even with signed integers? – orustammanapov Nov 07 '13 at 17:46
  • @orustammanapov: No. Even if the underlying hardware has some intrinsic behavior for how overflows are handled or what sort of wrapping occurs, the C implementation is not required to use it. The C standard says behavior upon signed integer overflow is undefined. This means that the C implementation may optimize code without regard to what happens in case of overflow. This might cause behaviors to occur as if values wrapped, or it might cause behaviors to occur as if a trap occurred, or it can produce different values than wrapping might produce. – Eric Postpischil Nov 07 '13 at 17:54

In a signed 8-bit integer, the intuitive definition of wrap-around might look like going from +127 to -128 -- in two's complement binary: 01111111 (127) and 10000000 (-128). As you can see, that is the natural progression of incrementing the binary data, without considering it to represent an integer, signed or unsigned. Counterintuitively, the actual overflow, in the unsigned integer's sense of wrap-around, takes place when moving from -1 (11111111) to 0 (00000000).
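
A small sketch of those two boundary cases, reinterpreting the 8-bit patterns with the fixed-width types from <stdint.h> (assuming int8_t exists and, as on essentially all current hardware, uses two's complement):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t bits = 0x7F;                /* 01111111: 127 signed, 127 unsigned */
    bits = (uint8_t)(bits + 1);         /* 10000000 */
    printf("%d %u\n", (int)(int8_t)bits, (unsigned)bits);   /* typically prints: -128 128 */

    bits = 0xFF;                        /* 11111111: -1 signed, 255 unsigned */
    bits = (uint8_t)(bits + 1);         /* 00000000: the unsigned wrap-around */
    printf("%d %u\n", (int)(int8_t)bits, (unsigned)bits);   /* prints: 0 0 */
    return 0;
}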

This doesn't answer the deeper question of what the correct behavior is when a signed integer overflows, because there is no "correct" behavior according to the standard.