122

I have come across code from someone who appears to believe there is a problem subtracting an unsigned integer from another integer of the same type when the result would be negative. So that code like this would be incorrect even if it happens to work on most architectures.

unsigned int To, Tf;

To = getcounter();
while (1) {
    Tf = getcounter();
    if ((Tf-To) >= TIME_LIMIT) {
        break;
    } 
}

This is the only vaguely relevant quote from the C standard I could find.

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

I suppose one could take that quote to mean that when the right operand is larger the operation is adjusted to be meaningful in the context of modulo truncated numbers.

i.e.

0x0000 - 0x0001 == 0x 1 0000 - 0x0001 == 0xFFFF

as opposed to using the implementation dependent signed semantics:

0x0000 - 0x0001 == (unsigned)(0 + -1) == (0xFFFF but also 0xFFFE or 0x8001)

Which or what interpretation is right? Is it defined at all?

LihO
  • 41,190
  • 11
  • 99
  • 167
  • 5
    The choice of words in the standard is unfortunate. That it “can never overflow” means that it is not an error situation. Using the terminology in the standard, instead of overflowing the value “wraps.” – danorton Aug 28 '11 at 14:26

6 Answers6

145

When you work with unsigned types, modular arithmetic (also known as "wrap around" behavior) is taking place. To understand this modular arithmetic, just have a look at these clocks:

enter image description here

9 + 4 = 1 (13 mod 12), so to the other direction it is: 1 - 4 = 9 (-3 mod 12). The same principle is applied while working with unsigned types. If the result type is unsigned, then modular arithmetic takes place.


Now look at the following operations storing the result as an unsigned int:

unsigned int five = 5, seven = 7;
unsigned int a = five - seven;      // a = (-2 % 2^32) = 4294967294 

int one = 1, six = 6;
unsigned int b = one - six;         // b = (-5 % 2^32) = 4294967291

When you want to make sure that the result is signed, then stored it into signed variable or cast it to signed. When you want to get the difference between numbers and make sure that the modular arithmetic will not be applied, then you should consider using abs() function defined in stdlib.h:

int c = five - seven;       // c = -2
int d = abs(five - seven);  // d =  2

Be very careful, especially while writing conditions, because:

if (abs(five - seven) < seven)  // = if (2 < 7)
    // ...

if (five - seven < -1)          // = if (-2 < -1)
    // ...

if (one - six < 1)              // = if (-5 < 1)
    // ...

if ((int)(five - seven) < 1)    // = if (-2 < 1)
    // ...

but

if (five - seven < 1)   // = if ((unsigned int)-2 < 1) = if (4294967294 < 1)
    // ...

if (one - six < five)   // = if ((unsigned int)-5 < 5) = if (4294967291 < 5)
    // ...
LihO
  • 41,190
  • 11
  • 99
  • 167
  • 9
    The line `int d = abs(five - seven);` is no good. First `five - seven` is computed: promotion leaves the operand types as `unsigned int`, the result is computed modulo `(UINT_MAX+1)`, and evaluates to `UINT_MAX-1`. Then this value is the actual parameter to `abs`, which is bad news. `abs(int)` causes undefined behavior passing the argument, since it isn't in range, and `abs(long long)` can probably hold the value, but undefined behavior occurs when the return value is coerced to `int` to initialize `d`. – Ben Voigt Oct 14 '15 at 02:24
  • 1
    The line `int c = five - seven;` is wrong more immediately and for the same reason. – Ben Voigt Oct 14 '15 at 02:24
  • @BenVoigt: As per standard, modular arithmetic takes place when the result type is unsigned right? Both of the `abs()` and `int c =` expect signed type, so implicit conversion should ensure that what you've described shouldn't happen, shouldn't it? – LihO Oct 14 '15 at 03:16
  • 1
    @LihO: The only operator in C++ that is context-sensitive and acts differently depending on how its result is used is a custom conversion operator `operator T()`. The addition in the two expressions we are discussing is performed in type `unsigned int`, based on the operand types. The result of the addition is `unsigned int`. Then that result is implicitly converted to the type required in context, a conversion which fails because the value is not representable in the new type. – Ben Voigt Oct 14 '15 at 03:20
  • 1
    @LihO: It may help to think of `double x = 2/3;` vs `double y = 2.0/3;` – Ben Voigt Oct 14 '15 at 03:21
  • 1
    Hmmm, it may not be undefined behavior, because this is conversion of an out-of-range value to a signed integral type, not overflow during calculation. So it would be implementation-defined instead. – Ben Voigt Oct 14 '15 at 03:27
  • 1
    The use of `abs()` that you suggest here, which (if I'm reading it correctly) reinterprets an unsigned value as signed, and takes its absolute value, seems likely to be fragile and non-portable. I'm a bit surprised to see this written so casually without some disclaimer. – Theodore Murdock Jun 23 '16 at 21:38
  • In the case of unsigned long var1=4294967299*2; we get the answer as 6. How compiler handles this? Whether 4294967299 is first truncated to 3 and then 2 is multiplied or multiplication happens first and then truncation happens (using modulo) Basically I want to know if multiplication is first, compiler must have larger data type temporarily. – Rajesh Jan 15 '18 at 17:09
128

The result of a subtraction generating a negative number in an unsigned type is well-defined:

  1. [...] A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type. (ISO/IEC 9899:1999 (E) §6.2.5/9)

As you can see, (unsigned)0 - (unsigned)1 equals -1 modulo UINT_MAX+1, or in other words, UINT_MAX.

Note that although it does say "A computation involving unsigned operands can never overflow", which might lead you to believe that it applies only for exceeding the upper limit, this is presented as a motivation for the actual binding part of the sentence: "a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type." This phrase is not restricted to overflow of the upper bound of the type, and applies equally to values too low to be represented.

bdonlan
  • 224,562
  • 31
  • 268
  • 324
  • 3
    Thank you! I now see the interpretation I was missing. I think they could have chosen a clearer wording though. –  Aug 28 '11 at 14:25
  • 4
    I feel so much better now, knowing that if any unsigned addition rolls around to zero and causes mayhem, it will be because `uint` was always intended to represent the mathematical [ring](https://en.wikipedia.org/wiki/Ring_(mathematics)) of the integers `0` through `UINT_MAX`, with the operations of addition and multiplication modulo `UINT_MAX+1`, and not because of an overflow. It does, however, beg the question of why, if rings are such a fundamental data type, the language doesn't offer more general support for rings of other sizes. – Theodore Murdock Jun 28 '16 at 20:30
  • 3
    @TheodoreMurdock I think the answer to that question is simple. As far as I can tell, the fact that it's a ring is a consequence, not a cause. The real requirement is that unsigned types must have all of their bits participating in the value representation. Ring-like behaviour flows naturally from that. If you want such behaviour from other types, then do your arithmetic followed by applying the required modulus; that uses fundamental operators. – underscore_d May 24 '17 at 22:36
  • 1
    @underscore_d Of course...it's clear why they made the design decision. It's just amusing that they wrote the spec roughly as "there is no arithmetic over/underflow because the data type is spec'd as a ring", as if this design choice meant that programmers don't have to carefully avoid over- and under-flow or have their programs fail spectacularly. – Theodore Murdock May 26 '17 at 20:17
  • 1
    @underscore_d The same argument (that all bits participate in the value representation) could be made for signed ints as well; the special property of unsigneds that allows the standard to specify the modulo arithmetic is that equal bit sizes in unsigneds represent equal value ranges across implementations. Historically, that was not the case for signed ints (not all were two's complements). Now that they are, we could define similar rules for signeds. – Peter - Reinstate Monica Jan 26 '23 at 16:54
5

Well, the first interpretation is correct. However, your reasoning about the "signed semantics" in this context is wrong.

Again, your first interpretation is correct. Unsigned arithmetic follow the rules of modulo arithmetic, meaning that 0x0000 - 0x0001 evaluates to 0xFFFF for 32-bit unsigned types.

However, the second interpretation (the one based on "signed semantics") is also required to produce the same result. I.e. even if you evaluate 0 - 1 in the domain of signed type and obtain -1 as the intermediate result, this -1 is still required to produce 0xFFFF when later it gets converted to unsigned type. Even if some platform uses an exotic representation for signed integers (1's complement, signed magnitude), this platform is still required to apply rules of modulo arithmetic when converting signed integer values to unsigned ones.

For example, this evaluation

signed int a = 0, b = 1;
unsigned int c = a - b;

is still guaranteed to produce UINT_MAX in c, even if the platform is using an exotic representation for signed integers.

AnT stands with Russia
  • 312,472
  • 42
  • 525
  • 765
5

With unsigned numbers of type unsigned int or larger, in the absence of type conversions, a-b is defined as yielding the unsigned number which, when added to b, will yield a. Conversion of a negative number to unsigned is defined as yielding the number which, when added to the sign-reversed original number, will yield zero (so converting -5 to unsigned will yield a value which, when added to 5, will yield zero).

Note that unsigned numbers smaller than unsigned int may get promoted to type int before the subtraction, the behavior of a-b will depend upon the size of int.

supercat
  • 77,689
  • 9
  • 166
  • 211
1

Well, an unsigned integer subtraction has defined behavior, also it is a tricky thing. When you subtract two unsigned integers, result is promoted to higher type int if result (lvalue) type is not specified explicitly. In the latter case, for example, int8_t result = a - b; (where a and b have int8_t type) you can obtain very weird behavior. I mean you may loss transitivity property (i.e. if a > b and b > c it is true that a > c). The loss of transitivity can destroy a tree-type data structure work. Care must be taken not to provide comparison function for sorting, searching, tree building that uses unsigned integer subtraction to deduce which key is higher or lower.

See example below.

#include <stdint.h>
#include <stdio.h>

void main()
{
    uint8_t a = 255;
    uint8_t b = 100;
    uint8_t c = 150;

    printf("uint8_t a = %+d, b = %+d, c = %+d\n\n", a, b, c);

    printf("          b - a  = %+d\tpromotion to int type\n"
           " (int8_t)(b - a) = %+d\n\n"
           "          b + a  = %+d\tpromotion to int type\n"
           "(uint8_t)(b + a) = %+d\tmodular arithmetic\n"
           "     b + a %% %d = %+d\n\n", 
           b - a,  (int8_t)(b - a), 
           b + a, (uint8_t)(b + a),
           UINT8_MAX + 1,
           (b + a) % (UINT8_MAX + 1));

    printf("c %s b (b - c = %d), b %s a (b - a = %d), AND c %s a (c - a = %d)\n",
           (int8_t)(c - b) < 0 ? "<" : ">", (int8_t)(c - b),
           (int8_t)(b - a) < 0 ? "<" : ">", (int8_t)(b - a),
           (int8_t)(c - a) < 0 ? "<" : ">", (int8_t)(c - a));
}
$ ./a.out 
uint8_t a = +255, b = +100, c = +150

          b - a  = -155 promotion to int type
 (int8_t)(b - a) = +101

          b + a  = +355 promotion to int type
(uint8_t)(b + a) = +99  modular arithmetic
     b + a % 256 = +99

c > b (b - c = 50), b > a (b - a = 101), AND c < a (c - a = -105)
0
int d = abs(five - seven);  // d =  2

std::abs is not "suitable" for unsigned integers. A cast is needed though.

Gerhard
  • 6,850
  • 8
  • 51
  • 81
mr_int
  • 1