266

Unsigned integer overflow is well defined by both the C and C++ standards. For example, the C99 standard (§6.2.5/9) states

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

However, both standards state that signed integer overflow is undefined behavior. Again, from the C99 standard (§3.4.3/1)

An example of undefined behavior is the behavior on integer overflow

Is there an historical or (even better!) a technical reason for this discrepancy?
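
For concreteness, here is a minimal illustration of the two quoted rules (my own sketch, not part of the original question; `UINT_MAX` and `INT_MAX` come from `<limits.h>`):

#include <limits.h>
#include <stdio.h>

int main(void) {
    unsigned int u = UINT_MAX;
    printf("%u\n", u + 1u);  /* well defined: reduced modulo UINT_MAX + 1, prints 0 */

    int i = INT_MAX;
    (void)i;                 /* i + 1 would be signed overflow: undefined behavior,
                                so it is deliberately not evaluated here */
    return 0;
}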

Lundin
  • 195,001
  • 40
  • 254
  • 396
anthonyvd
  • 7,329
  • 4
  • 30
  • 51
  • 67
    Probably because there is more than one way of representing signed integers. Which way is not specified in the standard, at least not in C++. – juanchopanza Aug 12 '13 at 20:06
  • Useful link: http://en.wikipedia.org/wiki/Signed_number_representations – Robᵩ Aug 12 '13 at 20:07
  • 7
    What juanchopanza said makes sense. As I understand it, the original C standard in a large part codified existing practice. If all implementations at that time agreed on what unsigned "overflow" should do, that's a good reason for getting it standardized. They didn't agree on what signed overflow should do, so that did not get in the standard. –  Aug 12 '13 at 20:07
  • It may be because signed integer overflow is easily detectable by checking the most sig. bit before and after. Much more difficult to detect unsigned overflow. – David Elliman Aug 12 '13 at 20:08
  • 2
    @DavidElliman Unsigned wraparound on addition is easily detectable (`if (a + b < a)`) too. Overflow on multiplication is hard for both signed and unsigned types. –  Aug 12 '13 at 20:10
  • 5
    @DavidElliman: It is not only an issue of whether you can detect it, but what the result is. In a sign + value implementation, `MAX_INT+1 == -0`, while on a two's complement it would be `INT_MIN` – David Rodríguez - dribeas Aug 12 '13 at 20:11
  • C2X proposes mandating 2's complement, allowing the simplification of integer types. https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf – qwr Aug 16 '23 at 18:55

7 Answers

204

The historical reason is that most C implementations (compilers) just used whatever overflow behaviour was easiest to implement with the integer representation they used. C implementations usually used the same representation as the CPU, so the overflow behavior followed from the integer representation used by the CPU.

In practice, it is only the representations for signed values that may differ according to the implementation: one's complement, two's complement, sign-magnitude. For an unsigned type there is no reason for the standard to allow variation because there is only one obvious binary representation (the standard only allows binary representation).

Relevant quotes:

C99 6.2.6.1:3:

Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation.

C99 6.2.6.2:2:

If the sign bit is one, the value shall be modified in one of the following ways:

— the corresponding value with sign bit 0 is negated (sign and magnitude);

— the sign bit has the value −(2^N) (two's complement);

— the sign bit has the value −(2^N − 1) (one's complement).
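
As a rough illustration of the three permitted schemes (my own 8-bit example, not part of the quoted text), the stored pattern 10000001 decodes to three different values:

#include <stdio.h>

int main(void) {
    unsigned bits = 0x81;                            /* sign bit set, value bits = 1 */
    int sign_magnitude  = -(int)(bits & 0x7F);       /* negate the value bits:           -1   */
    int twos_complement = (int)(bits & 0x7F) - 128;  /* sign bit contributes -(2^7):     -127 */
    int ones_complement = (int)(bits & 0x7F) - 127;  /* sign bit contributes -(2^7 - 1): -126 */
    printf("%d %d %d\n", sign_magnitude, twos_complement, ones_complement);
    return 0;
}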


Nowadays, all processors use two's complement representation, but signed arithmetic overflow remains undefined and compiler makers want it to remain undefined because they use this undefinedness to help with optimization. See for instance this blog post by Ian Lance Taylor or this complaint by Agner Fog, and the answers to his bug report.
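
As a small sketch of the kind of optimization meant here (my own example; the comments below note that GCC and ICC perform exactly this transformation with default options):

int f(int x) {
    /* Because signed overflow is undefined, the compiler may assume that
       x + 1 never wraps, so this whole function can be folded to "return 1;". */
    return x + 1 > x;
}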

Community
  • 1
  • 1
Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
  • 9
    The important note here, though, is that there remain **no** architectures in the modern world using anything other than 2's complement signed arithmetic. That the language standards still allow for implementation on e.g. a PDP-1 is a pure historical artifact. – Andy Ross Aug 12 '13 at 20:12
  • @AndyRoss Ah, yes, you remind me that I need to add a note to my answer. – Pascal Cuoq Aug 12 '13 at 20:13
  • Thanks for the great answer, the blog post about GCC relying on signed integer overflow being undefined behavior to optimize was the icing on the cake. – anthonyvd Aug 12 '13 at 20:20
  • 11
    @AndyRoss but there are still systems (OS + compilers, admittedly with an old history) with one's complement and new releases as of 2013. An example: OS 2200. – ouah Aug 12 '13 at 20:26
  • 5
    @Andy Ross would you consider "no architectures ... using anything other than 2's complement ..." today includes the gamut of DSPs and embedded processors? – chux - Reinstate Monica Aug 12 '13 at 20:27
  • 13
    @AndyRoss: While there are “no” architectures using anything other than 2s complement (for some definition of “no”), there *definitely are* DSP architectures that use saturating arithmetic for signed integers. – Stephen Canon Aug 12 '13 at 20:33
  • Touche on saturation, but AFAIK none of those modes are compliant with ISO C either. You use the saturating instructions via assembly or intrinsics. Is there a system out there which represents an ISO C signed char/short/int/long variable with saturation? – Andy Ross Aug 12 '13 at 20:36
  • 12
    Saturating signed arithmetic is definitely compliant with the standard. Of course the wrapping instructions must be used for unsigned arithmetic, but the compiler always has the information to know whether unsigned or signed arithmetic is being done, so it can certainly choose the instructions appropriately. – caf Aug 13 '13 at 01:38
  • The link to Ian Lance Taylor's blog post is dead, by the way; if you know of an updated link, it'd be worth updating this answer. Anyway, great answer, +1. – KRyan Jun 14 '15 at 01:05
  • @KRyan The link still works for me, but if it came to disappear, this following one should continue to work: https://web.archive.org/web/20141004000724/http://www.airs.com/blog/archives/120 – Pascal Cuoq Jun 14 '15 at 08:05
  • 2
    @KRyan Much more recently, I have discovered this optimization guide by Nvidia for CUDA (which is not C++ but is derived from it) that recommends signed loop counters when possible because the undefined behavior transmits information to the compiler (that the loop doesn't wrap around). This one will probably go bad someday: http://docs.nvidia.com/cuda/cuda-c-best-practices-guide/#loop-counters-signed-vs-unsigned – Pascal Cuoq Jun 14 '15 at 08:20
  • @PascalCuoq Right you are; it's working now. Must have been a blip. That CUDA article is fascinating though; completely reverses my intuition on that point. Would you happen to know if the same is true in C++? Don't do a lot of C coding these days. I know the standards on this point are the same, but unsure about what behavior can be expected from compilers. – KRyan Jun 14 '15 at 13:40
  • 1
    @PascalCuoq: I took the liberty of adding an explanation why the integer representation determines overflow behaviour. It is probably fairly obvious, but may not be clear for someone without a background in C. Feel free to re-edit :-). – sleske Mar 14 '16 at 10:42
  • @PascalCuoq: The C89 authors noted in the rationale that most implementations would (consistently) treat signed overflow as wrapping in cases where the result is coerced to unsigned, and I think they expected that would continue. Do you see any advantages in non-contrived scenarios to having implementations for silent-wraparound hardware do anything else? – supercat Oct 31 '16 at 15:00
  • 1
    But what about fixed width integers? They must use 2's complement notation. Why don't they have wrapped overflow? – bzim Feb 17 '18 at 16:15
  • @BrunoCorrêaZimmermann This is an interesting question. I would bet this is because compiler authors want both a) to be able to simply `typedef` these types to `int`, `long` and `long long` as appropriate on modern architectures where 2s complement and usual widths are used and b) keep the optimizations that they have implemented for these types and the favorable benchmark results that they bring. But this is only speculation on my part. – Pascal Cuoq Feb 17 '18 at 20:46
  • 2
    @BrunoCorrêaZimmermann: Being able to have a compiler turn things like `x<<y>>z` into either `x<<(y-z)` or `x>>(z-y)`, in cases where it knows y and z, is generally safe and useful. Such optimizations could be accommodated without requiring that overflow invoke UB if the Standard allowed implementations to choose in Unspecified fashion among alternative defined behaviors which would include wrapping to any convenient size that's at *least* as large as the integer type, but I've not seen much interest in formalizing such a thing. – supercat Apr 05 '18 at 21:48
  • 1
    This makes sense, but why wouldn't the standard say that this behavior is unspecified or implementation defined? – Mikhail Mar 23 '19 at 15:34
  • 2
    @Mikhail: On platforms where integer overflow would raise a signal, treating it as Implementation Defined would severely limit compilers' ability to reorder or consolidate integer operations that could overflow. If, for example, code within a loop multiplies two numbers, and those numbers never change within that loop, making overflow Implementation Defined would require that the aforementioned signal not be raised until code reaches the part of the loop where the computation takes place; absent that requirement, a compiler could perform the multiply once, before the start of the loop. – supercat Mar 17 '20 at 21:09
  • @Mikhail: If code within the loop calls a function that modifies an object of type `sig_atomic_t`, that is examined within the signal handler, it might be important that the overflow happens at the proper spot in the execution sequence, but most of the time it wouldn't matter. Compiler writers should be better able to judge their customers' needs regarding such issues than the Committee ever could. – supercat Mar 17 '20 at 21:11
  • 1
    The concerns that you mention in your historical argument are still no reason to call it undefined behavior. Rather, it should be implementation-defined behavior, because that's what it is. There has never been an implementation of integers that didn't implement some sort of controlled overflow behavior. – shuhalo Sep 14 '21 at 21:00
  • @shuhalo I think that your concern about the first part of my answer are addressed adequately by the second part of my answer, which starts, “ Nowadays, […] but signed arithmetic overflow remains undefined and compiler makers want it to remain undefined because they use this undefinedness to help with optimization”. Please note that I am not a member of the C standardization committee and that remarks about what should or shouldn't be undefined in C are best sent to them, not me. – Pascal Cuoq Sep 25 '21 at 22:28
  • 1
    @shuhalo: According to the published Rationale document, the authors of the Standard expected that most implementations would, as a form of "conforming language extension", define the behavior of signed integer overflow in at least some cases where the Standard imposes no requirements. If anyone in 1989 could have predicted that compilers would use the Standard's use of the phrase "Undefined Behavior" to regard as broken constructs which 90%+ of compilers would process in 100% predictable fashion, the C89 Standard would have been soundly rejected. – supercat Nov 08 '21 at 20:02
  • For completeness, you could add that wrapping on overflow in signed integers is also wanted for faster code. Take detection of signed overflow in addition and subtraction as example, where there are faster and simpler algorithms with 1 branch check (Hackers Delight). – Jay-Pi Dec 30 '21 at 16:49
  • @Jay-Pi: Requiring compilers to use precise wraparound semantics would significantly impede otherwise-useful optimizations in many cases; having programmers who want wraparound behavior explicitly include a cast to the signed type in question (e.g. `if ((int)(x+y) < 0)` would often make the purpose of code relying upon such code clearer to any humans reading it. The worst-of-all-possible worlds scenario, however, is what gcc uses, forcing programmers to prevent overflow at all costs even in cases where all numeric results would be equally acceptable when a program gets invalid data. – supercat Nov 21 '22 at 19:25
  • @AndyRoss: last time I checked, Cisco still makes ones' complement machines. – Joshua Feb 11 '23 at 03:30
19

Aside from Pascal's good answer (which I'm sure is the main motivation), it is also possible that some processors cause an exception on signed integer overflow, which of course would cause problems if the compiler had to "arrange for another behaviour" (e.g. use extra instructions to check for potential overflow and calculate differently in that case).

It is also worth noting that "undefined behaviour" doesn't mean "doesn't work". It means that the implementation is allowed to do whatever it likes in that situation. This includes doing "the right thing" as well as "calling the police" or "crashing". Most compilers, when possible, will choose "do the right thing", assuming that is relatively easy to define (in this case, it is). However, if you are having overflows in the calculations, it is important to understand what that actually results in, and that the compiler MAY do something other than what you expect (and that this may vary depending on compiler version, optimisation settings, etc.).
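
One way to stay within defined behaviour (a sketch of my own, not from this answer) is to test whether an addition would overflow before performing it:

#include <limits.h>

/* Returns 1 if a + b would overflow an int, 0 otherwise; the check itself
   never overflows, so its behaviour is fully defined. */
int add_would_overflow(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return 1;
    if (b < 0 && a < INT_MIN - b) return 1;
    return 0;
}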

Toby Speight
  • 27,591
  • 48
  • 66
  • 103
Mats Petersson
  • 126,704
  • 14
  • 140
  • 227
  • 32
    Compilers do not want you to rely on them doing the right thing, though, and most of them will show you so as soon as you compile `int f(int x) { return x+1>x; }` with optimization. GCC and ICC do, with default options, optimize the above to `return 1;`. – Pascal Cuoq Aug 12 '13 at 20:18
  • 1
    For an example program that gives different results when faced with `int` overflow depending on optimization levels, see http://ideone.com/cki8nM I think this demonstrates that your answer gives bad advice. – Magnus Hoff Aug 12 '13 at 20:19
  • I have amended that part a bit. – Mats Petersson Aug 12 '13 at 20:29
  • If a C were to provide a means of declaring a "wrapping signed two's complement" integer, no platform that can run C at all should have much trouble supporting it at least moderately efficiently. The extra overhead would be sufficient that code shouldn't use such a type when wrapping behavior isn't required, but most operations on two's complement integers are identical to those on an unsigned integers, except for comparisons and promotions. – supercat Feb 10 '14 at 21:09
  • @supercat: You are assuming that the system uses two's complement. What about one's complement, or "sign bit and regular number" (e.g. in 8-bit: 1 = 00000001, -1 = 10000001, 63 = 00111111, -63 = 10111111 - this is how floating point numbers are stored, so it's not beyond the realm of possibility to have this in an integer form too). – Mats Petersson Feb 11 '14 at 08:25
  • @MatsPetersson: If a type were specified as a two's-complement 16-bit integer on hardware with no support for such, the compiler should allocate a 16-bit unsigned integer. `x < y` should be replaced with `(x^0x8000) < (y^0x8000)`, `(uint32)x` should be replaced with `x-((uint32_t)(x&0x8000)<<1);`, etc. It wouldn't matter if the hardware uses one's-complement or sign+magnitude for signed arithmetic, because *the hardware wouldn't be doing any signed arithmetic*. For code which doesn't care about precise two's-complement semantics... – supercat Feb 11 '14 at 14:55
  • ...it would likely be faster to use the hardware's native signed format. For code which does need two's-complement behavior, however, being able to specify it would be much cleaner than having to fake it in software, and would yield much better performance on platforms which happen to use two's-complement natively. – supercat Feb 11 '14 at 14:57
  • Yes, I'm talking about using hardware that has a different integer format than two's complement. I know for a fact that there is hardware that has one's complement (no, it's not very common, but it does exist in the real world). I'm not sure if there is any hardware with "sign bit only". – Mats Petersson Feb 12 '14 at 08:17
  • @MatsPetersson: There exist C compilers for processors with no concept of signed arithmetic; so far as I can tell, such compilers invariably use two's-complement math. I know of nothing that would preclude a C compiler for any platform that had unsigned-arithmetic instructions from behaving as though the platform didn't have any signed arithmetic instructions and using only unsigned ones for all math including signed arithmetic; such code might not be as fast as code using native signed arithmetic instructions, but if wrapping two's-complement arithmetic is semantically required... – supercat May 02 '15 at 17:38
  • @MatsPetersson: ...having a means of specifying it in code and instructing compilers to do what was necessary to achieve it would be cleaner than requiring that code wanting to compare two 16-bit two's-complement integers `i` and `j` must be written `i+0x8000u < j+0x8000u`. I would think it would be easier, given `swrap16_t i,j`, to have `i – supercat May 02 '15 at 17:43
  • ...instructions will recognize that the intention of the code is to perform a two's-complement comparison and avoid the extra 'add' instructions. – supercat May 02 '15 at 17:43
  • 1
    Negative values need to exist and "work" for the compiler to work correctly. It is of course entirely possible to work around the lack of signed values within a processor, and use unsigned values, either as one's complement or two's complement, whichever makes most sense based on what the instruction set is. It would typically be significantly slower to do this than having hardware support for it, but it's no different from processors that don't support floating point in hardware, or similar - it just adds a lot of extra code. – Mats Petersson May 03 '15 at 22:04
  • @MatsPetersson: The only reasons I can see for an implementation to use anything other than two's-complement representations are (1) compatibility with arcane implementations, (2) some other form of signed math is faster than unsigned math, or (3) using some other form could allow INT_MAX > UINT_MAX/2. Unless one of those conditions applies, using unsigned types to "emulate" two's-complement signed math would be trivial. – supercat May 23 '17 at 21:32
  • @supercat: Or the hardware is actually ancient in itself and uses that math... I'm not arguing for anyone actually USING that, just the fact that this does exist in some HW (or has existed at least) and that's why there is UB in the spec for this particular scenario. – Mats Petersson May 24 '17 at 05:05
  • @MatsPetersson: If hardware can support full-width unsigned math as fast as whatever signed format it uses, then it will support most operations in two's-complement math just as effectively as whatever format it would use (since most operations on unsigned math and silent-wraparound two's-complement are the same at the bit level). So far as I know, the combined number of C99 and C11 implementations that use something other than two's-complement representations for signed values stands at zero. I know of a ones'-comp C89 implementation that has been maintained into the 21st Century, but... – supercat May 24 '17 at 14:38
  • ...its longest unsigned type is 36 bits, so it's not C99 conforming. I also have a sneaking suspicion that its "71-bit" signed type doesn't actually use a binary representation. If the instruction set makes unsigned math that wraps mod 2^36 slower than math which wraps mod 2^36-1 (the documentation for the implementation suggests that the mode to force conforming unsigned behavior makes things run slower), I would expect that the most efficient way to support a big signed type would be to represent numbers base (2^36-1). – supercat May 24 '17 at 14:42
12

First of all, please note that C11 3.4.3, like all examples and footnotes, is not normative text and is therefore not relevant to cite!

The relevant text that states that overflow of integers and floats is undefined behavior is this:

C11 6.5/5

If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.

A clarification regarding the behavior of unsigned integer types specifically can be found here:

C11 6.2.5/9

The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same. A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

This makes unsigned integer types a special case.

Also note that conversions are an exception: if a value is converted to a signed type and it can no longer be represented, the behavior is merely implementation-defined rather than undefined, although a signal may be raised.

C11 6.3.1.3

6.3.1.3 Signed and unsigned integers

When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
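
A short sketch of my own contrasting the two conversion rules quoted above (assuming the common 32-bit `int` and `unsigned`):

#include <stdio.h>

int main(void) {
    unsigned u = (unsigned)-1;   /* well defined: -1 + (UINT_MAX + 1), i.e. UINT_MAX */
    printf("%u\n", u);           /* 4294967295 with a 32-bit unsigned */

    int s = (int)3000000000u;    /* does not fit in int: result is implementation-defined
                                    (or an implementation-defined signal is raised);
                                    commonly -1294967296 on two's complement */
    printf("%d\n", s);
    return 0;
}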

Lundin
  • 195,001
  • 40
  • 254
  • 396
7

In addition to the other issues mentioned, having unsigned math wrap makes the unsigned integer types behave as abstract algebraic groups (meaning that, among other things, for any pair of values X and Y, there will exist some other value Z such that X+Z will, if properly cast, equal Y and Y-Z will, if properly cast, equal X). If unsigned values were merely storage-location types and not intermediate-expression types (e.g. if there were no unsigned equivalent of the largest integer type, and arithmetic operations on unsigned types behaved as though they were first converted to larger signed types), then there wouldn't be as much need for defined wrapping behavior, but it's difficult to do calculations in a type which doesn't have e.g. an additive inverse.

This helps in situations where wrap-around behavior is actually useful - for example with TCP sequence numbers or certain algorithms, such as hash calculation. It may also help in situations where it's necessary to detect overflow, since performing calculations and checking whether they overflowed is often easier than checking in advance whether they would overflow, especially if the calculations involve the largest available integer type.
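
Two small sketches of those uses (my own examples, assuming 32-bit unsigned arithmetic; the sequence numbers mirror the values discussed in the comments below):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* Post-hoc overflow detection: defined wraparound lets us check after the fact. */
    uint32_t a = 4000000000u, b = 500000000u;
    uint32_t sum = a + b;                  /* wraps modulo 2^32 */
    if (sum < a)
        puts("addition wrapped");

    /* Sequence-number deltas: the difference is correct even across the wrap point. */
    uint32_t older = 0xFFFFFFFCu, newer = 0x00000007u;
    printf("elapsed: %u\n", (unsigned)(newer - older));   /* prints 11 */
    return 0;
}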

supercat
  • 77,689
  • 9
  • 166
  • 211
  • I don't quite follow - why does it help to have an additive inverse? I can't really think of any situation where the overflow behaviour is actually useful... – sleske Mar 14 '16 at 11:02
  • @sleske: Using decimal for human-readability, if an energy meter reads 0003 and the previous reading was 9995, does that mean that -9992 units of energy were used, or that 0008 units of energy were used? Having 0003-9995 yield 0008 makes it easy to calculate the latter result. Having it yield -9992 would make it a little more awkward. Not being able to have it do either, however, would make it necessary to compare 0003 to 9995, notice that it's less, do the reverse subtraction, subtract that result from 9999, and add 1. – supercat Mar 14 '16 at 15:00
  • @sleske: It's also very useful for both humans and compilers to be able to apply the associative, distributive, and commutative laws of arithmetic to rewrite expressions and simplify them; for example, if the expression `a+b-c` is computed within a loop, but `b` and `c` are constant within that loop, it may be helpful to move computation of `(b-c)` outside the loop, but doing that would require among other things that `(b-c)` yield a value which, when added to `a`, will yield `a+b-c`, which in turn requires that `c` have an additive inverse. – supercat Mar 14 '16 at 15:09
  • Thanks for the explanations. If I understand it correctly, your examples all assume that you actually want to handle the overflow. In most cases I have encountered, the overflow is undesirable, and you want to prevent it, because the result of a calculation with overflow is not useful. For example, for the energy meter you probably want to use a type such that overflow never occurs. – sleske Mar 14 '16 at 15:27
  • @sleske: For things like TCP sequence numbers, it's most useful to have a type which wraps just as in the energy-meter example, so code doesn't need to treat the case where a sequence number goes from 0xFFFFFFFC to 0x00000007 any differently from the case where it goes from 0x00000002 to 0x0000000D. As for the example with a, b, and c, even if one doesn't care about any cases where `a+b` or `(a+b)-c` would be outside the range of the type one is using, one may very often care about cases where `(b-c)` is outside the range of that type. If the type obeys the laws of arithmetic, however, ... – supercat Mar 14 '16 at 15:33
  • 1
    ...such that `(a+b)-c` equals `a+(b-c)` whether or not the arithmetic value of `b-c` is representable within the type, the substitution will be valid regardless of the possible range of values for `(b-c)`. – supercat Mar 14 '16 at 15:36
  • Thanks for the explanations. I tried editing it into your answer - feel free to correct. – sleske Mar 15 '16 at 09:21
3

Perhaps another reason why unsigned arithmetic is defined is that unsigned numbers form the integers modulo 2^n, where n is the width (in bits) of the unsigned number. Unsigned numbers are simply integers represented using binary digits instead of decimal digits. Performing the standard operations in a modular system is well understood.
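
For example (a small sketch of my own, not from the original answer): with an 8-bit unsigned type, arithmetic is simply arithmetic modulo 256.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t n = 250;
    n = (uint8_t)(n + 10);         /* 260 mod 256 */
    printf("%u\n", (unsigned)n);   /* prints 4 */
    return 0;
}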

The OP's quote refers to this fact, but also highlights that there is only one, unambiguous, logical way to represent unsigned integers in binary. By contrast, signed numbers are most often represented using two's complement, but other choices are possible, as described in the standard (section 6.2.6.2).

Two's complement representation allows certain operations to make more sense in binary format. E.g., incrementing negative numbers works the same as for positive numbers (except under overflow conditions). Some operations at the machine level can be the same for signed and unsigned numbers. However, when interpreting the result of those operations, some cases don't make sense - positive and negative overflow. Furthermore, the overflow results differ depending on the underlying signed representation.
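
To illustrate that last point (my own sketch; it uses `int32_t`, which is required to be two's complement wherever it exists): the same bit-level increment can be read back as either an unsigned or a signed result, depending only on interpretation.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    uint32_t bits = 0xFFFFFFFEu;   /* two's-complement pattern of -2 */
    bits += 1u;                    /* well-defined unsigned increment */

    int32_t as_signed;
    memcpy(&as_signed, &bits, sizeof as_signed);     /* reinterpret the same bytes */

    printf("unsigned view: %u\n", (unsigned)bits);   /* 4294967295 */
    printf("signed view:   %d\n", (int)as_signed);   /* -1 */
    return 0;
}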

bjarchi
  • 180
  • 1
  • 8
yth
  • 103
  • 1
  • 7
  • For a structure to be a field, every element of the structure other than the additive identity must have a multiplicative inverse. A structure of integers congruent mod N will be a field only when N is one or prime [a degenerate field when N==1]. Is there anything you feel I missed in my answer? – supercat Oct 04 '17 at 21:16
  • You are right. I got confused by the prime power moduli. Original response edited. – yth Oct 06 '17 at 00:58
  • Extra confusing here is that there *is* a field of order 2^n, it is just not ring-isomorphic to the integers modulo 2^n. – Kevin Ventullo Jun 19 '18 at 15:49
  • And, 2^31-1 is a Mersenne Prime (but 2^63-1 is not prime). Thus, my original idea was ruined. Also, integer sizes were different back in the day. So, my idea was revisionist at best. – yth Jun 22 '18 at 02:53
  • The fact that unsigned integers form a ring (not a field), taking the low-order portion also yields a ring, and performing operations on the whole value and then truncating will behave equivalent to performing the operations on just the lower portion, were IMHO almost certainly considerations. – supercat Mar 17 '20 at 21:15
0

The most technical reason of all is simply that trying to capture overflow in an unsigned integer requires more moving parts from you (exception handling) and the processor (exception throwing).

C and C++ won't make you pay for that unless you ask for it by using a signed integer. This isn't a hard-and-fast rule, as you'll see near the end, but it is how they proceed for unsigned integers. In my opinion, this makes signed integers the odd one out, not unsigned, but it's fine that they offer this fundamental difference, as the programmer can still perform well-defined signed operations with overflow. But to do so, you must cast for it.

Because:

  • unsigned integers have well defined overflow and underflow
  • casts from signed -> unsigned int are well defined: [uint's name]_MAX + 1 is conceptually added to negative values, to map them to the extended positive number range
  • casts from unsigned -> signed int are well defined: [uint's name]_MAX + 1 is conceptually subtracted from positive values beyond the signed type's max, to map them to negative numbers

You can always perform arithmetic operations with well-defined overflow and underflow behavior even when signed integers are your starting point, albeit in a roundabout way, by casting to the corresponding unsigned integer type first and then back once finished.

#include <cstdint>  // provides int32_t / uint32_t (in C, use <stdint.h>)

int32_t x = 10;
int32_t y = -50;  

// writes -60 into z, this is well defined
int32_t z = int32_t(uint32_t(y) - uint32_t(x));

Casts between signed and unsigned integer types of the same width are free if the CPU is using 2's complement (nearly all do). If for some reason the platform you're targeting doesn't use 2's complement for signed integers, you will pay a small conversion price when casting between uint32 and int32.

But be wary when using bit widths smaller than int

Usually if you are relying on unsigned overflow, you are using a smaller word width, 8-bit or 16-bit. These will promote to signed int at the drop of a hat (C has absolutely insane implicit integer conversion rules; this is one of C's biggest hidden gotchas). Consider:

unsigned char a = 0;  
unsigned char b = 1;
printf("%i", a - b);  // outputs -1, not 255 as you'd expect

To avoid this, you should always cast to the type you want when you are relying on that type's width, even in the middle of an operation where you think it's unnecessary. This will cast the temporary and get you the signedness AND truncate the value so you get what you expected. It's almost always free to cast, and in fact, your compiler might thank you for doing so as it can then optimize on your intentions more aggressively.

unsigned char a = 0;  
unsigned char b = 1;
printf("%i", (unsigned char)(a - b));  // cast turns -1 to 255, outputs 255
Anne Quinn
  • 12,609
  • 8
  • 54
  • 101
  • "trying to capture overflow in an **unsigned** integer requires more moving parts" You mean signed? – Olli Niemitalo Sep 18 '21 at 08:05
  • "casts from unsigned -> signed int are well defined": This isn't correct; converting from unsigned to signed yields an *implementation-defined* result if the result can't be represented in the signed type. (Or an implementation-defined signal is raised.) Most implementations do wrap as you describe, but it's not guaranteed by the standard. C17 6.3.1.3p3 – Nate Eldredge Dec 19 '21 at 20:54
0

C++ just picks up this behaviour from C.

I believe that with C a disconnect has developed between its users and its implementers. C was designed as a more portable alternative to assembler and originally did not have a standard as such, just a book describing the language. In early C, low-level platform-specific hacks were common and accepted practice. Many real-world C programmers still think of C this way.

When a standard was introduced, its goal was largely to standardise existing practice. Some things were left undefined or implementation-defined. I'm not convinced that much attention was paid to which stuff was undefined and which stuff was implementation-defined.

At the time when C was standardised, two's complement was the most common approach, but other approaches were around, so C couldn't outright require two's complement.

If you read the rationale for standard C at https://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf, where they discuss the choice of promotion semantics, they decided that the "value preserving" semantics were safer; however, they made this decision based on the assumption that most implementations used two's complement and handled wraparound quietly in the obvious manner.

Compiler vendors at some point, however, started treating signed overflow as an optimisation opportunity. This has turned signed overflow into a major footgun: unless you carefully check every arithmetic operation to make sure it can't overflow, you can end up triggering undefined behaviour.

Once undefined behaviour is triggered, "anything can happen". What that means in practice is that the value a variable actually contains may be outside the range of values the compiler assumes it can contain. That in turn can render bounds checking ineffective.
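
A schematic example of that footgun (my own, not from the answer): an overflow guard written with signed arithmetic can be optimized away entirely.

int guarded_add(int x) {
    /* Intended as a bounds check, but because signed overflow is undefined
       the compiler may assume x + 100 never wraps, fold the condition to
       false, and delete the branch. */
    if (x + 100 < x)
        return -1;     /* "overflow detected" path */
    return x + 100;
}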

plugwash
  • 9,724
  • 2
  • 38
  • 51