
In C99 there are some (optional) types like `int8_t`, `int16_t` and the like, which are guaranteed to have an exactly specified width, no padding bits, and a two's complement representation (7.18.1.1). In 6.2.6.2, signed integer overflow is mentioned in footnotes 44) and 45), which say that it might result in trapping values in padding bits.

Since `intN_t` types don't have any padding bits and are guaranteed to use two's complement, does this mean that their overflow doesn't generate any undefined behavior? What would be the result of e.g. an overflowing multiplication? What about addition? Is the result reduced modulo 2^N, as for unsigned types?
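
For concreteness, here is a minimal sketch of the kind of code I'm asking about (assuming a typical implementation where `int` is 32 bits):

```c
#include <stdint.h>

int main(void) {
    int16_t x = 20000;
    x += x;  /* mathematically 40000, not representable in int16_t:
                does it wrap to -25536, or is something worse allowed? */
    x *= x;  /* an overflowing multiplication of the same kind */
    return 0;
}
```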

Ruslan
  • IANALL but even though they are represented with 2's complement, they still don't obey the laws of arithmetic modulo 2^n so the result of an overflow is not mathematically defined. And the standard says "If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.". – tux3 Jun 14 '15 at 12:34
  • Even for `uintN_t`, overflow is not (generally) well-defined. Both `intN_t` and `uintN_t` are still subject to integer promotions, and it's possible for multiplication of two `uint16_t` values, on systems where `uint16_t` promotes to `int`, to produce a result that cannot be represented in `int`. –  Jun 14 '15 at 12:48
  • Note: Types like `int8_t`, `int16_t` are not completely optional in C99: "These types are optional. However, if an implementation provides integer types with widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a two’s complement representation, it **shall define** the corresponding typedef names." §7.20.1.1 3 – chux - Reinstate Monica Jun 14 '15 at 15:55
  • @hvd: I wonder if there would be any practical difficulty with requiring that in cases where the result of an additive, multiplicative, bitwise, or left-shifting operator is cast or coerced to an unsigned type smaller than `int`, a compiler must either generate code which would behave as though the operator coerced its operands to that type and performed the operation using that type, or else define `__STDC_NON_MODULAR_UNSIGNED_SHORT`. Note that nearly all compilers produced prior to 2010 (and many produced since) already complied with that rule, and most can be made to comply... – supercat Jun 16 '15 at 18:00
  • ...using a command-line option. The rule would not require any change to any behavior that is defined by the Standard, and it is unlikely that it would contradict any defined behavior of any compiler extensions. The only effect of the rule would be to change some kinds of Undefined Behavior which would lead to *useful* optimizations into platform-independent behaviors. – supercat Jun 16 '15 at 18:03
  • @supercat The example I gave of `(unsigned short) ((unsigned short) 65535 * (unsigned short) 65535)` is a case where current commonly used implementations do not comply with your suggested rule in at least some of their conforming modes, and because of that, the feature macro would not be very useful. Also, I think the feature macro should be the other way around: if the macro is undefined, you don't know whether that's because it supports the operations, or because it adheres to an older standard. Making it `__STDC_MODULAR_UNSIGNED_SHORT`: if it's defined, you do know it'll work. –  Jun 16 '15 at 18:16
  • @hvd: Ideally tests would exist in both directions, but from the standpoint of the standard I would consider the negative direction more important. Adding `#if (__STDC_MODULAR_UNSIGNED_SHORT>0)` `#error ...` to code which is presently being used with a legacy compiler where unsigned shorts have always worked as expected would not affect its behavior on that compiler, but would ensure that if it's moved to a newer compiler it would either work as expected or fail compilation. Only compilers that are new enough to have gotten "creative", but not new enough to say so, would pose problems. – supercat Jun 16 '15 at 18:25
  • @hvd: I would also posit that it's much better to have a language where `x*=x;` is unconditionally safe and a programmer who would be willing to accept Undefined Behavior when `x` is out of bounds could write `x=(int)(x*x)`, than one where programmers wanting sane behavior must write `x*=1u*x;`, and where failure to write that will yield a program that will often work anyway (even with out-of-bounds values of x), but might at some later date cease to obey the laws of time and causality. – supercat Jun 16 '15 at 18:43
  • @hvd: I would further posit that more optimizations would often be possible with a compiler which promised to abide by very loose, but not totally unconstrained, behavior on overflow, compiling a program that took advantage of that promise, than would be possible on a program that was written to absolutely positively avoid overflow under any circumstances. If an audiovisual decoder is required to fill the output buffer with valid output when given valid input, and may write whatever it likes to the output buffer given invalid input, but must not write outside that... – supercat Jun 16 '15 at 18:52
  • ...then having slightly-constrained overflow semantics would minimize the amount of overflow trapping the decoder would have to do to satisfy its "don't write outside the buffer" constraint. Precise overflow trapping is expensive, since it precludes many forms of loop reordering; looser semantics could be much cheaper. If an application wouldn't care if data gets processed beyond the point where overflow occurred, provided nothing writes outside the buffer, such precise trapping represents wasted effort on both the human and machine level. – supercat Jun 16 '15 at 18:58
  • @supercat Do you have a concrete example of an optimisation that would be possible in a well-written program in your hypothetical future version of C, that would not be possible in a well-written program in the current version of C? I don't believe such a thing exists (even if the well-written program in the current version of C might be uglier) but I'll be happy to be shown I'm wrong. –  Jun 16 '15 at 21:28
  • @hvd: `typedef struct { int32_t left, top, bottom, right; } RECT; int32_t bb_right(RECT *rects, int32_t count) { int32_t max_x = INT_MIN; for (int32_t i=0; i<count; i++) { int32_t r = rects[i].left+rects[i].width; if (r > max_x) max_x = r; } return max_x;}`. The function should return the rightmost coordinate of the given rectangles if every rectangle's width is non-negative and no greater than INT32_MAX-left. If the function is allowed to return any arbitrary value, but have no other side-effects, when given a pointer to an accessible array holding invalid rectangles... – supercat Jun 16 '15 at 23:37
  • ...how would one write code in standard C, in a fashion that one could expect to be just as fast as the above in cases where there is no overflow? Computing `uint32_t r = 0x80000000u+rects[i].left+rects[i].width;` and determining the maximum `r` thus computed would work, but I do not think a programmer would be entitled to expect that to be as fast as the original (maybe some optimizers might figure out that they could avoid the extra addition of 0x80000000 if they use a signed comparison operation, but that seems excessively optimistic). A compilable sketch of this biased-unsigned version appears after this thread. – supercat Jun 16 '15 at 23:44
  • If one was willing to add a language extension beyond what many compilers have traditionally done, consider the requirement: `int32_t sum(int32_t *dat, int32_t count);`. Return the total if no intermediate value overflows. Return `INT32_MIN` if the result is not representable in `int32_t`. Arbitrarily return either the correct total or `INT32_MIN` if intermediate values would not fit in `int32_t` but the total would. Is there any way to write C code that would yield anything near the optimal code meeting those requirements on both 32-bit and 64-bit platforms, bearing in mind that... – supercat Jun 16 '15 at 23:56
  • ...on a 64-bit platform it would probably be fastest to have code be oblivious to whether anything but the final result fits in `int32_t`, but on e.g. the ARM7-TDMI the loop could be unrolled to use a load-multiple instruction, a sequence of "add if not overflow" instructions, a "subtract 1 if not overflow" instruction on the counter (pre-initialized so the last loop would wrap it from 0x80000000 to 0x7FFFFFFF), and a branch-if-not-overflow. Much faster than the optimal code to compute a 64-bit total. – supercat Jun 17 '15 at 00:03
  • The reason programmers accept languages where overflows can go undetected is that precise trapping is expensive. If a language allows programmers to specify how tightly they need things guarded, a compiler may be able to add overflow checking at much lower cost (for the above example, only a small constant-time cost (zero per-loop cost!) to add overflow checking to all the values in the loop for both x64 and ARM7-TDMI; a little more cost on x86, but less than would be necessary for a program written in today's C). – supercat Jun 17 '15 at 00:08
  • @supercat You were suggesting a feature macro that effectively tells whether modular arithmetic is supported on `uintN_t` types. My comments were in response to that suggestion. If your concrete example for why that feature macro is a good thing only works for `intN_t`, then a different suggestion could work, but not the suggestion you actually made. –  Jun 17 '15 at 05:48
  • @hvd: I thought your question for concrete examples was aimed toward my more general comment that loose-but-constrained overflow semantics would be helpful. With regard to my particular proposal here for short unsigned values, my intention would be to give up some "optimizations" which would very rarely be useful in exchange for the ability to have existing code which works correctly for any size of `int` work correctly with all sizes of `int`. Saying that `foo *= foo;` will have the same effect as `foo *= 1u*foo;` in all cases where it is defined, but may negate the laws of time and... – supercat Jun 18 '15 at 13:21
  • ...causality on some machines, isn't very helpful. If the reason for leaving it open-ended is to facilitate "optimization", there would be other things the standards bodies could do with integer overflow which would facilitate many more useful optimizations without making code needlessly dependent on the size of `int`; my preferred normative semantics would implicitly make modular arithmetic on short unsigned values "work" even if compilers didn't have any special handling for it (and would be helpful elsewhere as well) but nothing is really gained by breaking existing code. – supercat Jun 18 '15 at 13:23
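
For reference, here is a compilable reconstruction of the `bb_right` sketch from this thread (the struct members are renamed to `width`/`height` to match the arithmetic in the comments, and the biased-unsigned variant is the workaround described above, not either commenter's code verbatim):

```c
#include <stdint.h>

typedef struct { int32_t left, top, width, height; } RECT;

/* Returns the rightmost edge of the given rectangles.  All arithmetic is
   done in uint32_t with a 0x80000000u bias, so wraparound is defined and
   unsigned comparison orders the biased values the same way signed
   comparison would order the unbiased ones. */
int32_t bb_right(const RECT *rects, int32_t count) {
    uint32_t max_r = 0;  /* biased encoding of INT32_MIN */
    for (int32_t i = 0; i < count; i++) {
        uint32_t r = 0x80000000u + (uint32_t)rects[i].left
                                 + (uint32_t)rects[i].width;
        if (r > max_r)
            max_r = r;
    }
    /* Converting back is implementation-defined for values with the top
       bit set, but yields the expected result on two's-complement
       implementations. */
    return (int32_t)(max_r - 0x80000000u);
}
```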

2 Answers


Footnotes are not normative. If a footnote states that overflow can result in trapping values in padding bits, it is not really wrong, but I can see how it is slightly misleading. The normative text merely says the behaviour is undefined. Placing trapping values in padding bits is one possible consequence of undefined behaviour, but not the only one.

So no, this does not mean overflow is defined. It's possible for operations involving intN_t/uintN_t operands to overflow, and for that overflow to result in undefined behaviour.

Something like `int8_t i = 127; ++i;` has no UB. `int8_t` is subject to the integer promotions, so the addition is carried out as if you had written `i = (int8_t) ((int) i + 1);`. The addition itself does not overflow, and the conversion back to `int8_t` produces an implementation-defined result.
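
A minimal demonstration, with the implementation-defined step marked:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int8_t i = 127;
    ++i;  /* performed as (int)i + 1 == 128; the int addition cannot overflow */
    /* Converting 128 back to int8_t is implementation-defined; common
       two's-complement implementations yield -128. */
    printf("%d\n", i);
    return 0;
}
```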

Something like `uint16_t u = 65535; u *= u;` has UB on current typical implementations where `int` has 32 sign/value bits. `uint16_t` too is subject to the integer promotions, so the multiplication is carried out as if you had written `u = (uint16_t) ((int) u * (int) u);`. The multiplication itself overflows, and the behaviour is undefined.
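
This is where the `1u` trick mentioned in the comments below helps: multiplying by `1u` first moves the arithmetic into `unsigned int`, where wraparound is defined. A sketch:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t u = 65535;
    /* u *= u;  would be (int)65535 * (int)65535 == 4294836225, which
       overflows a 32-bit int: undefined behaviour. */
    u *= 1u * u;  /* 1u*u has type unsigned int, so the multiplication wraps */
    printf("%u\n", (unsigned)u);  /* prints 1: 4294836225 reduced mod 65536 */
    return 0;
}
```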

Something like `int64_t i = 9223372036854775807; ++i;` has UB on almost all implementations. The addition itself overflows.
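
Here no promotion can help, since there is no wider type to promote to; one fully defined option is to check before the operation. A sketch:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int64_t i = INT64_MAX;
    /* ++i;  would overflow int64_t itself: undefined behaviour */
    if (i < INT64_MAX)
        ++i;
    else
        puts("increment would overflow");
    return 0;
}
```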

  • Wouldn't `u` be promoted to `unsigned int` rather than `int`? – Oliver Charlesworth Jun 14 '15 at 14:34
  • @OliverCharlesworth Only if `int` cannot represent all of the type's values. For example, if `uint16_t` is `unsigned short`, and `unsigned short` and `unsigned int` have the same size and range. But if `int` is large enough, then that will be used. It used to be different a long time ago, but as far as I know that was before the first C standard; the first C standard required the promotions to work the way I describe here, and that has never changed since. (A quick check of which way the promotion goes appears after this thread.) –  Jun 14 '15 at 14:50
  • I suppose with `uint16_t u = 65535;`, `u *= u` --> `u *= u*1U;` would eliminate UB. I _think_ the `*1U` trick also helps with wider `uintN_t` too. – chux - Reinstate Monica Jun 14 '15 at 15:43
  • @chux Yes, that works. I've recommended multiplication by `1U` myself here on SO too: [32 bit unsigned multiply on 64 bit causing undefined behavior?](http://stackoverflow.com/questions/27001604/32-bit-unsigned-multiply-on-64-bit-causing-undefined-behavior) :) –  Jun 14 '15 at 15:44
  • Unsigned arithmetic wraps overflows. Undefined behavior is just for signed. – bzim Feb 17 '18 at 16:30
  • @BrunoCorrêaZimmermann That's not how the standard describes it. See C99 3.4.3p3 (non-normative) giving integer overflow, not signed integer overflow, as an example of UB. But see C99 6.2.5p9 (normative) saying that unsigned integer overflow doesn't exist, because it would only be overflow if the result were out of range after the reduction modulo 2ⁿ. –  Feb 17 '18 at 18:49
  • Oh, my bad. Yes, technically there is no overflow because of the modulo. – bzim Feb 18 '18 at 00:47
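
A quick way to see which promotion applies on a given implementation, following the rule described above (a sketch comparing the ranges from `<stdint.h>` and `<limits.h>`):

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Integer promotion is value-preserving (C99 6.3.1.1p2): uint16_t
       promotes to int only if int can represent every uint16_t value,
       otherwise to unsigned int. */
#if UINT16_MAX <= INT_MAX
    puts("uint16_t promotes to int here");
#else
    puts("uint16_t promotes to unsigned int here");
#endif
    return 0;
}
```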

No, it isn't well defined because ... it isn't defined at all. There simply is no text in the standard that would give semantics to the overflow of a signed integer.

Don't overstress the term "undefined behavior" as being something mysterious, but take it in its direct sense. There is no definition of the behavior, so the standard doesn't specify what should happen, and no portable code should rely on such a feature.

Jens Gustedt
  • The Standard specifies that `intXX_t` types should be stored using two's-complement representation and no padding bits, and also specifies that conversion of an `int` to a smaller `int` type which is incapable of holding its value should behave in an Implementation-Defined manner. I suppose it might be possible for an implementation to use two's complement but define something other than two's-complement reduction for the conversion, though it would be highly unusual. The result isn't UB, however, in any case where a well-defined `int` is converted to the smaller size. – supercat Jun 16 '15 at 17:57
  • @supercat, the question is about overflow, not about conversion. – Jens Gustedt Jun 16 '15 at 22:52
  • Given `int16_t x=182;` on a system where INT_MAX is 65535 or greater, the fact that `x*=x;` would make x equal -32412 would not be "overflow" as the term is defined in the Standard, but if `x` was expected to behave as a number, it would seem most natural to say that it overflowed. – supercat Jun 16 '15 at 23:05