I'm having some difficulty understanding these comments about detecting integer overflows

Question

You'll find the quoted text below in section I. Introduction of the article Understanding Integer Overflow in C/C++ (emphases are mine):

Detecting integer overflows is relatively straightforward by using a modified compiler to insert runtime checks. However, reliable detection of overflow errors is surprisingly difficult because overflow behaviors are not always bugs. The low-level nature of C and C++ means that bit- and byte-level manipulation of objects is commonplace; the line between mathematical and bit-level operations can often be quite blurry. Wraparound behavior using unsigned integers is legal and well-defined, and there are code idioms that deliberately use it. On the other hand, C and C++ have undefined semantics for signed overflow and shift past bitwidth: operations that are perfectly well-defined in other languages such as Java. C/C++ programmers are not always aware of the distinct rules for signed vs. unsigned types in C, and may naively use signed types in intentional wraparound operations.¹ If such uses were rare, compiler-based overflow detection would be a reasonable way to perform integer error detection. If it is not rare, however, such an approach would be impractical and more sophisticated techniques would be needed to distinguish intentional uses from unintentional ones.

I don't understand why compiler-based detection would be impractical for wraparound operations on signed types if such uses are not rare. Also, why would we need to distinguish between intentional and unintentional uses? Both are undefined behavior according to the Standard.

Too broad. And please select one of the **different** languages. C is not C++ is not C! — too honest for this site, Jul 29 '17 at 18:45
...because the compiler cannot know the runtime value that overflows, such as `i <<= n` or `i += n`; — Weather Vane, Jul 29 '17 at 18:49
`1 << 31` is UB by standard, but if you use it nothing bad will happen (on most compilers). In such an easy case the compiler could issue a warning (gcc, for example, has a warning option for cases like this). OTOH, if you had `uint32_t x = 1 << shift; // shift is run time defined value`, would you warn the user? — 0andriy, Jul 29 '17 at 18:58
@0andriy Please provide a reference for your claim "`1 << 31` is UB by standard". Until then it is wrong as stated. — too honest for this site, Jul 29 '17 at 19:05
Hint: Why do you think C (and C-**style** C++) are **the** high-level languages to pick if speed and/or size are major design targets? — too honest for this site, Jul 29 '17 at 19:07
C is the high-level language that gives the coder practically 100% freedom. You can write a complete operating system, including all hardware-related operations, in C without touching assembler (except maybe some very specific instructions such as context switching, stack manipulation etc.). So the runtime cannot affect or check the compiled code. If you need a language that checks the results at runtime, choose another language. — 0___________, Jul 29 '17 at 19:33
@Olaf: on behalf of 0andriy, ISO/IEC 9899:2011 §6.5.7 **Bitwise shift operators** ¶4 _The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined._ (Using `^` for exponentiation; can't do `^{` in comments.) _[…continued…]_ — Jonathan Leffler, Jul 29 '17 at 19:38
_[…continuation…]_ Since `1` is a (signed) `int` constant, and for a 32-bit `int` type, `INT_MAX` is 2^31 - 1, `1 << 31` cannot be represented in the result type, so the behaviour is undefined, as @0andriy claimed. — Jonathan Leffler, Jul 29 '17 at 19:41
@Olaf, Jonathan explained above. (1 is signed int because of promotion) — 0andriy, Jul 29 '17 at 19:52
@JonathanLeffler: I don't see there are 32 bit `int` used in the question. And - although I agree it is most likely, there is no requirement for `INT_MAX` to be `2**31 - 1` for a 32 bit `int` either. So no, the comment is not correct **as stated**. A lot of confusion in C questions stems from incorrect or false assumptions. So let's just stick to the facts given in the question (I'm _pretty sure_ this is one requirement for questions;-) — too honest for this site, Jul 29 '17 at 19:52
@PeterJ `So the runtime cannot affect or check the compiled code` The text in the article refers to a modified compiler to insert runtime checks. — Belloc, Jul 29 '17 at 19:53
@Olaf, I have no time to explain, I was exactly on your side, when I got an explanation with references to a standard. — 0andriy, Jul 29 '17 at 19:55
@0andriy Sorry, I seem to have missed that (tbh, I still do). I only criticised the false assumption of `int` having 32 **used** bits (i.e. no/excluding padding bits). — too honest for this site, Jul 29 '17 at 19:58
@Olaf, https://stackoverflow.com/questions/26331035/why-was-1-31-changed-to-be-implementation-defined-in-c14 — 0andriy, Jul 29 '17 at 20:06
@0andriy: The question is tagged both, C++ and C. The linked question is about C++ for good reasons. In C it is UB. That's one reason questions tagged with two different languages should be closed as too broad in general (with few exceptions about true interaction). And with >31 value-bits `int`, it is fine. — too honest for this site, Jul 29 '17 at 20:11
@Belloc to be honest, I do not see any possibility of performing such checks at runtime, and I have not seen any hardware that supports it. It is easy to check for float or double overflows, as most FPUs support that, but I do not see any practically applicable way for integers, at least not one that avoids false positives without implementing a machine-code interpreter. — 0___________, Jul 29 '17 at 20:44

score 1 · Answer 1 · answered Jul 30 '17 at 00:22

Detecting signed integer overflows at runtime is no problem. New languages like Swift do it automatically and reliably.

The problem is: Although integer overflows are undefined behaviour in C and C++, there are tons and tons of code where integer overflows happen, and because the compiler silently ignores integer overflows, everything works just fine.

If you start detecting integer overflows, such uses will break the application. And of course these overflows won't happen when the developer runs the application, or a tester runs it, but only when the program is shipped to customers, who will get very, very angry if their application crashes at the most inappropriate and most costly time, just because you decided to disallow some undefined behaviour that worked just fine.
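
As a rough sketch of what such a runtime check amounts to, a signed addition can be guarded by a test that runs before the add, so the overflow itself is never executed. The helper name `checked_add` is hypothetical; an instrumenting compiler would presumably emit something equivalent inline rather than call a function.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true when the sum
   is representable in int; returns false (leaving *out untouched) otherwise.
   The test runs before the addition, so no undefined behavior is executed. */
static bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

A check like this reports every overflow it sees. It has no way to tell whether the programmer was relying on wraparound on purpose, which is why switching it on can break code that used to appear to work.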

Optimizing C and C++ compilers take advantage of signed overflow being undefined in myriad ways; they don't ignore it. — Deduplicator, Jul 30 '17 at 00:43

score 0 · Answer 2 · David Hoelzer · 2017-07-30T00:04:58.837

For a compiler to detect overflows at compile time, in all but the most trivial cases, would require the compiler to account for every possible input that could influence a variable and to calculate every possible value that could result.

Obviously, this is not realistic.

An example of taking advantage of the overflow is using the side effect for something else. Here's a contrived example for a ring buffer:

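 #include <stdint.h>
 #include <stdio.h>

 /* Placeholder key setup: fill the table with single chars. */
 static void init_keys(char keys[256])
 {
   for (int i = 0; i < 256; i++)
     keys[i] = (char)i;
 }

 int main(void)
 {
   uint8_t index = 8;   /* 8-bit unsigned: goes from 255 back to 0 */
   char keys[256];
   int letter;

   init_keys(keys); // Put single chars in the array
   while ((letter = getchar()) != EOF) {
     letter ^= (unsigned char)keys[index];
     index++;           /* wraps around, so no modulus is needed */
     printf("Encoded: %c\n", letter);
   }
   return 0;
 }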

In this example, we create an 8-bit unsigned integer, which must wrap around to 0 at 255 + 1. We take advantage of this wraparound to implement a ring buffer index directly with this value rather than using the modulus, which would be more typical.
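
To make the comparison with the modulus concrete, here is a small sketch of the two ways of stepping the index. The function names `next_wrap` and `next_mod` are made up for illustration, and `uint8_t` is assumed to be available (it is on any platform with 8-bit bytes).

```c
#include <stdint.h>

/* Advance a ring index by relying on the 8-bit unsigned type:
   the incremented value is reduced modulo 256 when stored back. */
static uint8_t next_wrap(uint8_t index)
{
    index++;
    return index;
}

/* The same step written with an explicit modulus. */
static unsigned next_mod(unsigned index)
{
    return (index + 1u) % 256u;
}
```

Both functions give the same result for every index from 0 to 255. An overflow checker has to leave the first form alone, since this wraparound is intentional and well defined.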

edited Jul 30 '17 at 00:04

answered Jul 29 '17 at 19:58

David Hoelzer


Why would it be realistic in the case stated in the article? `If such uses were rare, compiler-based overflow detection would be a reasonable way to perform integer error detection.` – Belloc Jul 29 '17 at 20:14
Because they aren't rare. That's the point. They can happen in any number of ways, entirely dependent on the input to the application and use of the variables. – David Hoelzer Jul 29 '17 at 20:22
@Belloc Look at it this way. If there were only 2 ways to overflow an integer, it would be pretty easy to diagnose that. Since there are a lot of ways to overflow an integer, it is too much for a compiler to try and diagnose. – NathanOliver Jul 29 '17 at 20:58
@NathanOliver Yes, maybe that's the interpretation to be given to the quoted text in my prior comment. But what about the point of distinguishing between intentional and unintentional uses? What are the authors trying to imply here? – Belloc Jul 29 '17 at 21:15
At times you intend to take advantage of the overflow – David Hoelzer Jul 30 '17 at 00:00
I added a contrived example that takes advantage of an unsigned overflow. – David Hoelzer Jul 30 '17 at 00:05
There is nothing in the question about detecting overflows at compile time. It's about runtime code to detect them, and why it should not be inserted by the compiler. – user207421 Jul 30 '17 at 01:07

score 0 · Answer 3 · answered Jul 30 '17 at 01:14

There are 5 reasonable ways to handle overflow, be it signed or unsigned:

1. Trap. Normally needs extra instructions.
2. Saturating. Rarely available natively, normally needs extra instructions.
3. Wrap-around. Always available natively for everything except signed types that are not two's complement. Used for unsigned types.
4. Undefined Behavior. Always available natively and allows the compiler to make assumptions for optimizing. Used for signed types.
5. Arbitrary result. Always available natively. Only interesting when wrap-around is not available natively. This is weaker than UB, which is both its biggest advantage and disadvantage.

UB is good for optimizing, trapping for error-detection, wrap-around and saturating are sometimes wanted behavior.
Arbitrary result fills the gap where wrap-around is expensive but full UB is not warranted.
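
As an illustration only, three of these behaviors can be sketched for `int` addition as below. All function names are hypothetical, and the range test is just one portable way to detect the overflow before it happens. The undefined-behavior and arbitrary-result variants are left out because in source form they are simply `a + b`.

```c
#include <limits.h>
#include <stdlib.h>

/* Does a + b fall outside the range of int? Checked without overflowing. */
static int add_overflows(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

/* Trap: stop the program instead of producing a wrong value. */
static int add_trap(int a, int b)
{
    if (add_overflows(a, b))
        abort();
    return a + b;
}

/* Saturate: clamp to the nearest representable value. */
static int add_saturate(int a, int b)
{
    if (add_overflows(a, b))
        return b > 0 ? INT_MAX : INT_MIN;
    return a + b;
}

/* Wrap: do the arithmetic in unsigned, where wraparound is defined. */
static unsigned add_wrap(unsigned a, unsigned b)
{
    return a + b;
}
```

The trap and saturate versions show where the extra instructions come from: each one pays for a range test that the plain addition does not need.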

Now, sometimes the compiler can prove that an operation cannot overflow, so it need not handle that case. This is often true for loop counters and the like, so the extra work isn't quite as much as it looks. But tracing which values data can have is not perfect even with full source, and barriers to inlining, like separate compilation and (where allowed) semantic interposition, can make it impossible.
