
The following code produces strange things on my system:

#include <stdio.h>

void f (int x) {
  int y = x + x;
  int v = !y;
  if (x == (1 << 31))
    printf ("y: %d, !y: %d\n", y, !y);
}

int main () {
  f (1 << 31);
  return 0;
}

Compiled with -O1, this prints y: 0, !y: 0.

Now, beyond the puzzling fact that removing either the int v line or the if line produces the expected result, I'm not comfortable with the undefined behavior of an overflow translating into a logical inconsistency.

Should this be considered a bug, or is the GCC team philosophy that one unexpected behavior can cascade into logical contradiction?
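
As an aside, a minimal sketch of how the same doubling can be written without signed-overflow UB, assuming the intent is two's-complement wraparound: do the arithmetic in unsigned, where wraparound is defined, and pass INT_MIN directly to sidestep the separate question of whether 1 << 31 itself overflows.

#include <limits.h>
#include <stdio.h>

void f (int x) {
  /* Unsigned addition wraps modulo 2^N by definition, so this is well defined. */
  unsigned uy = (unsigned) x + (unsigned) x;
  int y = (int) uy;   /* converting an out-of-range value to int would be implementation-defined, not undefined */
  printf ("y: %d, !y: %d\n", y, !y);
}

int main (void) {
  f (INT_MIN);
  return 0;
}

With this version, y is 0 and !y is 1, consistently, at any optimization level.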

Marco Bonelli
Michaël
    So you are expecting the undefined behaviour to have logical/defined behaviour? If the behaviour is undefined `gcc` can do whatever it wants. – kaylum Jan 22 '20 at 00:11
  • @kaylum: I am expecting the undefined behavior to be localized to the computation of `y` above, and that the rest would behave normally, given that value; my question was indeed whether this was the case. – Michaël Jan 22 '20 at 16:44

4 Answers

7

When invoking undefined behavior, anything can happen. There's a reason why it's called undefined behavior, after all.

Should this be considered a bug, or is the GCC team philosophy that one unexpected behavior can cascade into logical contradiction?

It's not a bug. I don't know much about the philosophy of the GCC team, but in general undefined behavior is "useful" to compiler developers for implementing certain optimizations: assuming that something will never happen makes it easier to optimize code. This is exactly why anything can happen after UB: the compiler makes a lot of assumptions, and if any of them is broken, the emitted code cannot be trusted.
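
For instance, a minimal sketch of the kind of assumption involved here: signed overflow is undefined, so a compiler may assume x + 1 never wraps, and is then entitled to fold the comparison below to a constant.

int always_true (int x) {
  /* Overflow of x + 1 would be UB, so the compiler may assume it never
     happens; under that assumption the comparison holds for every x,
     and the whole function may be reduced to "return 1", even for x == INT_MAX. */
  return x + 1 > x;
}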

As I said in another answer of mine:

Undefined behavior means that anything can happen. There is no explanation as to why anything strange happens after invoking undefined behavior, nor does there need to be. The compiler could very well emit 16-bit Real Mode x86 assembly, produce a binary that deletes your entire home folder, emit the Apollo 11 Guidance Computer assembly code, or whatever else. It is not a bug. It's perfectly conforming to the standard.

Marco Bonelli
    Thanks, got it: unexpected behaviors are not limited to the expression where they happen. – Michaël Jan 22 '20 at 16:47
  • The authors of the Standard said of Undefined Behavior, "It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior". – supercat Jan 23 '20 at 03:10
3

The 2018 C standard defines, in clause 3.4.3, paragraph 1, “undefined behavior” to be:

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements

That is quite simple. There are no requirements from the standard. So, no, the standard does not require the behavior to be “consistent.” There is no requirement.

Furthermore, compilers, operating systems, and other things involved in building and running a program generally do not impose any requirement of "consistency" in the sense asked about in this question.
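
That said, an implementation is free to impose its own definition. For example, gcc provides the -fwrapv option, which defines signed addition, subtraction and multiplication overflow as two's-complement wraparound. A small sketch, relying on the flag semantics as documented by gcc:

#include <limits.h>
#include <stdio.h>

/* Built as:  gcc -O1 -fwrapv demo.c
   With -fwrapv, signed add/sub/mul overflow wraps, so the result below is
   defined by the implementation rather than left undefined. */
int main (void) {
  int x = INT_MIN;
  int y = x + x;   /* wraps to 0 under -fwrapv */
  printf ("y: %d, !y: %d\n", y, !y);
  return 0;
}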

Addendum

Note that answers that say “anything can happen” are incorrect. The C standard only says that it imposes no requirements when there is behavior that it deems “undefined.” It does not nullify other requirements and has no authority to nullify them. Any specifications of compilers, operating systems, or machine architectures; consumer product laws; laws of physics; laws of logic; or other constraints still apply. One situation where this matters is simply linking to software libraries not written in C: The C standard does not define what happens, but what does happen is still constrained by the other programming language(s) used and the specifications of the libraries, as well as the linker, operating system, and so on.

Eric Postpischil
  • What you say is true of compilers that honor the intention of the authors of the C Standard, but is true of neither gcc nor clang. – supercat Feb 22 '20 at 01:27
    @supercat: There is no way in which any C implementation or C “wannabe” implementation or any software or any elephant can fail to conform to the C standard’s requirements for undefined behavior, namely no requirements. – Eric Postpischil Feb 22 '20 at 02:50
  • @EricPostpischil: The Standard actually imposes no requirements upon how any implementation processes any program which doesn't exercise the translation limits given in the Standard. If an implementation doesn't correctly process at least one program that exercises the translation limits, it's non-conforming; if it does correctly process at least one program that exercises the limits, the Standard imposes no requirements on how it processes anything else. – supercat Feb 22 '20 at 06:02
  • @EricPostpischil: Besides, my point was that the authors of clang and gcc don't see any value in limiting the consequences of UB to those that would make sense given the target platform or most tasks that people would use a compiler to perform. – supercat Feb 22 '20 at 06:05
    @supercat: “All models are wrong. Some models are useful.” All physics models are wrong—nothing is frictionless, nothing is Newtonian, real solids do not have exactly the moments of inertia that the models use. Human bodies do not react exactly as medical models, such as half-lives of medications predict. Chemical preparations are impure. And compilers are imperfect and the C standard is limited. Get over it. The C standard is hugely useful in practice. It is not the pure mathematical model you want it to be. Stop harping about it constantly. – Eric Postpischil Feb 22 '20 at 11:01
  • @EricPostpischil: Many models are useful if employed properly, and can be worse than useless if abused. If one reads the C Standard in light of the published Rationale document, it is imperfect but useful. – supercat Feb 23 '20 at 02:32
0

Marco Bonelli has given the reasons why such behaviour is allowed; I'd like to attempt an explanation of why it might be practical.

Optimising compilers, by definition, are expected to transform code in various ways in order to make programs run faster. They are allowed to delete unused code, unroll loops, rearrange operations and so on.

Taking your code, can the compiler really be expected to perform the !y operation strictly before the call to printf()? I'd say that if you impose such rules, there will be no room left for any optimisations. So, a compiler should be free to rewrite the code as

void f (int x) {
  int y = x + x;
  int notY = !(x + x);
  if (x == (1 << 31))
    printf ("y: %d, !y: %d\n", y, notY);
}

Now, it should be obvious that for any input which doesn't cause overflow the behaviour is identical. However, in the case of overflow, y and notY experience the effects of UB independently, and may both become 0, because why not.
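
To make the "independently" part concrete, here is a hypothetical sketch of a rewrite the optimizer is entitled to apply to notY (an illustration of the reasoning, not a claim about gcc's actual internals): if x + x cannot overflow, then x + x == 0 exactly when x == 0, so !(x + x) may legally become (x == 0).

int notY_optimized (int x) {
  /* Valid only under the assumption that x + x never overflows. */
  return x == 0;
}

For x == (1 << 31), x is nonzero, so this yields 0, matching the printed !y, while the stored y comes from an actual 32-bit addition that happens to wrap to 0.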

IMil
-1

For some reason, a myth has emerged that the authors of the Standard used the phrase "Undefined Behavior" to describe actions which earlier descriptions of the language by its inventor characterized as "machine dependent" in order to allow compilers to infer that various things wouldn't happen. While it is true that the Standard doesn't require that implementations process such actions meaningfully even on platforms where there would be a natural "machine-dependent" behavior, the Standard also doesn't require that any implementation be capable of processing any useful programs meaningfully; an implementation could be conforming without being able to meaningfully process anything other than a single contrived and useless program. That's not a twisting of the Standard's intention: "While a deficient implementation could probably contrive a program that meets this requirement, yet still succeed in being useless, the C89 Committee felt that such ingenuity would probably require more work than making something useful."

In discussing the decision to make short unsigned values promote to signed int, the authors of the Standard observed that most current implementations used quiet wraparound integer-overflow semantics, and having values promote to signed int would not adversely affect behavior if the value was used in overflow scenarios where the upper bits wouldn't matter.

From a practical perspective, guaranteeing clean wraparound semantics costs a little more than allowing integer computations to behave as though performed on larger types at unspecified times. Even in the absence of "optimization", straightforward code generation for an expression like long1 = int1*int2+long2; would on many platforms benefit from being able to use the result of a 16x16->32 or 32x32->64 multiply instruction directly, rather than having to sign-extend the lower half of the result. Further, allowing a compiler to evaluate x+1 as a type larger than x at its convenience would allow it to replace x+1 > y with x >= y, which is generally a useful and safe optimization.
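
A short sketch of the two situations described above (illustrative helpers, not code from the answer):

long mac (int int1, int int2, long long2) {
  /* int1 * int2 is an int multiply, so its overflow is UB; that freedom is
     what lets a code generator feed the full widening-multiply result into
     the addition instead of sign-extending a truncated 32-bit product. */
  return int1 * int2 + long2;
}

int cmp (int x, int y) {
  /* If x + 1 may be evaluated as though in a wider type (overflow assumed
     away), this comparison may legally be folded to x >= y. */
  return x + 1 > y;
}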

Compilers like gcc go further, however. Even though the authors of the Standard observed that in the evaluation of something like:

unsigned mul(unsigned short x, unsigned short y) { return x*y; }

the Standard's decision to promote x and y to signed int wouldn't adversely affect behavior compared with using unsigned ("Both schemes give the same answer in the vast majority of cases, and both give the same effective result in even more cases in implementations with two’s-complement arithmetic and quiet wraparound on signed overflow—that is, in most current implementations."), gcc will sometimes use the above function to infer within calling code that x cannot possibly exceed INT_MAX/y. I've seen no evidence that the authors of the Standard anticipated such behavior, much less intended to encourage it. While the authors of gcc claim any code that would invoke overflow in such cases is "broken", I don't think the authors of the Standard would agree, since in discussing conformance, they note: "The goal is to give the programmer a fighting chance to make powerful C programs that are also highly portable, without seeming to demean perfectly useful C programs that happen not to be portable, thus the adverb strictly."
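
A hypothetical caller, for illustration only (not a claim about any particular gcc version), of the kind of inference described above: after inlining mul, a compiler that assumes the promoted signed multiply never overflows may conclude that x cannot exceed INT_MAX/65535, i.e. 32768, on any path reaching the call.

#include <stdio.h>

unsigned mul (unsigned short x, unsigned short y) { return x * y; }

void caller (unsigned short x) {
  unsigned r = mul (x, 65535);
  if (x > 32768)   /* a compiler reasoning as above might treat this test
                      as always false and drop the branch entirely */
    printf ("large x: %u\n", r);
}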

Because the authors of the Standard failed to forbid compilers from processing code nonsensically in case of integer overflow, even on quiet-wraparound platforms, the authors of gcc insist that their compiler may jump the rails in such cases. No compiler writer who was trying to win paying customers would take such an attitude, but the authors of the Standard failed to realize that compiler writers might value cleverness over customer satisfaction.

supercat