0

It is claimed in this blog that:

  • Undefined behavior only "happens" at high optimization levels like -O2 or -O3.
  • If I turn off optimizations with a flag like -O0, then there's no UB.

are both false. I'm wondering if there's any real-world showcase for the claim.

For example, n << 1 triggers UB when n<0. For the following function:

void foo(int n) {
  int t = n<<1;
  if (n>=0)
    nuke();
}

the compiler could compile it cautiously:

void foo(int n) {
  int t = n>=0 ? (n*2) : error("lshift negative int");
  if (n>=0)
    nuke();
}

or normally:

void foo(int n) {
  int t = n*2;
  if (n>=0)
    nuke();
}

or optimize it aggressively:

void foo(int n) {
  // unused
  // int t = n<<1;

  // always true, otherwise UB
  // if (n>=0)
    nuke();
}

Is there any modern, popular compiler such as gcc or clang that behaves in the last way, where some UB not only causes unexpected behavior locally at that statement, but can also be purposely exploited by the compiler (not considering buffer-overflow attacks etc.) and pollute the control flow globally, even when -O0 is specified?

Put it simply, are all UBs practically somehow implementation-defined under -O0?

== EDIT ==

The question is not whether those claims are theoretically false or nonsensical (because they are). It's whether there's a real-world showcase. As @nate-eldredge rephrased it in the comments:

given some piece of code that is formally UB, a real-life non-optimizing compiler produces results which are particularly surprising (in the way described above), even to a reasonably knowledgeable programmer?

Sylvain Hubert
  • Your question is general in nature and likely not capable of a concise answer. That said, a compiler would not be "popular" for long if it compiled in UB without `-O3` or `-Ofast` optimizations, which let the compiler aggressively optimize and make use of the "as if" rule (which still shouldn't result in UB, but may compile to something other than what you have written). No compiler I know of (gcc, clang, cl.exe (VS)) produces UB in optimized or unoptimized code. The compiler should not expose exploits by increasing levels of optimization. – David C. Rankin Jul 19 '23 at 10:22
  • Seems to be a duplicate of [Arithmetic bit-shift on a signed integer](https://stackoverflow.com/q/4009885/3422102) – David C. Rankin Jul 19 '23 at 10:22
  • @DavidC.Rankin your link has nothing to do with the question asked – Fredrik Jul 19 '23 at 10:25
  • That's why I didn't mark it as a dupe but suggested it. I don't think there will be one on how each compiler handles optimizations of shifts of signed `int` and the resulting UB or IB. – David C. Rankin Jul 19 '23 at 10:27
  • @DavidC.Rankin question updated: the question is not restricted to lshift, which is only for illustration purpose here – Sylvain Hubert Jul 19 '23 at 10:49
  • UBs are defined by the language standard. It does not matter whether a UB expresses itself during program execution in any way. It is still UB. Undefined, because no one knows what will happen (C language-wise). – 0___________ Jul 19 '23 at 10:53
  • *Put it simply, are all UBs **practically** somehow implementation-defined under -O0?* Is dereferencing a `NULL` pointer going to **not** result in a `SIGSEGV` just because you turned off the optimizer? – Andrew Henle Jul 19 '23 at 12:06
  • *I'm wondering if there's any real-world showcase for the claim.* You can't get that. UB is UB no matter which compiler optimization level you are using. – Support Ukraine Jul 19 '23 at 12:40
  • *Put it simply, are all UBs practically somehow implementation-defined under -O0?* In the standard "implementation-defined" means "documented behavior that may differ from system to system". The standard has a list (J.3) of implementation-defined areas that implementations are required to document. This is **not** the same as the list of undefined behavior. – Support Ukraine Jul 19 '23 at 12:45
  • Code with UB can sometimes do exactly as you expect when compiled with -O0 and fail big time when compiled with -O2. But that doesn't change the fact the behavior is UB in both cases. – Support Ukraine Jul 19 '23 at 12:49
  • I suppose that a particularly smart compiler that also "wishes" to not be helpful to the user could, upon detecting UB, simply generate code that returns immediately. – Thomas Jager Jul 19 '23 at 12:59
  • `int *a=(int*)0xDEADBEAF; *a=12;` causes UB no matter what optimization level. There is no real way to prevent UB in this case, no matter what. – 12431234123412341234123 Jul 19 '23 at 13:56
  • There is a similar discussion [here](https://stackoverflow.com/questions/26526426/is-something-undefined-behavior-by-omission) – ryyker Jul 19 '23 at 14:20
  • To avoid the nonsensical phrasing about "UB happens", maybe what you are really looking for are simply instances where: given some piece of code that is formally UB, a real-life non-optimizing compiler produces results which are particularly *surprising*, even to a reasonably knowledgeable programmer? I can see that answers might be interesting, but it also might be too much opinion-based big list for this site. – Nate Eldredge Jul 20 '23 at 05:02
  • For instance, given `uint32_t x = 0xdeadbeef; int count = 36; return x << count;`, I guess many people would expect the result to either be `0` or an overflow trap. They might not expect that, [with `gcc -O0` on x86, the result is `0xeadbeef0`](https://godbolt.org/z/5YPdM3cMq) - unless they were pretty well acquainted with the details of x86 shift instructions. – Nate Eldredge Jul 20 '23 at 05:10
  • @12431234123412341234123: Regarding that example, I suppose the point is that one might guess the *actual* behavior would be for the CPU to execute a store instruction with a value of 12 and address `0xdeadbeaf`, followed by whatever normally happens when the machine does that. For instance, if you know that your OS never maps that address range, you might guess the result of your code would be a page fault. I think the question here is to find examples where, even with a non-optimizing compiler, those guesses are wrong. – Nate Eldredge Jul 20 '23 at 05:25
  • @NateEldredge You're correct that what I'm looking for are "surprising" pieces of code, but only in the way described in the OP. For example, `0xdeadbeef << 36` resulting in `0xeadbeef0` "only causes unexpected behavior locally at that statement", and is thus not very surprising. – Sylvain Hubert Jul 25 '23 at 06:46
  • @AndrewHenle Exactly because derefing NULL always causes SIGSEGV with -O0, this behavior is practically (or even in text, I'm not sure) implementation-defined, which is not of interest for the question. – Sylvain Hubert Jul 25 '23 at 06:52
  • @SupportUkraine I know UB is always UB because that's a normative concept, but this is a trivial pedantic tautology, which is not the question. – Sylvain Hubert Jul 25 '23 at 07:00
  • So, what is your **_real_** question? UB is always UB, even if the behavior makes sense. Do you want to know whether some compiler without optimization shows **nonsense behavior** where the standard declares UB? If so, please [edit] your question to correct it. Otherwise, @0___________'s answer says it all. – the busybee Jul 25 '23 at 07:28
  • @thebusybee I'm not sure which part of the question should be edited. The question is about *real-world showcases* as said in the very first paragraph, followed by a detailed description of what is considered as "nonsense behavior". – Sylvain Hubert Jul 25 '23 at 09:57
  • The title and the introductory paragraphs tell us that you are looking for compilers producing UB. That is already answered: all of them do. But you are looking for nonsense (or surprising, as Nate worded it) behavior. The title and prose of your question should reflect exactly that. And I was not sure that this is what you are looking for. – the busybee Jul 25 '23 at 10:25

3 Answers

6
    Undefined behavior only "happens" at high optimization levels like -O2 or -O3.
    If I turn off optimizations with a flag like -O0, then there's no UB.

Both are false, for one reason. What is undefined in the C language is defined in the C standard. Undefined Behaviour means that, from the C language point of view, we do not know how the program will behave. UB does not have to express itself in any particular way, but it is still UB.

Those claims come from not understanding what UB is. UBs do not "happen". They exist at the C language level. Even if the program "works fine", it is still UB. As the behaviour is undefined, it may stop working if you change the compiler, compiler version, or compiler options, or run on another OS or hardware.
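
Since the UB is a property of the source rather than of any particular build, the dependable fix is also in the source. A minimal sketch for the n << 1 case from the question, assuming a hypothetical helper named shl1_checked (not part of the original post) that refuses the shift whenever the standard leaves it undefined:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: computes n << 1 only when the C standard defines
   the result. Returns false and leaves *out untouched otherwise. */
bool shl1_checked(int n, int *out)
{
    /* n << 1 is defined only for non-negative n whose doubled value
       is representable in int. */
    if (n < 0 || n > INT_MAX / 2)
        return false;

    *out = n << 1;
    return true;
}
```

With this shape, a caller has to handle the rejected inputs explicitly, so the behavior no longer depends on the compiler or on the optimization level.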

0___________
  • I agree that they're **theoretically** false or nonsensical but that's not the question. The question was whether there's any **showcase**, where the behavior becomes *wild enough* in the way described even with -O0. – Sylvain Hubert Jul 25 '23 at 06:36
  • @SylvainHubert Any code that runs **as you expect** in the presence of undefined behavior is *wild enough* - because you get this misguided, misinformed notion that somehow the undefined behavior didn't break things. If the code has undefined behavior - **IT'S BROKEN**. Trying to convince anyone otherwise merely convinces them you have low standards for your code. – Andrew Henle Aug 09 '23 at 16:48
  • @AndrewHenle I am indeed trying to convince someone who does not necessarily have decent C coding standards or care much about whether the C code itself is "broken" beyond its impact on the compiled binary. So again, I agree, but that's not the question. – Sylvain Hubert Aug 11 '23 at 02:38
  • @SylvainHubert Then you need to stop trying to prove this person's code can produce incorrect results, because this person's insistence that you have to do that is **wrong**. Programming doesn't work that way, and anyone who demands that has raised incompetence to a disruptive level. And I chose those words carefully - if this is a professional coder, **you can not trust the code this person writes**. The amount of effort you need to spend to watch over this person to verify nothing is broken means this person is a net negative, and it's time to let this person go. – Andrew Henle Aug 11 '23 at 10:48
  • @SylvainHubert (cont) Code has to be proven correct - and if it invokes undefined behavior that's literally impossible. The burden is on this person to prove the code is right, not on you to prove it's wrong. You're [wrestling with a pig here](https://www.goodreads.com/quotes/43033-never-wrestle-with-pigs-you-both-get-dirty-and-the). And getting dirty. "But it works!" is an ignorant cry and the response is simple: "Until it doesn't. Have higher standards." – Andrew Henle Aug 11 '23 at 10:53
  • @AndrewHenle The purpose of my question was to distinguish between "wrong" and "less right". From my understanding, UB is basically compilers' legal right to do whatever they want, but had a compiler under certain configuration waived this right and always chosen to behave like the vanilla interpreter inside junior programmers' mind, this would imply a de-facto "consensus" between the programmer and this compiler (not C itself), which is less formal than the standard but still sufficient to support that UB != nonsense in some situation. -- But of course, the accepted answer disproves this. – Sylvain Hubert Aug 12 '23 at 03:37
  • @AndrewHenle So yeah, UB turns out to be provably practically plain wrong in all situations, not just less right, or just pedantically, theoretically, potentially wrong under optimization. – Sylvain Hubert Aug 12 '23 at 03:42
  • @SylvainHubert rule of thumb - a C program with UBs is garbage, and the programmer who argues that it is OK is a lamer – 0___________ Aug 12 '23 at 08:24
2

First let's make it absolutely clear that the blog is correct:

Both statements are false

As user @0___________ also writes in this answer, https://stackoverflow.com/a/76720573/4386427, Undefined Behavior is a property of the C source code. No matter what a compiler does, the C source code still has undefined behavior. A compiler can't change that.

Then you ask for an example that is (quote) "surprising (in the way described above), even to a reasonably knowledgeable programmer".

The answer to that must be: such an example doesn't exist.

Reason: A "reasonably knowledgeable programmer" knows that it makes no sense to reason about how code with undefined behavior behaves. So a "reasonably knowledgeable programmer" will never be surprised no matter what the resulting program does.

For "less knowledgeable programmers" there may be many examples that could be surprising. For instance:

#include <stdio.h>

int* foo(void)
{
    int x;
    printf("%p\n", (void*)&x);
    return &x;
}

int main(void)
{
    printf("%p\n", (void*)foo());
    return 0;
}

With gcc 12.2 I get:

0x7ffc5981a73c
(nil)

Am I surprised? No, the code has undefined behavior, so I don't expect any specific behavior.

Would an inexperienced C programmer be surprised? Perhaps.

Put it simply, are all UBs practically somehow implementation-defined under -O0?

No

"implementation-defined" is something completely different from undefined behavior. An implementation is not required to specify what it will do with code having undefined behavior. It's even allowed to do one thing on Mondays, another thing on Tuesdays and so on.

"implementation-defined" behavior is something that the implementation must document so that users know what will happen. Two different implementations are allowed to do different things as long as they document what they do. For code with undefined behavior no documentation is required.
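
The distinction can be made concrete with a small sketch. The function names below are hypothetical, and the remark about GCC refers to its documented choice for signed right shifts; other implementations may document something different.

```c
#include <limits.h>

/* Implementation-defined: whether plain char is signed. Each implementation
   must document its choice, and <limits.h> exposes it. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Implementation-defined for negative n: the result of n >> 1.
   GCC documents sign extension, so there -8 >> 1 yields -4.
   For non-negative n the standard itself fixes the result. */
int shift_right_once(int n)
{
    return n >> 1;
}

/* By contrast, n << 1 with a negative n is undefined: no implementation
   has to document anything, so no wrapper for it is shown here. */
```

The first two functions can be relied on after reading the implementation's manual; nothing comparable exists for the undefined case.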

Support Ukraine
0

If one were to specify a language "nonBrokenC" which augments the C Standard with the following sentence:

  • If parts of the Standard, together with a platform's documentation, K&R1, and K&R2 would describe the behavior of some construct, but some other part of the Standard would characterize the action as undefined, the definition will take precedence.

then most C implementations would be configurable to process nonBrokenC. Compilers may, even with optimizations disabled, vary in things which were never documented in any of the above places, such as how they hold the values of automatic-duration objects whose address is not taken. Thus, for example, something like:

int test1(void)
{
  int x[1],y;
  x[1] = 2;
  return y;
}

might store the value 2 into the storage used to hold y and then return the contents of that storage, but because the address of y is never taken, implementations may even without optimizations enabled decide not to place it on the stack immediately after x.

On the other hand, given something like:

unsigned mul_mod_65536(unsigned short x, unsigned short y)
{
  return (x*y) & 0xFFFF;
}

I doubt that any compiler for a system with a 32-bit two's-complement int type would, unless optimizations or extra diagnostics are enabled, generate code that does anything other than yield the bottom 16 bits of the arithmetic product of x and y, even if that product falls in the range 0x80000000u to 0xFFFFFFFFu. I would regard any compiler for a 32-bit two's-complement platform where such a computation could arbitrarily corrupt memory as untrustworthy, even though the Standard failed to anticipate that compilers for such platforms might behave that way and thus failed to forbid such behavior.
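
For comparison, here is a sketch of a variant whose behavior the Standard does define. In mul_mod_65536 both unsigned short operands are promoted to int on such a platform, so the product can exceed INT_MAX. The name mul_mod_65536_unsigned is hypothetical; the only change is that one operand is converted to unsigned before multiplying.

```c
/* Hypothetical variant of mul_mod_65536 with fully defined behavior. */
unsigned mul_mod_65536_unsigned(unsigned short x, unsigned short y)
{
    /* Converting x to unsigned forces the multiplication into unsigned
       arithmetic, which wraps modulo UINT_MAX + 1 instead of overflowing. */
    return ((unsigned)x * y) & 0xFFFFu;
}
```

For example, 0xFFFF times 0xFFFF is 0xFFFE0001, so the function returns 1 for those inputs without relying on signed overflow.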

supercat