
I have code that calculates an array index, and if it is valid accesses that array item. Something like:

int b = rowCount() - 1;
if (b == -1) return;
const BlockInfo& bi = blockInfo[b];

I am worried that this might be triggering undefined behavior. For example, the compiler might assume that b is always non-negative, since I use it to index the array, so it will optimize the if clause away.

Under which circumstances is it safe to "access" an array out-of-bounds, when you do nothing with the invalid result? Does it change if blockInfo is not an actual array, but a container like a vector? If this is unsafe, could I fix it by putting the access in an else clause?

if (b == -1) {
    return;
} else {
    const BlockInfo& bi = blockInfo[b];
}

Lastly, are there compiler flags in the spirit of -fno-strict-aliasing or -fno-delete-null-pointer-checks that make the compiler "do the obvious thing" and prevent any unwanted behavior?

For clarification: My concern is specifically because of a different issue, where you intend to test whether a pointer is non-null before accessing it. The compiler turns this around and reasons that, since you are dereferencing it, it cannot have been null! Something like this (untested):

void someFunc(struct MyStruct *s) {
    if (s != NULL) {
        cout << s->someField << endl;
        delete s;
    }
}

I recall hearing that simply forming an out-of-bounds array access is UB in C++. Thus the compiler could legally assume the array index is not out of bounds, and remove checks to the contrary.

jdm
  • I'm not clear on what your concern is. The compiler is not going to add UB to a program that doesn't have any in the first place, which seems to be one of your concerns. – cigien Dec 29 '20 at 22:27
  • What does rowCount return? You are asking if compiler optimization can add bug into your code... – Tony Tannous Dec 29 '20 at 22:29
  • Both segments of the shown code are logically equivalent 100% (excluding the reference going out of scope, but it's clear that this is for demonstration purposes only). Whatever the compiler logically does with one, I would expect it to do with the other, too. – Sam Varshavchik Dec 29 '20 at 22:32
  • Does this answer your question? [Can compiler optimization introduce bugs?](https://stackoverflow.com/questions/2722302/can-compiler-optimization-introduce-bugs) – Tony Tannous Dec 29 '20 at 22:33
  • Well, it is clearly wrong to call `someFunction(blockInfo[-1])`. What is not clear to me is, is it OK to have `bi = blockInfo[b]` in a scope, and jump out just before, when I realize `b = -1`. I don't know for sure, but I could imagine this would be illegal or UB. I fear a case where the compiler makes some assumptions, and then moves the assignment before the `if` clause or deletes the clause altogether. I recall there was a case in the Linux kernel where null checks were optimized out because the compiler could assume a certain pointer can never be null. – jdm Dec 29 '20 at 22:34
  • @TonyTannous: `rowCount` returns an `int`, and can be zero or positive (what else would make sense?). In a strict sense, I am not asking if a "compiler optimization can introduce a bug" (although extremely colloquially speaking, yes, that is what I want to know). Strictly speaking, I want to know 1) when it is and is not `undefined behavior` to use an invalid index into an array, where "use" means that the statement is not necessarily reached, and 2) when the compiler can use that to generate code that is against my intention. – jdm Dec 29 '20 at 22:39
  • @jdm `if (0) { blockInfo[-10] }` contains an out-of-bounds array access, but this code is never executed, whether because the compiler optimizes it away or because it is never triggered. If the compiler performs an optimization that ends up letting you shoot yourself in the foot, that is a compiler bug. – Tony Tannous Dec 29 '20 at 22:52
  • @TonyTannous Yeah, but UB that a human would "obviously" optimize away is still UB, isn't it? See this for example: https://kristerw.blogspot.com/2017/09/why-undefined-behavior-may-call-never.html – jdm Dec 29 '20 at 22:57
  • @jdm: Yes, but only if your program _does_ have UB. Yours doesn't. It literally says, "if the index `b` is -1, return from the function before accessing `blockInfo[b]`". There is no world in which there is cause to worry that the behaviour of that program is undefined due to an access to `blockInfo[b]`, because your program literally says the opposite thing. – Asteroids With Wings Dec 29 '20 at 23:38
  • Does this answer your question? [Does an expression with undefined behaviour that is never actually executed make a program erroneous?](https://stackoverflow.com/questions/24186681/does-an-expression-with-undefined-behaviour-that-is-never-actually-executed-make) – Language Lawyer Dec 30 '20 at 00:54

2 Answers


There is no access to blockInfo[-1] in your program. Your code specifically prohibits that.
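To make that concrete, here is a minimal sketch of the pattern from the question. The `BlockInfo` struct, the `lastBlockId` function, and the use of `blockInfo.size()` in place of `rowCount()` are assumptions made for illustration only; the question does not show the real definitions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the BlockInfo type in the question.
struct BlockInfo {
    int id;
};

// Returns the id of the last block, or -1 when there are no blocks.
// The early return means blockInfo[b] is only evaluated when b >= 0.
int lastBlockId(const std::vector<BlockInfo>& blockInfo) {
    int b = static_cast<int>(blockInfo.size()) - 1;  // plays the role of rowCount() - 1
    if (b == -1) return -1;  // guard: nothing below runs for an empty container
    assert(b >= 0);          // holds on every path that reaches this line
    const BlockInfo& bi = blockInfo[b];
    return bi.id;
}
```

Calling `lastBlockId` with an empty vector takes the early return and never evaluates the subscript expression, so there is no out-of-bounds access to reason about.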


For example, the compiler might assume that b is always non-negative, since I use it to index the array, so it will optimize the if clause away.

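No, it cannot do that. Your program never evaluates blockInfo[b] when b is -1, because the return comes first. A negative index is also not inherently invalid. If blockInfo is a container such as std::vector, the int is converted to the container's unsigned size type (typically std::size_t) with the well-defined unsigned wrap-around logic that comes with doing so. If blockInfo is a pointer, blockInfo[-1] is ordinary pointer arithmetic and is valid whenever an element of the same array sits just before the one pointed at. So there is not, and cannot be, any rule whereby the compiler is permitted to assume that b will never hold int -1.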
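As a small illustration of a negative index that is valid: with a raw pointer, subscripting is pointer arithmetic, so `p[-1]` is well-defined when `p` points past the first element of an array. The helper name `previousElement` is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: returns the element just before the one p points at.
// Precondition: p points at an element of an array other than the first one.
int previousElement(const int* p) {
    assert(p != nullptr);
    return p[-1];  // equivalent to *(p - 1); valid under the precondition above
}
```

Given `int arr[3] = {10, 20, 30};`, the call `previousElement(arr + 2)` reads `arr[1]`. The same expression applied to `arr` itself would be out of bounds, which is why the compiler cannot decide validity from the sign of the index alone.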

Even if there were, it'd still make no sense to let the compiler completely ignore the if statement. If it could, if our if statements were not reliable, every program in the world would be unsafe! There'd be no way to enforce any of your operations' preconditions.


The compiler may only skip or re-order things when it can prove that doing so results in a well-defined program with the same behaviour as your original instructions, given any possible input.

In fact, this is where UB comes from: where proving correctness is really difficult, that's usually where the standard throws compilers a bone and says something is "undefined" and the compiler can just do whatever it likes.

One interesting example of this is kind of the opposite of your case, where a check is [erroneously] placed after the access, and the compiler therefore assumes the check passes, whether it actually did or not:

void foo(char* ptr)
{
   char x = *ptr;
   if (ptr)
      bar();
   else
      baz();
}

The function foo may call bar() even if ptr is null! That might sound unlikely to you, but it actually does happen (e.g. this crash in a widely-used library).
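For contrast, a minimal sketch of the same function with the check placed before the dereference. The name `fooChecked` and the `Branch` enum are hypothetical, introduced only so the taken branch can be observed; they stand in for the calls to bar() and baz().

```cpp
// Hypothetical result type: which of the two branches was taken.
enum class Branch { Bar, Baz };

// The null test comes first, so the dereference below is only reached
// when ptr is non-null and the compiler has nothing to "assume".
Branch fooChecked(const char* ptr) {
    if (ptr == nullptr) return Branch::Baz;
    char x = *ptr;  // safe: ptr was checked on this path
    (void)x;        // value unused in this sketch
    return Branch::Bar;
}
```

With this ordering, a null pointer takes the baz-style branch and is never dereferenced.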


could I fix it by putting the access in an else clause?

Those two pieces of code are semantically equivalent; it's the same program.


Lastly, are there compiler flags in the spirit of -fno-strict-aliasing or -fno-delete-null-pointer-checks that make the compiler "do the obvious thing" and prevent any unwanted behavior?

The compiler already does the obvious thing, as long as "obvious" is "according to the C++ standard".

Asteroids With Wings

the compiler might assume

If the compiler proceeds from a wrong assumption, then it's wrong and defective.

Under which circumstances is it safe to "access" an array out-of-bounds, when you do nothing with the invalid result?

It is never safe to access an array out of bounds, because that produces UB before you have a chance to use or not-use the result. However, an untaken branch in the code doesn't count as an access, as in your first or second examples. So, if I understand your last question, there's no need for a special flag.
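For the container case raised in the question, one option with defined behaviour is `std::vector::at`, which checks the index and throws `std::out_of_range` instead of invoking UB. The sketch below is only an illustration of that behaviour; the helper name `isOutOfRange` is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reports whether index i is out of range for v,
// relying on at() to perform the bounds check.
bool isOutOfRange(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);  // throws std::out_of_range for i >= v.size()
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

An int -1 passed as the index is converted to a very large std::size_t, so `at` rejects it like any other out-of-range value. This costs a run-time check per access, so it is a design choice rather than a drop-in replacement for `operator[]`.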

Potatoswatter
  • So one can say forming an array access with an invalid index is not undefined behavior, even if you enter its scope, as long as you branch away before the statement is reached? – jdm Dec 29 '20 at 23:02
  • You could sort-of say that, but only declarations have scope and not statements, and the scope of a declaration doesn't include the preceding line. – Potatoswatter Dec 29 '20 at 23:22
  • @jdm Your program never makes such an array access, _because_ of the `if` statement. That's the point of having `if` statements. If it worked the way you were concerned it would, practically every single array access with a variable index in the world would potentially have undefined behaviour. – Asteroids With Wings Dec 29 '20 at 23:34