3

Aren't modern compilers smart enough to generate code that is both fast and safe?

Look at the code below:

std::vector<int> a(100);
for (int i = 0; i < 50; i++)
    { a.at(i) = i; }
...

It's obvious that an out-of-range error can never happen here, so a smart compiler could generate the following code:

std::vector<int> a(100);
for (int i = 0; i < 50; i++)
    { a[i] = i; } // operator[] doesn't check for out of range
...

Now let's check this code:

std::vector<int> a(unknown_function());
for (int i = 0; i < 50; i++)
    { a.at(i) = i; }
...

It can be transformed into the following equivalent:

std::vector<int> a(unknown_function());
size_t __loop_limit = std::min(a.size(), size_t(50));
for (int i = 0; i < __loop_limit; i++)
    { a[i] = i; }
if (50 > a.size())
    { throw std::out_of_range("oor"); }
...

Also, we know that the int type has no side effects in its destructor or assignment operator, so we can translate the code to the following equivalent:

size_t __tmp = unknown_function();
if (50 > __tmp)
    { throw std::out_of_range("oor"); }
std::vector<int> a(__tmp);
for (int i = 0; i < 50; i++)
    { a[i] = i; }
...

(I'm not sure such an optimization is allowed by the C++ standard, because in the throwing case it skips the memory allocation/deallocation steps, but let's imagine a C++-like language that allows this optimization.)

And, OK, this optimized version is still not as fast as the following code:

std::vector<int> a(unknown_function());
for (int i = 0; i < 50; i++)
    { a[i] = i; }

because there is an additional check, if (50 > __tmp), which you don't need if you are certain that unknown_function never returns a value less than 50. But the performance gain from removing that check is not very high in this case.

Please note that my question is a little different from this question: Is undefined behavior worth it? That question asks whether the performance advantages of undefined behavior outweigh its shortcomings; it assumes that undefined behavior really does help to optimize code. My question is: is it possible to achieve almost the same (maybe slightly lower) level of optimization in a language without undefined behavior as in a language with undefined behavior?

The only case I can think of where undefined behavior can really help improve performance significantly is manual memory management. You never know whether the address a pointer points to has already been freed: someone may have a copy of the pointer and call free on it, while your pointer still points to the same address. To avoid this undefined behavior you either have to use a garbage collector (which has its own disadvantages), or you have to maintain a list of all pointers that point to the address and, when the address is freed, nullify all of those pointers (and check them for null before accessing them).
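
As a minimal sketch of the situation described above (the code is purely illustrative):

#include <cstdlib>
int main() {
    int *p = (int *)std::malloc(sizeof(int));
    int *q = p;       // someone keeps a copy of the pointer
    std::free(q);     // ...and frees the memory through that copy
    // *p = 42;       // p still holds the old address; dereferencing it here would be a use-after-free, i.e. undefined behavior
    p = nullptr;      // the "defined" alternative: track every copy,
    q = nullptr;      // nullify all of them on free, and check for null before use
    return 0;
}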

Providing defined behavior in a multi-threaded environment would probably incur performance costs too.
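
As a rough sketch of where such costs come from, assuming a C++-like setting: an unsynchronized concurrent increment is a data race (UB in C++), while giving the access defined behavior means paying for synchronization, e.g. by making the counter atomic:

#include <atomic>
#include <thread>
std::atomic<int> counter{0};   // defined behavior under concurrency, but every ++ is an atomic read-modify-write
int main() {
    std::thread t1([] { for (int i = 0; i < 100000; i++) { counter++; } });   // with a plain int counter this would be a data race (UB in C++)
    std::thread t2([] { for (int i = 0; i < 100000; i++) { counter++; } });
    t1.join();
    t2.join();
    return counter == 200000 ? 0 : 1;
}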

PS: I am not sure that fully defined behavior can be achieved in a C-like language, but I added it to the tags too.

Lundin
anton_rh
  • Your out-of-range examples are not equivalent, so no compiler would do as you describe – Support Ukraine Feb 27 '18 at 06:13
  • You are overcomplicating. If `i` in `x[i]` ever goes out of bounds, this is UB, so the compiler can assume it never happens **and omit the range check**, which languages without UB (e.g. Java) cannot skip (unless the compiler can **prove** it can be skipped). The same goes for dereferencing a null pointer, arithmetic overflow, division by zero, and so on. – n. m. could be an AI Feb 27 '18 at 06:21
  • @4386427 Your comment is not very informative, so not many people will understand what you think is wrong with my example. Please write more details. The only difference I can think of is that my last example may skip the memory allocation/deallocation steps. But the result will be the same (OK, `bad_alloc` may happen, but it might also not happen). And what I'm talking about is not C++. I agree, in C++ this optimization is forbidden. But I'm talking about a C++-like language without UB. It would have a different standard that would allow this optimization. – anton_rh Feb 27 '18 at 06:26
  • Is this question about the specific example, or just in general? This seems a bit broad, and not opinion-based but certainly a question that invites a lot of speculation. – super Feb 27 '18 at 06:41
  • What is "**almost** the same performance"? UB is there because it enables something; it is always a trade-off. As such, this really is a duplicate. – Passer By Feb 27 '18 at 06:41
  • My question is probably too broad and the linked question is probably opinion-based, but they are definitely not duplicates. – anton_rh Feb 27 '18 at 06:45
  • Off-topic, but you shouldn't use double underscores for variables, as [they are reserved for the compiler's internal use](https://stackoverflow.com/a/224420/6717178) – JHBonarius Feb 27 '18 at 18:28
  • @JHBonarius, that is how the compiler would do the optimization; it is internal use, which is why I used the double underscore. The compiler can't use normal identifiers because they might conflict with user identifiers. The double underscores here are intentional. – anton_rh Feb 27 '18 at 19:31

4 Answers

3

My question is: is it possible to achieve almost the same (maybe slightly lower) level of optimization in a language without undefined behavior as in a language with undefined behavior?

Yes, by using a type-safe language. Languages such as C and C++ need the concept of undefined behavior precisely because they are not type-safe (which basically means that any pointer can point anywhere, at any time), and therefore, in way too many cases, the compiler cannot statically prove that no violation of the language specification can occur in any execution of the program, even when that is actually the case. That's because of the hard limitations of pointer analysis. Without undefined behavior, the compiler would have to insert too many dynamic checks, most of which are not really needed, but the compiler cannot figure that out.
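
A minimal sketch of that limitation (check_in_bounds is a hypothetical helper, not a real API): given only a raw pointer, the compiler cannot prove the access is in bounds, so a language without undefined behavior would have to either prove it or check it dynamically:

#include <cstddef>
// The compiler sees only a raw pointer and a length; it cannot prove what p points to or how large the pointed-to object is.
// In C/C++ an out-of-bounds write is simply UB, so no check is emitted; a language with fully defined behavior would have to guard each access unless it could prove it safe.
void fill(int *p, std::size_t n) {
    for (std::size_t i = 0; i < n; i++) {
        // check_in_bounds(p, i);   // hypothetical check a UB-free language would need here
        p[i] = (int)i;
    }
}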

Consider, for example, safe C# code where a function accepts a reference to an object of some type (an array). Because of the way the language and the underlying virtual machine are designed, it's guaranteed that the reference points to an object of the expected type; this is ensured statically. The code emitted for C# still requires dynamic bounds and type checks in certain cases, but compared to C/C++, the number of dynamic checks required to implement fully defined behavior is tiny and typically affordable. Many C# programs achieve the same performance as the corresponding C++ programs, or slightly less, although that depends heavily on how they are compiled.

The only case I can think of where undefined behavior can really help improve performance significantly is manual memory management.

That's not the only case as explained above.

Providing defined behavior for multi-threaded environment may probably cause performance costs too.

Not sure what you mean here. The memory models specified by languages define the behavior of multi-threaded programs. These models can range from very relaxed to very strict (see the C++ memory model, for example).

Hadi Brais
  • This answer neglects the fact that undefined behavior exists for many reasons other than a lack of type safety. It leaves room for useful variations in how the language works. And unlike "implementation-defined" behavior, calling something "undefined" leaves room for the compiler provider not to be responsible for documenting things like how possible changes to hardware or operating systems might affect the behavior of the generated code. – mtraceur Feb 08 '21 at 19:20
  • That said, I still upvoted - I think this answer does a good job of pointing out the relationship between language constraints, what the compiler can prove, and optimization opportunities. – mtraceur Feb 08 '21 at 19:21
1

For the first example, it is NOT obvious to the compiler that the access will never go out of range; the at() function is a black box and may well add 200 to i before trying to access the vector's storage. That would be silly, but sometimes programmers are silly. It looks obvious to you because you know the template doesn't do that. If at() is declared inline, then a later peephole optimization stage may do that sort of bounds-check skipping, but that's because the function is an opened box at that point, so the optimizer has access to the vector's bounds and can see that the loop only involves constants.
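
To illustrate, here is a simplified stand-in for at() (not the real library implementation): once its body is inlined, the optimizer can see both the bound and the constant loop limit and may prove the throw branch dead:

#include <cstddef>
#include <stdexcept>
#include <vector>
inline int &at_like(std::vector<int> &v, std::size_t i) {
    if (i >= v.size())                      // the bounds check that at() performs
        { throw std::out_of_range("oor"); }
    return v[i];
}
int main() {
    std::vector<int> a(100);
    for (std::size_t i = 0; i < 50; i++)
        { at_like(a, i) = (int)i; }         // after inlining, the check is provably dead and can be removed
}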

M. Ziegast
  • Link-time optimization would probably help. Or at least some of this functionality could be defined as part of the language (e.g. array pointers could carry their bounds, checked by the compiler rather than by a standard library function). It would make the language more complex, but it would produce safer and faster generated code. – anton_rh Feb 27 '18 at 07:09
  • Faster, I agree, but those types of functions were added because too much code was using out-of-range user inputs as index values, not constants, and accessing outside the object. Similar effects are already possible using NDEBUG and assert(), or by defining alternates to at() as part of the template, so I doubt there will be more effort along these lines to add language-specific features. – M. Ziegast Feb 27 '18 at 07:26
  • `i` is not passed by reference, so `at` cannot change it. In addition, it's legal for the compiler to assume that `at` does not change the limits of the vector used in the checks performed by `at`. – Hadi Brais Mar 02 '18 at 19:27
  • That is due to how that particular at() is implemented; the source i may be unchangeable, but the argument to at() can be modified however desired inside the routine. As to the second point, some versions of at() may try to expand the vector if an at(200) is attempted, so the compiler cannot generically assume the bounds are constant, as far as I can see. Again, it seems obvious a compiler could assume that, but the upper bound that can be assumed is more like INT_MAX, not the constant bounds used when instancing the vector, if the argument to at() is an int type. – M. Ziegast Mar 04 '18 at 01:51
  • This is only true when LTO is not used, which would really be an artificial restriction. – Hadi Brais Mar 04 '18 at 22:03
  • LTO applies to the .o files created, not to the .c or .cxx files, so it isn't part of compiling the sources. This is not an artificial restriction; it's part of the definition of the languages. Linking is considered a separate phase, with or without LTO, whether or not it is done by an executable separate from the compiler. If at() isn't inlined then the .o will generally leave it as an extern still to be resolved, depending on which method is used to instantiate the templates, so the compiler has to treat it as a black box for any compile-time optimizations. – M. Ziegast Mar 05 '18 at 17:09
1

In many cases, optimal code generation will require constructs via which programmers can invite compilers to assume certain things, with unpredictable consequences if those assumptions turn out not to be true. Further, in some cases the most efficient way to perform a task is not verifiable. If all arrays are marked with their length, however, there would be no need for a language to treat out-of-bounds array accesses as invoking UB [rather than trapping] if the language had a construct, e.g.

UNCHECKED_ASSUME(x < Arr.length);
Arr[x] += 23;

then it could check array bounds by default without losing the optimizations that would be available with unchecked accesses. To allow for the many cases where it would be necessary to ensure that a program gets shut down before doing anything "bad", but where the exact timing of the shutdown wouldn't matter, a language could also include a CHECKED_ASSUME construct, such that given e.g.

CHECKED_ASSUME(x < Arr.length);
Arr[x] += 23;

a compiler would be allowed to fatally trap at any point where it could determine that the code would be invoked with x >= Arr.length, or would hit some other fatal trap first. If the above code were to appear within a loop, using a CHECKED_ASSUME rather than an ASSERT would invite the compiler to move the check out of the loop.
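
Existing compilers already offer rough analogues of the UNCHECKED_ASSUME half of this idea. Here is a sketch assuming a GCC/Clang-style toolchain (the macro name itself is invented here, and CHECKED_ASSUME has no direct equivalent):

#include <cstddef>
#include <vector>
#if defined(__clang__)
  #define UNCHECKED_ASSUME(cond) __builtin_assume(cond)
#else
  #define UNCHECKED_ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#endif
void bump(std::vector<int> &arr, std::size_t x) {
    UNCHECKED_ASSUME(x < arr.size());   // programmer-supplied fact; if it is false, behavior is undefined
    arr.at(x) += 23;                    // the optimizer may now be able to drop at()'s bounds check
}

C++23 also standardizes an [[assume(expr)]] attribute with similar semantics.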

While maintainers of C compilers insist that unconstrained UB is necessary for optimization, that would not be true in a well-designed language outside of some narrow circumstances.

supercat
  • I think this answer suggests an interesting place for `assert`. A smart compiler is totally justified in treating an `assert` as an "assume" - an `assert` is essentially either a checked assume or an unchecked assume depending on how the code is compiled - either way any code that follows can assume the truth of the assertion. I think a C compiler that plays it very safe by default but trusts all `assert`s for optimization could produce competitively efficient code if there were enough asserts everywhere? – mtraceur Feb 08 '21 at 20:22
  • @mtraceur: I see very few use cases for a construct which may sometimes behave as "trap if expectation not met" and sometimes as "draw inferences based on a blind assumption that expectation will be met". The message conveyed by "checked assume" is that if a condition isn't met, continued execution will be useless but harmless. An "unchecked assume" would only be appropriate for a condition that a programmer could be certain would be met for all possible inputs a program could receive, without having to test them. By contrast, treating "checked assume" as "trap on failure" would mean... – supercat Feb 08 '21 at 20:38
  • ...that useless program executions would be terminated sooner rather than later, at the expense of potentially making useful ones run slower. – supercat Feb 08 '21 at 20:39
0

Your example is just one example. In it, the performance gain from using operator[] instead of at may be small, but there are many other cases where the performance gain enabled by undefined behavior can be huge.

For example, consider the following code:

std::vector<int> a(100);
std::vector<int>::size_type index;
for (int i = 0; i != 100; ++i) {
    std::cin >> index;
    a.at(index) = i;
}

For this code, the compiler has to perform the bounds check in every iteration, which may be a considerable cost.
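
For comparison, the programmer-side replacement described here might look like the following sketch; the per-iteration check disappears, but an out-of-range input then becomes undefined behavior:

#include <iostream>
#include <vector>
int main() {
    std::vector<int> a(100);
    std::vector<int>::size_type index;
    for (int i = 0; i != 100; ++i) {
        std::cin >> index;
        a[index] = i;   // no per-iteration bounds check; an out-of-range input now causes undefined behavior
    }
}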

xskxzr
  • There is no undefined behavior in the example you provided. The bounds check performed by the `at` function prevents undefined behavior from occurring, so the compiler cannot optimize these checks at all. What point are you trying to make? Also which example are you referring to in "In your example, the performance gain of using operator[] instead of at may be small"? – Hadi Brais Mar 02 '18 at 19:19
  • @HadiBrais *There is no undefined behavior in the example you provided.* I didn't say there is. The example is used to show a case where if `at` is replaced with `operator[]`, the performance gain (brought by undefined behavior potentially caused by `operator[]` after replacing) is considerable. In addition, I said "the compiler **has to** check..." *Also which example are you referring to...* I mean any example using `at` listed in OP. – xskxzr Mar 03 '18 at 02:48
  • Replacing `at` with `operator[]` is illegal because it introduces undefined behavior, and the compiler cannot really statically check the bounds. Even in general, it does not make sense to have a language that allows the compiler to turn code with defined behavior into code with undefined behavior; it would be impossible to write correct code in such a language. So irrespective of whether the language supports undefined behavior or not, `at` cannot be optimized to `operator[]` in your example, so there cannot be a performance gain either way. The dynamic checks in each iteration are required. – Hadi Brais Mar 03 '18 at 16:19
  • @HadiBrais I meant the replacement is not done by the compiler, but by the programmer... In the example in OP, if we replace `at` with `operator[]`, the performance gain is little. As a contrast, in my example, if we replace `at` with `operator[]`, the performance gain would be considerable. So the example in OP is not sufficient to conclude "undefined behavior benefits little". – xskxzr Mar 03 '18 at 17:04
  • OK, that makes sense now. – Hadi Brais Mar 03 '18 at 17:21