4

What are the reasons for C++ not defining some behaviour (i.e. not doing better error checking)? Why not throw an error and stop?

Some pseudocode, for example:

if (p == NULL && op == deref) {
    return "Invalid operation"
}

For Integer Overflows:

if (size > capacity) {
    return "Overflow"
}

I know these are very simple examples, but I'm pretty sure most UB could be caught by the compiler. So why not implement such checks? Is it because the checking is expensive at run time and skipping it is faster? Some UB can be caught with a single if statement, so maybe speed is not the only concern?

merovingian
  • 5
    Because every program would have to pay the cost all the time. Other languages have/are taking this approach and we will see what happens going forward. – Richard Critten Nov 18 '22 at 11:33
  • 3
    C++ assumes that the programmer is a perfect human who never writes code that dereferences a null pointer. This assumption has been violated time and time again, which is why languages like Rust exist :) – Thomas Nov 18 '22 at 11:34
  • 1
    Even the programs which do not have UB? – merovingian Nov 18 '22 at 11:34
  • 2
  • @merovingian yes, for example, the compiler would have to insert a not-null check before every pointer dereference. And checking for data races would be interesting... – Richard Critten Nov 18 '22 at 11:35
  • 3
    “But I'm pretty sure most UBs can be caught by the compiler.” — No they can’t, that’s the precise reason why they *aren’t* diagnosed by the compiler. Otherwise you’d be right. – Konrad Rudolph Nov 18 '22 at 11:37
  • How? Can't I implement a compiler that can catch a null dereference? Why are you saying that they can't? @KonradRudolph – merovingian Nov 18 '22 at 11:39
  • @merovingian How is the compiler supposed to know whether a pointer contains a valid pointer or not *at runtime*? – Konrad Rudolph Nov 18 '22 at 11:40
  • @merovingian well, simply because you cannot know what value the pointer will have at runtime unless you simulate all possible program states with all possible inputs. – Quentin Nov 18 '22 at 11:40
  • @merovingian `int foo(int* p) { return *p; }` and in a different translation unit `foo(nullptr)`. The cases where the compiler could do it without runtime overhead are the boring ones, the ones that can be caught by not writing obviously wrong code. – 463035818_is_not_an_ai Nov 18 '22 at 11:41
  • @merovingian e.g. set a global pointer to point somewhere valid. Now call a function in another compilation unit. Now use the global pointer. Where is it pointing? This can't be checked without a runtime check. – Richard Critten Nov 18 '22 at 11:41
  • 4
    You'd need to solve the halting problem to catch all UB. – StoryTeller - Unslander Monica Nov 18 '22 at 11:41
  • @merovingian Well, the good news is that most modern compilers can at least give you warnings about such stuff. The rest might be covered by static code analysis (SCA) tools. – πάντα ῥεῖ Nov 18 '22 at 11:42
  • 2
    Does this answer your question? [Why does 'undefined behaviour' exist?](https://stackoverflow.com/questions/51557895/why-does-undefined-behaviour-exist) – moooeeeep Nov 18 '22 at 11:42
  • opinion based? certainly not. – 463035818_is_not_an_ai Nov 18 '22 at 11:45
  • 2
    Related: https://stackoverflow.com/q/7237963/72178 – ks1322 Nov 18 '22 at 11:45
  • I know there is some UB in Rust too, but by comparison Rust has far less UB than C++. I thought you could also implement this kind of thing in C++ – merovingian Nov 18 '22 at 11:49
  • Rust didn't do it as an afterthought. The language was designed around it, including restrictions that are required to support safe mode. You can't just retrofit it on a language where those restrictions didn't exist. – StoryTeller - Unslander Monica Nov 18 '22 at 11:52
  • Rust works *very* differently to C++. And in many ways it’s unambiguously *better designed* than C++ and is thus able to avoid UB that C++ fundamentally cannot avoid. But of your two examples, one (= integer overflow) is *also* UB in Rust. – Konrad Rudolph Nov 18 '22 at 11:58
  • I agree. But your comment will make a lot of people angry :d @KonradRudolph – merovingian Nov 18 '22 at 11:59
  • afaik every language has constructs that are not defined, only C++ is very specific about listing some of them explicitly. I mean generally anything that isn't defined in the standard is undefined, so even if the standard did not mention undefined behavior there would be plenty – 463035818_is_not_an_ai Nov 18 '22 at 12:00
  • @StoryTeller-UnslanderMonica: Or else specify the set of defined actions in such a way as to facilitate static analysis, as is done with CompCert C. – supercat Nov 21 '22 at 19:41

2 Answers

9

Because the compiler would have to add these instructions every time you use a pointer. A C++ program uses a lot of pointers. So there would be a lot of these instructions for the computer to run. The C++ philosophy is that you should not pay for features you don't need. If you want a null pointer check, you can write a null pointer check. It's quite easy to make a custom pointer class that checks if it is null.
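For illustration, here's a minimal sketch of what such an opt-in checked pointer could look like (the class name and interface are made up for this example):

#include <stdexcept>

// Hypothetical wrapper: you only pay for the null check where you choose to use it.
template <typename T>
class checked_ptr {
    T* p_;
public:
    explicit checked_ptr(T* p = nullptr) : p_(p) {}

    T& operator*() const {
        if (p_ == nullptr) throw std::runtime_error("null pointer dereference");
        return *p_;
    }

    T* operator->() const {
        if (p_ == nullptr) throw std::runtime_error("null pointer dereference");
        return p_;
    }
};

// checked_ptr<int> p;   // *p throws instead of being undefined behaviour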

On most operating systems, dereferencing a null pointer is guaranteed to crash the program anyway, and on most hardware overflowing an integer simply wraps around. But C++ doesn't only run on those systems - it also runs on systems where that isn't the case. So the behaviour is not defined by C++, but it may be defined by the operating system or the hardware.


Nonetheless, some compiler writers realized that by un-defining it again, they can make programs faster. A surprising number of optimizations are possible because of this. Sometimes there's a command-line option which tells the compiler not to un-define the behaviour, like `-fwrapv` (which makes signed integer overflow wrap around).
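A classic example of the kind of optimization this enables (my sketch, not part of the original answer):

// Because signed integer overflow is UB, the compiler may assume x + 1 never
// overflows and fold this whole function to "return true". With -fwrapv,
// INT_MAX + 1 wraps to INT_MIN, so the comparison must actually be evaluated.
bool always_greater(int x) {
    return x + 1 > x;
}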

A common optimization is to simply assume the program never gets to the part with the UB. Just yesterday I saw a question where someone had written a loop that was obviously not infinite:

int array[N];
// ...
for(int i = 0; i < N+1; i++) {
    fprintf(file, "%d ", array[i]);
}

but it was infinite. Why? The person asking the question had turned on optimization. The compiler could see that the last iteration has UB, since `array[N]` is out of bounds. So it assumed the program would magically stop before that iteration, which means `i < N+1` can never be false, and it deleted the check entirely.
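In effect, the optimizer treats it as if it had been written like this (a sketch of the reasoning, not actual compiler output):

// array[i] with i == N would be out of bounds, which is UB, so the optimizer
// assumes i never reaches N. Then i < N+1 can never be false and the
// comparison is dropped - leaving an infinite loop:
for (int i = 0; ; i++) {
    fprintf(file, "%d ", array[i]);
}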

Most of the time this makes sense because of macros or inline functions. Someone writes code like:

int getFoo(struct X *px) {return (px == NULL ? -1 : px->foo);}

void blah(struct X *px) {
    bar(px->f1);
    printf("%s", px->name);
    frobnicate(&px->theFrob);
    count += getFoo(px);
}

and the compiler can make the quite reasonable assumption that `px` isn't null, so it deletes the `px == NULL` check and treats `getFoo(px)` the same as `px->foo` (see the sketch below). That's often why compiler writers choose to keep it undefined even in cases where it could be easily defined.
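To sketch the effect: after inlining `getFoo`, the last line of `blah` effectively becomes:

// px was already dereferenced by px->f1 above, so the compiler assumes it is
// non-null and removes the (px == NULL ? -1 : ...) branch:
count += px->foo;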

user253751
  • So okay. As I understand it, big compilers like gcc do the error checking that is necessary for all programs? So there is no additional error checking that would cause some overhead? – merovingian Nov 18 '22 at 11:46
  • @merovingian No? What? – user253751 Nov 18 '22 at 11:58
  • You're saying that checking these kinds of things is costly. So I'm saying: there is no additional error checking, hence when we make mistakes we get UB? – merovingian Nov 18 '22 at 12:00
  • 1
    @merovingian with optimization turned off - yes. Note that it is only UB in C++. Other parts of the system might define it, but C++ doesn't define it. With optimization turned on - the optimizer is allowed to make weird assumptions. A common optimization thing is the optimizer assumes the program never gets to the part where the UB happens. There was a question just the other day where someone wrote a loop but the 100th iteration of the loop had UB (out of bounds array access) and then the optimizer assumed the loop would magically stop before that so it didn't need to check when it would stop – user253751 Nov 18 '22 at 12:03
  • Many tasks could be performed more efficiently by a compiler that is allowed to deviate *slightly* from the behavior that would result from treating a program as a sequence of steps performed in order using the underlying platform's semantics, than by one which followed such semantics slavishly, but only if the alternative behavior would satisfy application requirements just as well as the original. If a program wouldn't care what numeric value a computation yields when given invalid input, an implementation that replaces `x*30/15` with `x*2` may improve performance while still satisfying... – supercat Nov 21 '22 at 19:36
  • ...requirements. An implementation that would "optimize" `if (x < 100000000) arr[x] = 123;` with `arr[x] = 123;` on the basis that it's allowed to do anything it likes if `x*30` exceeds `INT_MAX`, however, would not meet requirements. Unfortunately, some compiler vendors refuse to recognize any distinction between those two kinds of optimizations. – supercat Nov 21 '22 at 19:38
5

Compilers already have switches to enable those checks (those are called sanitizers, e.g. `-fsanitize=address` and `-fsanitize=undefined` in GCC and Clang).

It doesn't make sense for the standard to always require those checks, because they harm performance, so you might not want them in release builds.
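As a rough sketch of how they're used (the file name is just an example):

// example.cpp - deliberately dereferences a null pointer
int main() {
    int* p = nullptr;
    return *p;   // with the sanitizers enabled this is reported at run time
                 // instead of being silent undefined behaviour
}

// Build and run with GCC or Clang:
//   g++ -g -fsanitize=address,undefined example.cpp -o example && ./example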

HolyBlackCat