149

GCC 6 has a new optimizer feature: it assumes that `this` is always non-null and optimizes based on that assumption.

> Value range propagation now assumes that the this pointer of C++ member functions is non-null. This eliminates common null pointer checks but also breaks some non-conforming code-bases (such as Qt-5, Chromium, KDevelop). As a temporary work-around -fno-delete-null-pointer-checks can be used. Wrong code can be identified by using -fsanitize=undefined.
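
For concreteness, the kind of check that value range propagation now removes looks something like this (a hypothetical sketch; Widget and reset are illustrative names, not taken from any of the code bases mentioned):

struct Widget {
    int value = 0;
    void reset() {
        if (this == nullptr) return;  // relies on UB: GCC 6 assumes this is
                                      // non-null and deletes this check
        value = 0;
    }
};

int main() {
    Widget* w = nullptr;
    w->reset();  // already UB at the call site; appeared to work on older
                 // compilers, crashes once the check above is removed
}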

The change document clearly calls this out as dangerous because it breaks a surprising amount of frequently used code.

Why would this new assumption break practical C++ code? **Are there particular patterns where careless or uninformed programmers rely on this particular undefined behavior?** I cannot imagine anyone writing `if (this == NULL)` because that is so unnatural.

boot4life
  • 2
    http://stackoverflow.com/a/1844012/1870760 is a good read. If your compiler allows you to then you can make assumptions off of that. – Hatted Rooster Apr 27 '16 at 14:47
  • Yes, some developers have code which assumes this can be a null pointer. For example, a library can consistently set pointers to nullptr after deleting the objects, and then call some member functions through those pointers. – SergeyA Apr 27 '16 at 14:51
  • Possible duplicate of [Checking if this is null](http://stackoverflow.com/questions/1844005/checking-if-this-is-null) – RiaD Apr 27 '16 at 14:52
  • 1
    This is not the only example of GCC doing such things either. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475 – Ben Apr 27 '16 at 14:54
  • 1
    [This article](http://www.viva64.com/en/b/0226/) provides a good explanation why such code bases are broken. Qt 5 codebase is unfortunately a bit broken in this respect, but hopefully this will be fixed in due time. – Kuba hasn't forgotten Monica Apr 27 '16 at 14:54
  • 13
    Remember that this doesn't only affect code that has `if(this == 0) { ... }` but also code that passes `this` to code that does `if(ptr == 0)`, by inlining. I did hit this in Qt when I called `obj->deleteLater()` on a null pointer, which calls [QCoreApplication::postEvent](https://code.woboq.org/qt5/qtbase/src/corelib/kernel/qcoreapplication.cpp.html#1364), passing it the null `this`. If this function would be inlined (unlikely for `postEvent`), a warning may turn into a crash. Careless programmers may think `ptr->deleteLater()` behaves like `delete ptr;` in this regard and fall into the trap. – Johannes Schaub - litb Apr 27 '16 at 15:00
  • 22
    @Ben Hopefully you mean it in a good way. Code with UB should be rewritten not to invoke UB. It's as simple as that. Heck, there are often FAQs that tell you how to achieve it. So, not a real issue IMHO. All good. – Kuba hasn't forgotten Monica Apr 27 '16 at 15:02
  • 3
    @KubaOber, No: J'accuse. The existing codebase is too large for "fix your code" to be the way forward. The only way forward is to define the undefined behaviour to reify the assumptions made in existing code. – Ben Apr 27 '16 at 15:04
  • 4
    @KubaOber, in other words, the answer must be to remove UB from the standard, and to consider the standard broken until that is done. Clearly a broken standard should be fixed, not implemented as-is! .... Using the broken parts to justify sabotaging existing code is not something anyone should be allowed to get away with. It's so obviously wrong, it must be deliberate. – Ben Apr 27 '16 at 15:10
  • 4
    Please clarify your question. The title asks why GCC wants to break practical C++ code, and the bold question in the body asks why projects would do `this == null` (?). Those are separate questions. – Johannes Schaub - litb Apr 27 '16 at 15:10
  • 52
    I am amazed to see people defending dereferencing null pointers in the code. Simply amazing. – SergeyA Apr 27 '16 at 15:16
  • 5
    @sergeyA Nobody is doing that. We are saying the compiler shouldn't remove a null check just because the pointer is `this`. The null check is there to avoid dereferencing null. The excuse is "`this` cannot be null", yet of course ***in reality it can***. See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475 where GCC removes an integer overflow check - the excuse? Integer overflow is undefined, so you don't need to check for it. ***And yet it happens*** so you ***do*** need to check for it. – Ben Apr 27 '16 at 15:22
  • 19
    @Ben, exploting undefined behavior has been the very effective optimization tactic for a very long time. I love it, because I love optimizations which make my code run faster. – SergeyA Apr 27 '16 at 15:27
  • 17
    I agree with SergeyA. The whole brouhaha started because people seem to dwell on the fact that `this` is passed as an implicit parameter, so they then start using it just as if it was an explicit parameter. It's not. When you dereference a null this, you're invoking UB just as if you dereferenced any other null pointer. That's all there's to it. If you want to pass nullptrs around, **use an explicit parameter, DUH**. It won't be any slower, it won't be any clunkier, and the code that has such API is deep in the internals anyway, so has very limited scope. End of story I think. – Kuba hasn't forgotten Monica Apr 27 '16 at 19:10
  • 16
    @Ben There's a perfectly valid way of writing integer overflow checks ([**FAQ**](http://c-faq.com/misc/intovf.html)), it just so happens that the people whose code got "broken" by gcc improvements didn't know their C to begin with. There's an absurd amount of bad C/C++ out there. That doesn't mean that the compilers should bend backwards to support it. Such code should be eradicated, and the compilers should push us towards that goal... – Kuba hasn't forgotten Monica Apr 27 '16 at 19:14
  • 3
    @JohannesSchaub-litb Calling a method on a null pointer doesn't make any sense, since `foo->deleteLater()` when `foo` is null is already UB, before we get to any checks that might happen inside of `deleteLater`. A non-static method call cannot be a drop-in replacement for a delete expression! If you want to, you can implement a free-standing `void deleteLater(QObject * o) { if (o) o->deleteLater(); }`. That's valid and IMHO the only sensible solution to the problem you describe. – Kuba hasn't forgotten Monica Apr 27 '16 at 19:37
  • 6
    "Why would this new assumption break practical C++ code?" - Not an answer to your specific question, but [this blog post from an LLVM/Clang developer](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html) gives great insight into how optimizing compilers exploit Undefined Behaviour for (significant) performance gains, and how this can lead to nasty surprises for even experienced programmers. – marcelm Apr 27 '16 at 20:11
  • 43
    Kudos to GCC for breaking the cycle of bad code -> inefficient compiler to support bad code -> more bad code -> more inefficient compilation -> ... – M.M Apr 28 '16 at 00:11
  • This should also mean that `typeid(*this)` becomes effectively "nothrow" in GCC. So far it could throw bad_typeid, I guess – Johannes Schaub - litb Apr 28 '16 at 09:53
  • 9
    It's been noted that `this` is a pointer only because references didn't exist when `this` was introduced in C++. GCC already knows references can't be `null`; this is just GCC being consistent. – MSalters Apr 28 '16 at 10:31
  • 1
    @KubaOber Yes, I wish the best to Qt in sanitising their codebase. They've been doing a bit better lately in modernising with C++11 and 14 features - if only their tutorials weren't woefully out of date, e.g. indicating all the `SIGNALS/SLOTS` macro rubbish is still required, when it isn't. Stylistic choices I don't like - and UB that I won't put near my programs - are stopping me from trying it. I'll stick to `gtkmm`, thanks! No macros and far more enthusiastic adoption of C++11/14. – underscore_d Apr 28 '16 at 11:56
  • @KubaOber *"Calling a method on a null pointer doesn't make any sense, since `foo->deleteLater()` when `foo` is null is already UB"* As far as I understand [LWG 315](http://wg21.cmeerw.net/cwg/issue315), this at one time was *not* intended to be UB, but the wording seems to not have changed (it still being UB according to [class.mfct.non-static]). – dyp Apr 28 '16 at 13:36
  • 4
    @dyp It's not generally possible to make `foo->method()` not be UB if `foo` is null. `method()` might be virtual, or might be from a base class while `foo` is of a derived class, so `method()` will receive an adjusted `this` that has the value of a small negative `intptr_t`, and so on. It doesn't make any sense whatsoever to attempt to make it not UB in some special cases. Whatever the intentions were, they were sanely abandoned. – Kuba hasn't forgotten Monica Apr 28 '16 at 13:48
  • @KubaOber D'oh I've misread CWG 315 again (and called it an LWG defect)... It's about calling *static* member functions, not *non-static* member functions. Sorry for the confusion. – dyp Apr 28 '16 at 20:02
  • 2
    @Ben: If you get to a point where `this` is null, it's too late: you've _already_ done something wrong -- in the examples people have given, it's because you've already dereferenced a null pointer, and for whatever reason, the compiler decided to turn that into a method call with a null `this` rather than have some other effect. –  Apr 29 '16 at 16:07
  • 2
    @Hurkyl, C++ was standardised in 1998. A lot of this code was written before that, and had to target the compiler not the standard - **because there was no standard back then**. After 1998 it was years before ANY compiler implemented the standard. A lot of that code is still running today with minor updates, just ports to new architectures. – Ben Apr 29 '16 at 16:19
  • 2
    For example Qt dates from 1991. Chrome is from KHTML which was released in 1997. These idioms are baked into an enormous amount of software and are not going to go away. The failure of the standard to accept actually-existing, working C++ code as valid is why the standard is broken. – Ben Apr 29 '16 at 16:45
  • 4
    So tell me again how any programmer who invokes UB is an incompetent, unworthy, rotten person, and deserves to be run over by a train. Explain again - for my boss this time - how we need to spend months and $10Ks to rewrite 200kloc of working C++ or we deserve to be repeatedly punched in the face by the GCC team. BEAT ME AGAIN! I WANT TO BE PURE! ........ Or, we could... just not use a compiler written by sadists for masochists. – Ben Apr 29 '16 at 16:56
  • 9
    @Ben: That's right -- you can continue using legacy compilers (or even modern compilers with the right flags!) to compile your ancient nonconforming code. Thus, you offer no good reason why things should be ruined for those of us who are writing standard-conforming code. –  Apr 29 '16 at 17:48
  • @Ben: Do you know if there has ever been discussion about changing the standard to make dereferencing a null pointer to call a non-virtual non-static member function a well-defined thing? –  Apr 29 '16 at 19:30
  • 3
    @Hurkyl, I have no idea. Other languages do it though. And you are welcome to your dysfunctional relationship with gcc (he only hits you because he loves you, keep telling yourself that) but I choose to use compilers which don't keep changing the rules and blaming me. (You know new iterations of the standard *add* UB? This is why I don't write new code in C++) I still have to maintain old stuff though. Meanwhile people switch to CLang because "the mysterious crashing bugs just go away". CLang doesn't make you hunt for the new "don't randomly crash" switch every release. Is there 5 now for GCC? – Ben Apr 29 '16 at 19:50
  • @Ben: I keep reading about how adding new forms of UB is supposed to "improve performance", but the language used in the 1990s, which interpreted the pointer-type rules as being applicable only to named variables, and excluded arrays that were never accessed by name (but created solely for use as memory pools) made it possible to operate on data with two- and four-byte chunks without regard for the type of data in question, without the compiler having to assume that every such access might hit every named object whose address has been exposed to outside code. Saying that code which wants... – supercat May 03 '16 at 06:36
  • ...to use type-agnostic access must either use character types or memcpy, both of which are downright horrible from an aliasing perspective, doesn't seem like a performance win to me. – supercat May 03 '16 at 06:37
  • 4
    @Ben "For example Qt dates from 1991. Chrome is from KHTML which was released in 1997. These idioms are baked into an enormous amount of software and are not going to go away." - except that they _totally will_ if the coders fix their codebases to avoid ub. if you don't want to, then whatever, but don't paint this as some unavoidable fact of life when it's due purely to inertia or active refusal to follow the standard – underscore_d Jul 11 '16 at 12:35
  • @underscore_d You've just totally missed the point. The standard didn't have to be like that: that behaviour could have been defined, or made implementation defined. That was a decision, and it was a bad one. The standard is wrong and should be changed. – Ben Jul 11 '16 at 14:31
  • While I can't really say much here, since I was just a little kid back before C++ was standardised and didn't even know the language existed, I think that changes like this should initially be implemented as opt-in instead of opt-out, and in the case of optimisations that make assumptions like this, a standalone utility should be provided that can be used to analyse the code and emit warnings whenever it encounters a situation that might cause trouble if the optimisation was enabled. This lets people find out what needs to be changed _before_ it screws them over, and indicates the – Justin Time - Reinstate Monica Oct 19 '16 at 17:35
  • likelihood that it would cause problems for them if they didn't modify their code. Later versions would then change it from opt-in to opt-out, after giving people around 6-12 months to get used to it (and to provide ample time for old libraries to be rewritten). They could also provide a pragma that can be used to disable that particular optimisation for a block of code, so that libraries can indicate which part(s) of their code base is/are still liable to cause problems; ideally, this pragma would automatically generate a warning. – Justin Time - Reinstate Monica Oct 19 '16 at 17:39
  • 1
    Also, [this](http://ideone.com/yuP0Go) works on Clang, GCC, and MSVC. Just wanted to put that out there. – Justin Time - Reinstate Monica Oct 19 '16 at 17:45
  • 1
    @JustinTime: The proper approach should be to define directives that would indicate that a program does not rely upon certain constructs, and suggest that quality code should use that directive whenever possible. Javascript took that approach with its "strict" dialect: if a program starts with the string literal "use strict", it will be processed with tighter scoping rules that are safer and allow more optimizations than would otherwise be possible, but would be incompatible with some existing code. Rather than trying to argue about whether a piece of code should be supported, ... – supercat Jun 20 '18 at 16:46
  • 1
    ...programmers and compiler writers should have agreed that if code includes a directive saying "I need these semantics", it would be dumb to optimize on the assumption that code doesn't, while also agreeing quality code should include a directive indicating whether such semantics are required. If code includes no directives either way, compiler writers could use their judgment of the intended audience in choosing a default behavior. – supercat Jun 20 '18 at 16:50
  • Agreed, @supercat. Considering how many old codebases use techniques and practices that later became UB, it would probably be _very_ useful to have a way to programmatically tell the compiler "we're good, optimise away" or "we're not ready yet," from within the source code itself. ...It would also have the additional benefit of reminding anyone who opens the file that the code is outdated and in need of refactoring and/or sanitising, which is a plus. – Justin Time - Reinstate Monica Jul 16 '18 at 19:02
  • 1
    @JustinTime: Note that UB was *never* intended to encourage nasal demons. Looking at the rationale, the intention was that quality implementations for various targets and purposes would define behaviors appropriate for those targets and purposes. The authors of the Standard openly recognize that it would be possible for a "conforming" implementation to be of such poor quality as to be useless, and the fact that the Standard would allow a *conforming* compiler to do something in no way implies that such action would not render a compiler unsuitable for some purposes. – supercat Jul 16 '18 at 19:09
  • 1
    @JustinTime: It would be fun to give advocates of compiler lunacy some simple code and ask them to rewrite it in a fashion that's Strictly Conforming given the way 6.5p7 is written (avoiding constructs which non-garbage implementations obviously *should* treat as defined, but which the Standard doesn't actually allow). Under a hyper-pedantic reading, given `int i=0;`, even something like `i=1;` would be disallowed, since the lvalue `i` by itself doesn't modify `i`, and the assignment expression isn't an lvalue of a suitable type because it isn't an lvalue. Only a problem with obtuse compilers, of course... – supercat Jul 16 '18 at 19:16
  • 1
    ...but I'd say the same about a lot of constructs which get broken under "optimization". – supercat Jul 16 '18 at 19:16

5 Answers

90

I guess the question that needs to be answered is why well-intentioned people would write the checks in the first place.

The most common case is probably if you have a class that is part of a naturally occurring recursive call.

If you had:

struct Node
{
    Node* left;
    Node* right;
};

in C, you might write:

void traverse_in_order(Node* n) {
    if(!n) return;
    traverse_in_order(n->left);
    process(n);
    traverse_in_order(n->right);
}

In C++, it's nice to make this a member function:

void Node::traverse_in_order() {
    // <--- What check should be put here?
    left->traverse_in_order();
    process();
    right->traverse_in_order();
}

In the early days of C++ (prior to standardization), it was emphasized that member functions were syntactic sugar for a function where the this parameter is implicit. Code was written in C++, converted to equivalent C and compiled. There were even explicit examples showing that comparing this to null was meaningful, and the original Cfront compiler took advantage of this too. So coming from a C background, the obvious choice for the check is:

if(this == nullptr) return;      

Note: Bjarne Stroustrup even mentions that the rules for this have changed over the years here

And this worked on many compilers for many years. When standardization happened, this changed. And more recently, compilers started taking advantage of the fact that calling a member function on a null this is undefined behavior, which means that the compiler may treat this condition as always false and is free to omit it.
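
Under GCC 6, the function above is effectively compiled as if the check were gone (a sketch of the optimizer's view, not actual GCC output):

void Node::traverse_in_order() {
    // if(this == nullptr) return;  <- assumed always false and deleted by
    //                                 value range propagation
    left->traverse_in_order();      // dereferences null when called on a
                                    // null Node*
    process();
    right->traverse_in_order();
}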

That means that to do any traversal of this tree, you need to either:

  • Do all of the checks before calling traverse_in_order

    void Node::traverse_in_order() {
        if(left) left->traverse_in_order();
        process();
        if(right) right->traverse_in_order();
    }
    

    This means also checking at EVERY call site if you could have a null root.

  • Don't use a member function

    This means that you're writing the old C-style code (perhaps as a static method), and calling it with the object explicitly as a parameter, e.g. you're back to writing Node::traverse_in_order(node); rather than node->traverse_in_order(); at the call site (a sketch of this variant follows the list below).

  • I believe the easiest/neatest way to fix this particular example in a way that is standards compliant is to actually use a sentinel node rather than a nullptr.

    // static class, or global variable
    Node sentinel;
    
    void Node::traverse_in_order() {
        if(this == &sentinel) return;
        ...
    }
    

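For reference, the second option might look like this (a minimal sketch of the C-style variant, using a static member function):

struct Node
{
    Node* left;
    Node* right;
    void process();

    static void traverse_in_order(Node* n) {
        if(!n) return;               // a null check on an ordinary pointer
        traverse_in_order(n->left);  // is well-defined, unlike one on this
        n->process();
        traverse_in_order(n->right);
    }
};

// Call site: Node::traverse_in_order(node); instead of node->traverse_in_order();
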
Neither of the first two options seems that appealing, and while code could get away with it, people wrote bad code with this == nullptr checks instead of using a proper fix.

I'm guessing that's how some of these code bases evolved to have this == nullptr checks in them.

αλεχολυτ
jtlim
  • 1
    Sorry, the wording is vague. I meant introducing that line would now make the whole function undefined behavior. I'm not defending the use of nullptr checks in this. The OP was asking how code could end up having it. – jtlim Apr 27 '16 at 15:11
  • @jtlim huh, no that line doesn't cause undefined behavior at all. If anything, it would prevent it (kind of) – Johannes Schaub - litb Apr 27 '16 at 15:12
  • @JohannesSchaub-litb How is the `this == nullptr` check not undefined behavior? If you write this in the latest clangs, it'll completely omit the check and you'll probably crash with a memory access violation. – jtlim Apr 27 '16 at 15:13
  • 6
    How can `1 == 0` be undefined behavior? It's simply `false`. – Johannes Schaub - litb Apr 27 '16 at 15:13
  • 11
    The check itself is not an undefined behaviour. It's just always false, and thus eliminated by the compiler. – SergeyA Apr 27 '16 at 15:14
  • 15
    Hmm.. `this == nullptr` idiom is undefined behavior because you've called a member function on a nullptr object before that, which is undefined. And the compiler is free to omit the check – jtlim Apr 27 '16 at 15:15
  • Since such checks belong in low-level resource-managing code, such as containers, etc., it's OK to write them out explicitly, like in C. – Kuba hasn't forgotten Monica Apr 27 '16 at 15:25
  • 2
    In fact this was once defined. All the 1990s manuals said that a non-virtual method call would succeed correctly and you could put if (this == NULL) inside it. – Joshua Apr 27 '16 at 15:41
  • 6
    @Joshua, the first standard was published in 1998. Whatever happened before that was whatever every implementation wanted. Dark ages. – SergeyA Apr 27 '16 at 16:02
  • 6
    @jtlim: _"I meant introducing that line would now make the whole function undefined behavior"_ It always was. The entire program has undefined behaviour. It always has. I think you're misunderstanding what "undefined behaviour" means! – Lightness Races in Orbit Apr 27 '16 at 22:42
  • 1
    Calling after checking if the object is valid is fine. You should never ever call a function on an invalid object anyway! So first solution is completely OK as is the third one. – Kaveh Apr 28 '16 at 00:31
  • @Joshua It may be undefined behaviour in the standard, but that doesn't prevent individual implementations from defining it. GCC chooses not to define it, however. – user253751 Apr 28 '16 at 01:09
  • 27
    Heh, wow, I can't believe anyone ever wrote code that relied on calling instance functions... _without an instance_. I would have instinctively used the excerpt marked "Do all of the checks before calling traverse_in_order", without even thinking about `this` ever being nullable. I guess maybe this is the benefit of learning C++ in an age where SO exists to entrench the perils of UB in my brain and dissuade me from doing bizarre hacks like this. – underscore_d Apr 28 '16 at 11:03
  • @SergeyA Prior to the publication of an official standard, it was a *de facto* standard. – Barmar May 03 '16 at 18:17
  • 3
    Checking for zero is more efficient than checking for `== &sentinel`. On x86, it will slightly increase code-size to use `cmp r64, imm32` instead of `test r64,r64`, but on many other ISAs (like ARM or MIPS) you'd have to load a 32-bit constant into a register and then compare against it. So it takes an extra couple instructions and uses up a register. – Peter Cordes Sep 02 '17 at 06:37
  • 1
    Great answer except for the recommendation to use a sentinel element. That's just weird; besides the performance downside (https://godbolt.org/g/cni4LV x86 and ARM64 asm output), I'd wonder why the code was checking against a sentinel. I'd assume it must have a reason for not using `nullptr`, and I wouldn't likely guess that the reason was just to save a couple lines of code in a couple functions! So you'd need a comment to justify that hack, which kind of defeats the purpose of making the code "cleaner". – Peter Cordes Sep 02 '17 at 06:44
  • 2
    Also, in a Linux shared library, getting `&sentinel` may require a load from the GOT, since it's not even a link-time constant. (If the sentinel is `static`, it can use an x86-64 RIP-relative LEA, though.) Anyway, this is clearly much worse than `nullptr`. – Peter Cordes Oct 02 '17 at 16:30
  • @underscore_d: One of the main advantages of C++ over C was that the fact that each pointer type had its own "namespace" of associated functions. Being able to associate a function with a `T*`, rather than type `T` was a useful feature C++ used to support. – supercat Jun 20 '18 at 22:46
  • @supercat I see no "was" about it; you can still declare as many functions as you want with the same name but different argument types, using overloading. It just is wrong to use classes to do so, in this case, since those require a valid instance at all times. – underscore_d Jun 21 '18 at 14:29
  • @underscore_d: While it may be syntactically legal to use overloading for that purpose, proper convention in languages that support overloading is generally to have overloads behave in somewhat-related fashion. Further, the value of being able to use instance-pointer-first syntax (in languages that support it) has increased with the addition of editor features like auto-complete. Having text early in a line indicate what pool of identifiers to draw from is helpful, and there was no reason the feature could not continue to have been supported. – supercat Jun 21 '18 at 14:53
67

It does so because the "practical" code was broken and involved undefined behavior to begin with. There's no reason to use a null this, other than as a micro-optimization, usually a very premature one.

It's a dangerous practice, since adjustment of pointers due to class hierarchy traversal can turn a null this into a non-null one. So, at the very least, the class whose methods are supposed to work with a null this must be a final class with no base class: it can't derive from anything, and it can't be derived from. We're quickly departing from practical to ugly-hack-land.

In practical terms, the code doesn't have to be ugly:

struct Node
{
  Node* left;
  Node* right;
  void process();
  void traverse_in_order() {
    traverse_in_order_impl(this);
  }
private:
  static void traverse_in_order_impl(Node* n) {
    if (!n) return;
    traverse_in_order_impl(n->left);
    n->process();
    traverse_in_order_impl(n->right);
  }
};

> If you had an empty tree (eg. root is nullptr), this solution is still relying on undefined behavior by calling traverse_in_order with a nullptr.

If the tree is empty, a.k.a. a null Node* root, you aren't supposed to be calling any non-static methods on it. Period. It's perfectly fine to have C-like tree code that takes an instance pointer by an explicit parameter.

The argument here seems to boil down to somehow needing to write non-static methods on objects that could be called from a null instance pointer. There's no such need. The C-with-objects way of writing such code is still way nicer in the C++ world, because it can be type safe at the very least. Basically, the null this is such a micro-optimization, with such narrow field of use, that disallowing it is IMHO perfectly fine. No public API should depend on a null this.

Kuba hasn't forgotten Monica
  • 1
    In theory, but not in practice. In practice it worked fine in the absence of multiple inheritance provided the base has a vptr - so fine that a long list of actually-existing libraries and applications use the technique, from Qt to MFC, to Chromium. These will never be fixed, and turning a harmless assumption into a vulnerability is simply deliberate sabotage. – Ben Apr 27 '16 at 15:03
  • 18
    @Ben, Whoever wrote this code was wrong in the first place. It is funny that you are naming such terribly broken projects as MFC, Qt and Chromium. Good riddance with them. – SergeyA Apr 27 '16 at 15:12
  • 7
    @Ben A null `this` doesn't work on simple one-deep single inheritance with no virtual methods. It's bad code, that's what it is. If you want it to work, you need a static method that takes the instance pointer, checks it against null, and proceeds appropriately. This makes your intent explicit, and that's what the standard was aimed for, and that's a good thing. The standard forces you to reveal your hand, it doesn't make it impossible to use the technique. – Kuba hasn't forgotten Monica Apr 27 '16 at 15:13
  • 3
    Good riddance to Chromium? Which is on a billion Android devices? Unlikely. More likely this introduces vulnerabilities which did not previously exist into billions of Android devices. ... The existing codebase is too large for "fix your code" to be a solution. The solution is to *fix the standard*. The standard is broken. Relying on the broken standard as an excuse for breaking previously working code is just sabotage. – Ben Apr 27 '16 at 15:18
  • 19
    @Ben, terrible coding styles in Google are well known to me. Google code (at least publicly available) is often badly written, despite multiple people believing Google code to be the shining example. May be this will make them to revisit their coding styles (and guidelines while they are on it). – SergeyA Apr 27 '16 at 15:21
  • 19
    @Ben No one is retroactively replacing Chromium on these devices with Chromium compiled using gcc 6. Before Chromium will be compiled using gcc 6, and other modern compilers, it will need to be fixed. It's not a huge task either; the `this` checks are picked out by various static code analyzers, so it's not as if anyone has to manually hunt them all down. The patch would be probably a couple hundred lines of trivial changes. – Kuba hasn't forgotten Monica Apr 27 '16 at 15:22
  • 3
    You know as well as I do that there are many other codebases which will be broken which don't receive the same attention as Chromium. They will simply be recompiled with whatever compiler is on the build machine ready for the next release. **The standard is broken.** The standard needs to be fixed to emphasise safety, and most UB should be IB. – Ben Apr 27 '16 at 15:28
  • 9
    @Ben In practical terms, a null `this` dereference is an instant crash. These problems will be found out very quickly even if nobody cares to run a static analyzer over the code. C/C++ follows the "pay only for features you use" mantra. If you want checks, you must be explicit about them and that means not doing them on `this`, when it's too late, since the compiler assumes `this` isn't null. Otherwise it'd have to check `this`, and for 99.9999% of code out there such checks are a waste of time. – Kuba hasn't forgotten Monica Apr 27 '16 at 15:32
  • If you had an empty tree (eg. root is nullptr), this solution is still relying on undefined behavior by calling `traverse_in_order` with a nullptr. – jtlim Apr 27 '16 at 15:42
  • 6
    You'd have to check the root before calling `traverse_in_order` then, which doesn't make this solution invalid because the call to the function is not included in the answer. – Patrick Roberts Apr 27 '16 at 15:50
  • Yes, but it means that in practical terms, the calling code *is* going to be littered with extra checks. ie. uglier. Which is what this solution is trying to prove false. – jtlim Apr 27 '16 at 15:51
  • 1
    @KubaOber, I understand removing the *implicit check for null on a cast* where the pointer is *immediately dereferenced*, but removing an explicit check is a different thing altogether. – Ben Apr 27 '16 at 15:53
  • 1
    @Ben It won't ever work if you and the compiler assume different things. The compiler assumes that `this` isn't null. It doesn't make any sense for you to check it, so the compiler is free to remove your checks. Remember that an arbitrarily complex piece of code you didn't explicitly write can reduce to a check for `this == nullptr`, and such code is then optimized out. You would need a way to tell the compiler that `this` sometimes can be null, and the way to do that certainly isn't by checking for a null `this`. It'd need to be a method attribute, and would be implementation specific. – Kuba hasn't forgotten Monica Apr 27 '16 at 18:59
  • 11
    my advice for anyone who thinks the standard is broken: use a different language. There is no shortage of C++-like languages that don't have the possibility of undefined behaviour. – M.M Apr 28 '16 at 00:24
  • 6
    @Ben You will have to explain to me more slowly why a standard that prohibits this==nullptr is broken, and not the code (no matter how prevalent) that has this==nullptr. There was an awful lot of code that turned out to be vulnerable to buffer overflow, but the fact that this bad programming practice was ubiquitous (e.g., in SQL Server) doesn't make it something the standard should be rewritten to permit. – Andrew Lazarus May 03 '16 at 22:49
  • @AndrewLazarus: There are times when it is useful to have a syntax to take a pointer-type rvalue and pass that as an argument to a method associated with the pointer's target type. Whether or not you like the use of `ptr->method()` syntax for that, I can't think of any alternative that isn't substantially more verbose. IMHO, the right solution would be to have a means of indicating that particular functions must be invokable with a null "this", and then having behavior be defined for those functions. – supercat May 09 '16 at 19:59
  • 1
    @supercat Sounds like exactly the use case for the (rare and awkward-to-type) pointer-to-member syntax, which is really just the member's offset. – Andrew Lazarus May 10 '16 at 15:42
  • @AndrewLazarus: Do you mean `(*ptr).member()`? That would seem even more UB-ish. To my mind, if `bar` is not a virtual member, `foo->bar()` means "call function bar() of foo's type, passing `foo` as an implied parameter", and does not imply actually dereferencing `foo`, while `(*ptr).member()` looks a lot more like a dereference. – supercat May 10 '16 at 15:48
  • @AndrewLazarus: What might have been nicest would be if there were a base-class syntax for functions which would need to be invokable on null objects. If such syntax existed, it could be supported for even virtual functions (with the proviso that the compiler would be required to do what was needed to ensure that invocation with null called the base-class method). Such a feature shouldn't hurt code efficiency if programmers only write such a declarations in places where a type would need to be able to accept invocation with null. – supercat May 10 '16 at 15:51
  • 3
    @supercat No, I mean `Foo::*member` (see "pointers to data members" section at http://en.cppreference.com/w/cpp/language/pointer ) – Andrew Lazarus May 10 '16 at 15:52
  • @AndrewLazarus: That still seems a bit awkward. I'm more a .NET programmer than C++, but I would think that in C++ as in .NET there would be times when it would be useful to treat pointers to an immutable type as though they were values of that type, and have a null pointer represent a particular value [e.g. for a type representing strings, have a null pointer represent an empty string]. While it would be possible to have the type itself be a structure that contains a pointer, that would require that the type be responsible for its own lifetime management. – supercat May 10 '16 at 16:52
  • @AndrewLazarus: While C++ certainly did not end up evolving in a direction to favor the concept of treating pointers as proxies for the values, with null pointers representing some commonplace value, that doesn't mean that wouldn't have been a useful concept to support. – supercat May 10 '16 at 16:58
  • @supercat The problem with null pointers having special meaning is that you have to duplicate null pointer checks *everywhere* a pointer is used. Normally, an empty string check is done only in the callee (somewhere in `std::string`), but here you'd have checks in every caller. All this extra code would waste the limited code cache and would be bad news. I think it's a very bad idea. For that very same reason it is a premature pessimization to assume that `this` might be null. If you need that trick, you can live with C-style code. It'll be very limited in scope, your users should never see it – Kuba hasn't forgotten Monica May 10 '16 at 18:02
  • 1
    @supercat "[e.g. for a type representing strings, have a null pointer represent an empty string]" — this was a major misfeature of some versions/preference settings of Microsoft Access. Having a NULL middle name field suggests no one has initialized the record; having an empty-string field says you don't have a middle name. Sentinel objects are very handy for this sort of special value, and don't need NULL checks. – Andrew Lazarus May 10 '16 at 18:21
  • @KubaOber: Having a null pointer represent an empty string (as Microsoft's Common Object Model does, btw) eliminates the need to construct and destroy empty strings, and it means that one can populate an array of 1,000 strings merely by zeroing it out, as opposed to having to construct 1,000 different null-string objects. – supercat May 10 '16 at 18:42
  • 2
    @supercat A string could work such that an empty one is obtained by zeroing out the object's fields; this can be had without null pointers. If one doesn't care about the small string optimization, one can certainly have the string instance be the size of a pointer, and internally *be* a pointer to a PIMPL. Since you really shouldn't be using strings any other way but as values, what magic exactly goes inside of a string is irrelevant, they can simply be PIMPL pointers. You don't need pointers to them in user code, of course. `sizeof(std::string)==sizeof(void*)` is OK, iff slow-ish IRL. – Kuba hasn't forgotten Monica May 10 '16 at 19:24
  • Strings can initialize the pointer with the address of a shared null; setting that costs no more than zeroing. Even with small string optimization, a null string constructor will be setting one word to zero (the pointer). The overhead of an array of 1000 such strings is the cache churn due to having to skip uninitialized short string buffers. But the strings will perform better, and I presume you mostly care about strings that do something, not about strings that do nothing. – Kuba hasn't forgotten Monica Jun 15 '17 at 20:33
36

> The change document clearly calls this out as dangerous because it breaks a surprising amount of frequently used code.

The document doesn't call it dangerous. Nor does it claim that it breaks a surprising amount of code. It simply points out a few popular code bases that it says rely on this undefined behaviour and would break due to the change unless the workaround option is used.

> Why would this new assumption break practical C++ code?

If practical C++ code relies on undefined behaviour, then changes to that undefined behaviour can break it. This is why UB is to be avoided, even when a program relying on it appears to work as intended.

> Are there particular patterns where careless or uninformed programmers rely on this particular undefined behavior?

I don't know if it's a widespread anti-pattern, but an uninformed programmer might think they can keep their program from crashing by doing:

if (this)
    member_variable = 42;

When the actual bug is dereferencing a null pointer somewhere else.
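
A hypothetical sketch of that situation (the names are illustrative):

struct Widget {
    int member_variable;
    void set_default() {
        if (this)                  // papering over the symptom
            member_variable = 42;
    }
};

void update(Widget* w) {
    w->set_default();              // the real bug: w may be null here, and the
                                   // call itself is UB before the check runs
}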

I'm sure that if a programmer is uninformed enough, they will be able to come up with more advanced (anti-)patterns that rely on this UB.

> I cannot imagine anyone writing if (this == NULL) because that is so unnatural.

I can.

eerorika
  • 11
    "If practical c++ code relies on undefined behaviour, then changes to that undefined behaviour can break it. This is why UB is to be avoided" this * 1000 – underscore_d Apr 28 '16 at 11:06
  • `if(this == null) PrintSomeHelpfulDebugInformationAboutHowWeGotHere();` Such as a nice easy-to-read log of a sequence of events that a debugger can't easily tell you about. Have fun debugging this now without spending hours placing checks everywhere when there's a sudden random null in a large data set, in code you haven't written... And the UB rule about this was made later, after C++ was created. It used to be valid. – Stephane Hockenhull May 09 '18 at 03:40
  • @StephaneHockenhull That's what `-fsanitize=null` is for. – eerorika May 09 '18 at 08:24
  • @user2079303 Issues: Is that going to slow down production code to the point where you can't leave the check in while running, costing the company a lot of money? Is that going to increase size and not fit in flash? Does that work on all target platforms including Atmel? Can `-fsanitize=null` log the errors to the SD/MMC card on Pins #5,6,10,11 using SPI? That's not a universal solution. Some have argued that it goes against object-oriented principles to access a null object yet some OOP languages have a null object that can be operated on so it's not a universal rule of OOP. 1/2 – Stephane Hockenhull May 09 '18 at 22:19
  • It also seems at odds with C++ initial design of being an OOP-facilitating version of C. And with the recent relaxing of the POD rules, since POD structs/class member functions are simply OOP-convenience-mapping to a simple function with a hidden `this` parameter there should be nothing unnatural about being able to call a function with a null this without having to revert to C-style explicitly argument passing a `me` pointer. I don't see a reason to make POD-class null calls strictly UB. Technically it _could_ work just fine and would save a lot of work. (eg: C# now has ?. for this reason) – Stephane Hockenhull May 09 '18 at 22:29
  • I'm not arguing against your answer. Just providing more examples of (anti)-patterns that would be allowed if null calls were not UB strictly for POD objects (and non-virtual functions). There are a lot of ways to make code smaller, more convenient, and more readable. Complex data queries with null returns if there are no matches anywhere along the way comes to mind. – Stephane Hockenhull May 09 '18 at 22:41
  • @StephaneHockenhull: One thing both C and C++ need is to recognize a category of compilers where the behavior of operations on PODS is defined in terms of actions on bits and bytes, but compilers are expressly allowed perform *specific* kinds of optimizations even if they would alter program behavior. Defining things in that fashion would allow for much simpler rules than are possible with the present design of the Standard. To use a rough analogy, which is easier: describing the set of all files that do not contain the characters "bob" except as part of the word "bobbin", or do produce... – supercat Jun 20 '18 at 16:40
  • 1
    ...a regular expression which matches such files? Saying that e.g. if an lvalue is accessed twice, a compiler may consolidate the accesses *unless code between them does any of several specific things* would be much easier than trying to define the precise situations in which code is allowed to access storage. – supercat Jun 20 '18 at 16:43
25

Some of the "practical" (funny way to spell "buggy") code that was broken looked like this:

void foo(X* p) {
  p->bar()->baz();
}

and it forgot to account for the fact that p->bar() sometimes returns a null pointer, which means that dereferencing it to call baz() is undefined.

Not all the code that was broken contained explicit if (this == nullptr) or if (!p) return; checks. Some cases were simply functions that didn't access any member variables, and so appeared to work OK. For example:

struct DummyImpl {
  bool valid() const { return false; }
  int m_data;
};
struct RealImpl {
  bool valid() const { return m_valid; }
  bool m_valid;
  int m_data;
};

template<typename T>
void do_something_else(T* p) {
  if (p) {
    use(p->m_data);
  }
}

template<typename T>
void func(T* p) {
  if (p->valid())
    do_something(p);
  else 
    do_something_else(p);
}

In this code when you call func<DummyImpl>(p) with a null pointer there is a "conceptual" dereference of the pointer to call p->DummyImpl::valid(), but in fact that member function just returns false without accessing *this. That return false can be inlined and so in practice the pointer doesn't need to be accessed at all. So with some compilers it appears to work OK: there's no segfault for dereferencing null, p->valid() is false, so the code calls do_something_else(p), which checks for null pointers, and so does nothing. No crash or unexpected behaviour is observed.

With GCC 6 you still get the call to p->valid(), but the compiler now infers from that expression that p must be non-null (otherwise p->valid() would be undefined behaviour) and makes a note of that information. That inferred information is used by the optimizer so that if the call to do_something_else(p) gets inlined, the if (p) check is now considered redundant, because the compiler remembers that it is not null, and so inlines the code to:

template<typename T>
void func(T* p) {
  if (p->valid())
    do_something(p);
  else {
    // inlined body of do_something_else(p) with value propagation
    // optimization performed to remove null check.
    use(p->m_data);
  }
}

This now really does dereference a null pointer, and so code that previously appeared to work stops working.

In this example the bug is in func, which should have checked for null first (or the callers should never have called it with null):

template<typename T>
void func(T* p) {
  if (p && p->valid())
    do_something(p);
  else 
    do_something_else(p);
}

An important point to remember is that most optimizations like this are not a case of the compiler saying "ah, the programmer tested this pointer against null, I will remove it just to be annoying". What happens is that various run-of-the-mill optimizations like inlining and value range propagation combine to make those checks redundant, because they come after an earlier check, or a dereference.

If the compiler knows that a pointer is non-null at point A in a function, and the pointer isn't changed before a later point B in the same function, then it knows it is also non-null at B. When inlining happens, points A and B might actually be pieces of code that were originally in separate functions, but are now combined into one piece of code, and the compiler is able to apply its knowledge that the pointer is non-null in more places. This is a basic, but very important optimization, and if compilers didn't do it, everyday code would be considerably slower and people would complain about unnecessary branches to re-test the same conditions repeatedly.
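
The same inference applies to any pointer, not just this. A minimal sketch of value range propagation on an ordinary pointer:

int first_element(int* p) {
    int x = *p;        // after this dereference the compiler knows p != nullptr
    if (p == nullptr)  // provably unreachable, so the branch is deleted
        return -1;
    return x;
}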

Jonathan Wakely
  • Is it possible to instrument GCC 6 to output compile-time warnings when it encounters such usages of `this`? – jotik Apr 28 '16 at 21:53
  • 5
    @jotik Something like ["warning: after 3 levels of inlining (potentially across files with Link Time Optimization), some common subexpression elimination, after hoisting this thing out of a loop and proving that these 13 pointers don't alias, we found a case where you are \[comparing `this` to null\]"](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html)? – T.C. Apr 28 '16 at 22:23
  • 4
    @jotik, ^^^ what T.C said. It would be possible, but you would get that warning **FOR ALL CODE, ALL THE TIME**. Value range propagation is one of the most common optimizations, that affects almost all code, everywhere. The optimizers just see code, which can be simplified. They don't see "a piece of code written by an idiot that wants to be warned if their dumb UB gets optimized away". It's not easy for the compiler to tell the difference between "redundant check that the programmer wants to be optimized" and "redundant check that the programmer thinks will help, but is redundant". – Jonathan Wakely Apr 28 '16 at 23:02
  • 1
    If you want to instrument your code to give **runtime** errors for various types of UB, including invalid uses of `this`, then just use `-fsanitize=undefined` – Jonathan Wakely Apr 28 '16 at 23:07
  • @T.C. I understand why what you described is a bad idea, and its not why I asked. I meant direct cases where the programmer has written questionable things like `if(!this)` or `(this == 0)` etc. Is it possible to make the compiler (GCC 6) warn when it encounters such code? This might help [efforts](https://bugs.gentoo.org/show_bug.cgi?id=581434) to fix buggy code. – jotik Apr 29 '16 at 08:37
  • 3
    @jotik [There's already a warning for that.](http://stackoverflow.com/questions/36893251/why-does-the-enhanced-gcc-6-optimizer-break-practical-c-code/36916651#comment61383137_36894819) – T.C. Apr 29 '16 at 08:38
  • Ah yes. Also documented on the ["Porting to GCC 6" page](https://gcc.gnu.org/gcc-6/porting_to.html). Thanks! – jotik Apr 29 '16 at 08:51
-24

The C++ standard is broken in important ways. Unfortunately, rather than protect the users from these problems, the GCC developers have chosen to use undefined behaviour as an excuse to implement marginal optimisations, even when it has been clearly explained to them how harmful it is.

Here a much cleverer person than I explains it in great detail. (He's talking about C, but the situation is the same in C++.)

Why is it harmful?

Simply recompiling previously working, secure code with a newer version of the compiler can introduce security vulnerabilities. While the new behaviour can be disabled with a flag, existing makefiles do not have that flag set, obviously. And since no warning is produced, it is not obvious to the developer that the previously reasonable behaviour has changed.

In this example, the developer has included a check for integer overflow, using assert, which will terminate the program if an invalid length is supplied. The GCC team removed the check on the basis that integer overflow is undefined, therefore the check can be removed. This resulted in real in-the-wild instances of this codebase being re-made vulnerable after the issue had been fixed.
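
The pattern described is roughly this (a sketch modeled on GCC bug 30475, not the project's actual code):

#include <cassert>
#include <climits>

int grow(int len) {
    assert(len + 100 > len);  // signed overflow is UB, so the compiler may
                              // assume it cannot happen and drop the assert
    return len + 100;
}

// A well-defined check compares against the limit before adding:
//   assert(len <= INT_MAX - 100);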

Read the whole thing. It's enough to make you weep.

OK, but what about this one?

Way back when, there was a fairly common idiom which went something like this:

OPAQUEHANDLE ObjectType::GetHandle() {
    if (this == NULL) return DEFAULTHANDLE;
    return mHandle;
}

void DoThing(ObjectType* pObj) {
    osfunction(pObj->GetHandle(), "BLAH");
}

So the idiom is: If pObj is not null, you use the handle it contains, otherwise you use a default handle. This is encapsulated in the GetHandle function.

The trick is that calling a non-virtual function doesn't actually dereference the this pointer, so there is no access violation.
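
Roughly, the mental model was the Cfront-style lowering (a sketch, not actual compiler output; the double-underscore name is illustrative):

// pObj->GetHandle() lowers to an ordinary call with the pointer passed as a
// hidden argument; nothing is dereferenced before the null check runs.
OPAQUEHANDLE ObjectType__GetHandle(ObjectType* this_) {
    if (this_ == NULL) return DEFAULTHANDLE;
    return this_->mHandle;
}

// So DoThing(NULL) called ObjectType__GetHandle(NULL), which simply returned
// DEFAULTHANDLE instead of crashing.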

I still don't get it

A lot of code exists which is written like that. If someone simply recompiles it, without changing a line, every call to DoThing(NULL) is a crashing bug - if you are lucky.

If you are not lucky, calls to crashing bugs become remote execution vulnerabilities.

This can occur even automatically. You've got an automated build system, right? Upgrading it to the latest compiler is harmless, right? But now it's not - not if your compiler is GCC.

OK so tell them!

They've been told. They are doing this in the full knowledge of the consequences.

but... why?

Who can say? Perhaps:

  • They value the ideal purity of the C++ language over actual code
  • They believe people should be punished for not following the standard
  • They have no understanding of the reality of the world
  • They are ... introducing bugs on purpose. Perhaps for a foreign government. Where do you live? All governments are foreign to most of the world, and most are hostile to some of the world.

Or perhaps something else. Who can say?

Ben
  • 33
    Disagree with every single line of the answer. Same comments were made for strict aliasing optimizations, and those hopefully are dismissed now. The solution is to educate developers, not prevent optimizations based on bad development habits. – SergeyA Apr 27 '16 at 15:54
  • 32
    I did go and read the whole thing like you said, and indeed I wept, but mainly at the stupidity of Felix which I don't think was what you were trying to get across... – Mike Vine Apr 27 '16 at 16:03
  • 14
    You seem very resolute on your stance concerning the standard being *broken*. May I ask if you have a defect report on the way? – eerorika Apr 27 '16 at 16:03
  • 3
    It is a reasonable thing to argue that the standard is broken regarding this being allowed to be null for non-vcalls (I don't necessarily agree but that's a fine opinion to hold). For other optimizations (e.g. the "Felix-gcc" one) it is very hard to teach a compiler what is OK to optimize and what a human intended. That is architecturally very unclean and might be practically impossible. Linus Torvalds has made the same error once about an optimized out NULL check. – boot4life Apr 27 '16 at 16:16
  • 9
    'pretty much every real-world program is undefined' - not sure what is the basis of this claim. The systems I work with do not rely on undefined behaviour, and when such is discovered (because people make mistakes), it is treated as bug and fixed, rather than blaming it on the compiler. – SergeyA Apr 27 '16 at 16:16
  • 5
    @boot4life, it's integer overflow. It happens. Obviously it is wrong for the compiler to assume that it doesn't. It's that assumption which is wrong, not the fact that the call removed was an assert. It should not be UB at all; it should be defined to either wrap two's complement or trap, and those are the only sensible things for a compiler to do. Assuming it can't happen in 2015, when we all know it is a major source of security bugs, is not innocent, nor even stupid, it's malice. – Ben Apr 27 '16 at 16:21
  • 4
    @Ben you can turn on that behavior using a switch and I would not condemn you for doing that. I find the idea of safer versions of C and C++ very useful. But it costs your performance in certain loops. C generally is a language that is supposed to be so close to the metal that there is not enough room below C and above assembler for another language. – boot4life Apr 27 '16 at 16:23
  • 4
    @boot4life The problem with a switch is it is old code which needs that behaviour, that is the code where by definition the makefile does not have the switch because it did not exist when the code was written. – Ben Apr 27 '16 at 16:25
  • 4
    @boot4life, actually those who argue against this kind of optimization want semantics which are closer to "the metal". – Carsten S Apr 27 '16 at 16:29
  • 1
    @CarstenS semantics close to the metal but performance *removed* from the metal. Signed overflow allows you to conclude in `for (int i = 0; i < N; i++)` that `i >= 0` (not wrapping) and even `N >= 0`! Apparently, that is useful for certain optimizations. (This is what I read.) – boot4life Apr 27 '16 at 16:38
  • 1
    @boot4life, I am not taking sides here, but I found this paper (also via fefe) interesting in that regard: http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf – Carsten S Apr 27 '16 at 16:42
  • 5
    Anybody who is pitting safety against performance should just switch to Java. – SergeyA Apr 27 '16 at 16:56
  • 4
    @SergeyA, again, *again*, this is about pre-existing code, not new code. New code should avoid UB. But new compiler versions should not break existing code. – Ben Apr 27 '16 at 17:02
  • 3
    @boot4life, that tight loop which is 30% to 40% faster because it assumes no integer overflow, might be the loop which is copying the exploit over your return address on the stack. – Ben Apr 27 '16 at 17:08
  • 2
    @Ben For pre-existing code, if the user is allowed and able to compile from source, common practice has been to allow the user to specify the compiler and the compiler flags to use. This gives the user the ability to add options to disable specific optimisations without having to change the Makefile or the code. –  Apr 27 '16 at 17:31
  • 33
    Downvoted for the useless rant. "They are ... introducing bugs on purpose. Perhaps for a foreign government." Really? This isn't /r/conspiracy. – isanae Apr 27 '16 at 17:36
  • 2
    @hvd, no it isn't. Common practice is to use the latest compiler for each new release. I'm glad you version-control your compiler and BDIFF the different versions looking for breaking changes in the compiled output before upgrading, but most people just don't. – Ben Apr 27 '16 at 17:45
  • 1
    @isanae, You have to stop thinking these things are accidents. Haven't you been paying attention? Two - that's two - unknown parties subverted JuniperOS. We kind of assume one was the NSA, but who was the other? – Ben Apr 27 '16 at 17:53
  • 3
    @Ben You've read my comment just barely enough to type up a response to it, but clearly not nearly enough to understand what I wrote. You were saying that for pre-existing code, there is no way that their Makefiles specify any options needed to make it work that did not exist at the time. I'm saying that those Makefiles do typically specify that the user can provide custom options. Your response to that totally misses my point. It's not about how many developers / users make use of that ability, it's merely about the ability being there, when you say it's not. –  Apr 27 '16 at 18:08
  • 8
    @hvd, I understand that the user can "specify options needed to make it work", and for an automated build system, you can set these options as a global default. I get it. My point is that by default **there is no reason for them to believe that there is any need for them to do it**. There are no new warnings produced, it just compiles. How can they know that actually there is a breaking change there? They don't read the compiler release notes, **they assume the compiler writer is on their side**. Unfortunately it seems **he is not**. Breaking changes need to be opt-in not opt-out. – Ben Apr 27 '16 at 18:13
  • 17
    Step number one in writing robust correct code is to know your tools. Someone who was so wholly unacquainted with their language as to mess this up is not producing secure code no matter how much help they get from the compiler. The security vulnerabilities were introduced by having incompetent programmers write the code, not by the compiler upgrade. – Ben Voigt Apr 27 '16 at 20:29
  • 1
    @Ben And what do you think that code was doing before? – Random832 Apr 27 '16 at 20:59
  • 32
    Decent programmers over and over again repeat the mantra _do not invoke undefined behaviour_, yet these nonks have gone ahead and done it anyway. And look what happened. I have no sympathy whatsoever. This is the developers' fault, simple as that. They need to take responsibility. Remember that? Personal responsibility? People relying on your mantra "but what about _in practice_!" is precisely how this situation arose in the first place. Avoiding nonsense like this is precisely why standards exist in the first place. Code to standards, and you won't have a problem. Period. – Lightness Races in Orbit Apr 27 '16 at 22:51
  • 14
    _"They don't read the compiler release notes, they assume the compiler writer is on their side"_ And if they work for me, they are subsequently fired for gross incompetence. Seriously, who calls themselves a programmer yet doesn't bother to read release notes when they upgrade their entire toolchain to a new major revision? Defending that behaviour with the ridiculous "the standard is wrong" is just absurd. Is the C++ standard perfect? Absofrickinglutely not. But it's right in this case and, even if it weren't, your rationale is mortifyingly flawed. – Lightness Races in Orbit Apr 27 '16 at 22:55
  • 2
    @MikeVine: I'm amazed nobody stepped in to tell Felix to calm down. What a douche! Nine years on I wonder where he is now... – Lightness Races in Orbit Apr 27 '16 at 23:09
  • 18
    "Simply recompiling previously working, secure code with a newer version of the compiler can introduce security vulnerabilities" - **that always happens**. Unless you want to mandate that one version of one compiler is the only compiler that will be allowed for the rest of eternity. Do you remember back when the linux kernel could only be compiled with exactly gcc 2.7.2.1 ? The gcc project even got forked because people were fed up with bullcrap . It took a long time to get past that. – M.M Apr 28 '16 at 00:15
  • 18
    Also relevant: GCC ***warns*** about this http://coliru.stacked-crooked.com/a/dd3a1326e3e26097. Really, if you recompile with a newer compiler, don't even have warnings enabled, don't have tests, and so fall for this, you don't deserve sympathy. Fix your code, fix your build system, fix your workflow. – R. Martinho Fernandes Apr 28 '16 at 09:18
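A minimal reproduction of the kind of code that warning catches; with GCC 6 the relevant diagnostic appears to be `-Wnonnull-compare` (enabled by `-Wall`), though the exact flag and wording vary by version:

```cpp
struct Node {
    // GCC 6 warns here: 'this' is assumed non-null, so the comparison
    // is tautological and may be optimized away.
    bool alive() const { return this != nullptr; }
};

int main() {
    Node* n = nullptr;
    return n->alive();   // UB: member function called through a null pointer
}
```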
  • 5
    It goes against the C++ group-think, but this is the *only* correct answer to this question. They *just don't care* about whether they break existing, widely-used, previously-working code, and the C++ community is too broken to recognize that. – Jonathan Cast Apr 28 '16 at 14:33
  • 5
@jcast: I'd _really_ prefer that the C++ standard wasn't written by Google. To prevent that, we have to hold Google accountable for writing bad code, rather than shift the responsibility onto compiler vendors and the standards committee to accommodate Google's errors. –  Apr 28 '16 at 16:05
  • 3
    @LightnessRacesinOrbit: He's still out there ranting against GCC, specifically about this very issue we are discussing here: http://blog.fefe.de/?ts=a9de792d – mastov Apr 28 '16 at 17:37
  • 3
    @jcast: Were that the way things should be, Chrome and Firefox wouldn't have become wildly popular by sticking rigidly to the standards, putting an end to the torture of the ---hack--- browser wars and leading directly to the web renaissance we enjoy today. – Lightness Races in Orbit Apr 28 '16 at 17:43
  • 6
Working code? No, relying on a null dereference not crashing is not working code, any more than code with race conditions "works" until Moore's Law makes it *not* work on a new, faster computer. Or maybe the guys who speed up the chips are also into "purity" instead of sticking with what works. – Andrew Lazarus May 03 '16 at 22:35
  • 1
@LightnessRacesinOrbit The standard should define behavior. A standard which declares behavior undefined is crap. So fix the standard and the compilers first; after that, people will follow the standard, but not sooner. – Dmitry Azaraev May 04 '16 at 07:04
  • 1
    https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759de5a7 This is a nice explanation of the signed overflow issue. Adding this here for all interested parties to see. – boot4life May 09 '16 at 11:40
  • 1
According to the rationale, the authors of the Standard intended that the marketplace would drive quality implementations to define behaviors beyond those mandated by the Standard in cases where they would be useful and practical. Further, the authors stated that they did not consider it necessary to forbid implementations that were of such poor quality as to be useless. I would be surprised if less than 99.9% of C programs invoke Undefined Behavior under a literal reading of 6.5p7, or if less than 90% of programs that use structures with non-character members would do so even if... – supercat Jun 20 '18 at 19:30
  • 1
    ...the phrase "by an lvalue..." were were replaced with "by an lvalue, or an assignment to an lvalue,...". On the flip side, replacing the phrase with "...using an lvalue which is freshly and visibly derived from an lvalue" would define the behavior of many programs that would otherwise require `-fno-strict-aliasing`. The failure of the Standard to define the behavior of programs whose behavior obviously should be defined wouldn't be a particular problem if compiler writers viewed the ability to process such programs as a feature of quality implementations and not "missed optimizations". – supercat Jun 20 '18 at 19:36
  • 2
@boot4life: Allowing compilers to perform integer computations using longer types than specified when convenient--in a fashion analogous to what implementations can do with `float` and `double` if `FLT_EVAL_METHOD` is negative--would be sufficient to allow most of the useful optimizations that could be gained by making overflow UB, while avoiding the need for programs to guard against overflow in cases where such treatment would yield acceptable behavior. Letting compilers jump completely off the rails offers minimal additional benefit, and forces programmers to write inefficient code. – supercat Jun 20 '18 at 22:38
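A sketch of the "longer type" idea expressed in portable C++ today (the helper is invented for illustration): performing an `int` multiply in a 64-bit type removes the UB from the arithmetic itself, leaving only a narrowing conversion that is implementation-defined before C++20 and two's-complement wrapping from C++20 on.

```cpp
#include <cstdint>

// The 64-bit multiply cannot overflow for 32-bit int inputs, so the
// arithmetic itself has no UB; only the final narrowing remains.
int wrapped_product(int a, int b) {
    return static_cast<int>(static_cast<std::int64_t>(a) * b);
}
```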