
Consider the following code:

#include <iostream>

struct foo
{
    // (a):
    void bar() { std::cout << "gman was here" << std::endl; }

    // (b):
    void baz() { x = 5; }

    int x;
};

int main()
{
    foo* f = 0;

    f->bar(); // (a)
    f->baz(); // (b)
}

We expect (b) to crash, because there is no corresponding member x for the null pointer. In practice, (a) doesn't crash because the this pointer is never used.

Because (b) dereferences the this pointer ((*this).x = 5;), and this is null, the program enters undefined behavior, as dereferencing null is always said to be undefined behavior.

Does (a) result in undefined behavior? What about if both functions (and x) are static?

pbn
GManNickG
  • If both functions are _static_, how could x be referred inside _baz_? (x is a non-static member variable) – legends2k Sep 29 '10 at 16:09
  • @legends2k: Pretend `x` was made static too. :) – GManNickG Sep 29 '10 at 21:25
  • Interesting: Come the next revision of C++, there shall be no more dereferencing of pointers at all. We shall now _perform indirection_ through pointers. To find out more, please perform indirection through this link: [N3362](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3362.html) – James McNellis Jun 05 '12 at 21:08
  • @JamesMcNellis: Whoa! That's quite a change. – GManNickG Jun 05 '12 at 21:11
  • Invoking a member function on a null pointer is *always* undefined behavior. Just by looking at your code, I can already feel the undefined behavior slowly crawling up my neck! – fredoverflow Jul 27 '12 at 09:02
  • Surely, but for the case (a) it works the same in all cases, i.e, the function gets invoked. However, replacing the value of the pointer from 0 to 1 (say, through reinterpret_cast), it almost invariably always crashes. Does the value allocation of 0 and thus NULL, as in case a, represents something special to the compiler ? Why does it always crash with any other value allocated to it? – Siddharth Shankaran Dec 28 '10 at 12:18

2 Answers


Both (a) and (b) result in undefined behavior. It's always undefined behavior to call a member function through a null pointer. If the function is static, it's technically undefined as well, but there's some dispute.


The first thing to understand is why it's undefined behavior to dereference a null pointer. In C++03, there's actually a bit of ambiguity here.

Although "dereferencing a null pointer results in undefined behavior" is mentioned in notes in both §1.9/4 and §8.3.2/4, it's never explicitly stated. (Notes are non-normative.)

However, one can try to deduce it from §3.10/2:

An lvalue refers to an object or function.

When dereferencing, the result is an lvalue. A null pointer does not refer to an object, therefore when we use the lvalue we have undefined behavior. The problem is that the previous sentence is never stated in the standard, so what does it mean to "use" the lvalue? Merely to generate it at all, or to use it in the more formal sense of performing an lvalue-to-rvalue conversion?
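The claim that dereferencing yields an lvalue can be checked without evaluating anything, so no pointer is ever actually dereferenced. This is only a sketch: it assumes a C++11 compiler (decltype, static_assert and std::declval are not in C++03), and deref_is_lvalue is a name made up for illustration.

```cpp
#include <type_traits>
#include <utility>

// decltype does not evaluate its operand, so the indirection below is
// never executed; only the type and value category are inspected.
// For an lvalue expression of type T, decltype yields T&.
template <typename T>
constexpr bool deref_is_lvalue()
{
    return std::is_same<decltype(*std::declval<T*>()), T&>::value;
}

// Checked at compile time: *p on an int* is an lvalue of type int.
static_assert(deref_is_lvalue<int>(), "*p should be an lvalue of type int");
```

This says nothing about whether forming that lvalue from a null pointer at run time is defined; it only shows what kind of expression *p is.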

Regardless, it definitely cannot be converted to an rvalue (§4.1/1):

If the object to which the lvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.

Here it's definitely undefined behavior.

The ambiguity comes from whether or not it's undefined behavior to dereference but not use the value from an invalid pointer (that is, get an lvalue but not convert it to an rvalue). If not, then int *i = 0; *i; &(*i); is well-defined. This is an active issue.

So we have a strict "dereference a null pointer, get undefined behavior" view and a weak "use a dereferenced null pointer, get undefined behavior" view.

Now we consider the question.


Yes, (a) results in undefined behavior. In fact, if this is null then regardless of the contents of the function the result is undefined.

This follows from §5.2.5/3:

If E1 has the type “pointer to class X,” then the expression E1->E2 is converted to the equivalent form (*(E1)).E2;
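As a small illustration of that equivalence on a valid object (where nothing is in dispute), the sketch below calls the same member through both spellings. The counter type and call_both_forms are hypothetical names used only for this example.

```cpp
struct counter
{
    int value;
    int next() { return ++value; }
};

// Calls the member through both spellings on a valid, live object.
int call_both_forms()
{
    counter c = { 0 };
    counter* p = &c;

    int first = p->next();      // arrow form: value becomes 1
    int second = (*p).next();   // the form the arrow is defined to mean: value becomes 2

    return first * 10 + second; // combines the two results into 12
}
```

Both calls act on the same object, which is why a null p is a problem for the arrow form too: the indirection is part of what p->next() means.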

*(E1) will result in undefined behavior with a strict interpretation, and .E2 converts it to an rvalue, making it undefined behavior for the weak interpretation.

That it's undefined behavior also follows directly from §9.3.1/1:

If a nonstatic member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.


With static functions, the strict versus weak interpretation makes the difference. Strictly speaking, it is undefined:

A static member may be referred to using the class member access syntax, in which case the object-expression is evaluated.

That is, it's evaluated just as if it were non-static and we once again dereference a null pointer with (*(E1)).E2.

However, because E1 is not used in a static member-function call, if we use the weak interpretation the call is well-defined. *(E1) results in an lvalue, the static function is resolved, *(E1) is discarded, and the function is called. There is no lvalue-to-rvalue conversion, so there's no undefined behavior.

In C++0x, as of n3126, the ambiguity remains. For now, be safe: use the strict interpretation.
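One way to stay inside the strict interpretation is to avoid the object expression altogether and name the static members through the class. A minimal sketch, assuming the all-static variant of foo from the question (call_static_without_object is a hypothetical helper):

```cpp
struct foo
{
    static int x;
    static void baz() { x = 5; }
};

int foo::x = 0;

// Calls the static member through its qualified name, so there is
// no object expression and no pointer to dereference.
int call_static_without_object()
{
    foo::baz();
    return foo::x;
}
```

With foo::baz() the question of whether f->baz() evaluates *f never comes up.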

GManNickG
  • +1. Continuing the pedantry, under the "weak definition" the nonstatic member function hasn't been called "for an object that is not of type X". It has been called for an lvalue which is not an object at all. So the proposed solution adds the text "or if the lvalue is an empty lvalue" to the clause you quote. – Steve Jessop Mar 18 '10 at 23:28
  • Could you clarify a little? In particular, with your "closed issue" and "active issue" links, what are the issue numbers? Also, if this is a closed issue, what exactly is the yes/no answer for static functions? I feel like I'm missing the final step in trying to understand your answer. – Brooks Moses Mar 18 '10 at 23:30
  • @Brooks: I edited in the number for the closed-issue, the active issue has the number too. (Give the page some time to load so your browser can jump there, the page is enormous). The conclusion was that it's okay, because the weaker interpretation is the "accepted" interpretation. (Which I will add.) I place accepted in quotes since it's still not closed. – GManNickG Mar 18 '10 at 23:33
  • @GMan: Very interesting. I still need to think about this some more, but thanks for not letting this question get lost. I was disappointed to see the thread disappear the other day. –  Mar 20 '10 at 02:53
  • I don't think CWG defect 315 is as "closed" as its presence on the "closed issues" page implies. The rationale says that it should be allowed because "`*p` is not an error when `p` is null unless the lvalue is converted to an rvalue." However, that relies on the concept of an "empty lvalue," which is part of the proposed resolution to [CWG defect 232](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232), but which has not been adopted. So, with the language in both C++03 and C++0x, dereferencing the null pointer is still undefined, even if there is no lvalue-to-rvalue conversion. – James McNellis Jun 08 '10 at 03:42
  • @JamesMcNellis: By my understanding, if `p` were a hardware address which would trigger some action when read, but were not declared `volatile`, the statement `*p;` would not be required, *but would be allowed*, to actually read that address; the statement `&(*p);`, however, would be forbidden from doing so. If `*p` were `volatile`, the read would be required. In either case, if the pointer is invalid, I can't see how the first statement wouldn't be Undefined Behavior, but I also can't see why the second statement would be. – supercat Jan 22 '14 at 00:20
  • @GManNickG *"`.E2` converts [`*(E1)` in `(*(E1)).E2`] to an rvalue"*. Why is the object expression converted to an rvalue? I wasn't able to find the rule. – eerorika Jan 03 '18 at 02:39
  • ".E2 converts it to an rvalue, " - Uh, no it doesn't – M.M Jan 03 '18 at 02:44

Obviously undefined means it's not defined, but sometimes it can be predictable. The information I'm about to provide should never be relied on for working code since it certainly isn't guaranteed, but it might come in useful when debugging.

You might think that calling a function on an object pointer will dereference the pointer and cause UB. In practice, if the function isn't virtual, the compiler will typically have converted it to a plain function call passing the pointer as the first parameter this, bypassing the dereference and creating a time bomb for the called member function. If the member function doesn't reference any member variables or virtual functions, it might actually succeed without error. Remember that succeeding falls within the universe of "undefined"!

Microsoft's MFC function GetSafeHwnd actually relies on this behavior. I don't know what they were smoking.

If you're calling a virtual function, the pointer must be dereferenced to get to the vtable, and for sure you're going to get UB (probably a crash but remember that there are no guarantees).
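Since a virtual call has to read through the pointer, the only safe pattern is to test the pointer before the call ever happens. A minimal sketch, with hypothetical types shape and triangle and a hypothetical helper sides_or:

```cpp
struct shape
{
    virtual ~shape() {}
    virtual int sides() const = 0;
};

struct triangle : shape
{
    int sides() const { return 3; }
};

// Returns fallback rather than making a virtual call through a null pointer.
// The null check happens outside any member function, so it is well-defined.
int sides_or(const shape* s, int fallback)
{
    return s ? s->sides() : fallback;
}
```

The check lives in a free function on purpose: once a member function has been entered through a null pointer, the undefined behavior has already occurred.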

Evg
Mark Ransom
  • GetSafeHwnd first does a !this check and if true, returns NULL. Then it begins a SEH frame and dereferences the pointer. if there is Memory Access Violation (0xc0000005) this is caught and NULL is returned to caller :) Else the HWND is returned. – Петър Петров Oct 08 '14 at 02:18
  • @ПетърПетров it's been quite a few years since I looked at the code for `GetSafeHwnd`, it's possible that they've enhanced it since then. And don't forget that they have insider knowledge on the compiler workings! – Mark Ransom Oct 08 '14 at 02:44
  • I am stating a sample possible implementation that have the same effect, what does it really do is to be reverse-engineered using a debugger :) – Петър Петров Oct 08 '14 at 16:40
  • "they have insider knowledge on the compiler workings!" - the cause of eternal trouble for projects like MinGW that attempt to allow g++ to compile code that calls the Windows API – M.M Dec 10 '15 at 11:15
  • @M.M I think we would all agree this is unfair. And because of that, I also think there is a law about compatibility that makes it a tiny bit illegal to keep it so. – v.oddou Dec 15 '17 at 05:37
  • Please see https://stackoverflow.com/questions/47026061/whats-the-difference-between-how-virtual-and-non-virtual-member-functions-are-c – ADG Sep 16 '18 at 11:11
  • In particular the reference to the C++ FAQ http://www.cs.technion.ac.il/users/yechiel/c++-faq/dyn-binding.html, it states: "the compiler resolves non-virtual functions exclusively at compile-time based on the type of the pointer" – ADG Sep 16 '18 at 11:22
  • @ADG Does the standard require non-virtual member functions to be implemented as free functions though? I thought the implementation was at the discretion of the compiler (ie an implementation detail, rely on it at your peril). – AnOccasionalCashew Dec 10 '20 at 03:15
  • @MarkRansom I'm sure you're aware of this, but I don't see it mentioned. If the compiler can deduce that undefined behavior is guaranteed to occur then it is permitted to assume that the branch is never taken and optimize it away. So if you `p->f()` and the compiler manages to deduce that `p` is guaranteed to be null then the behavior of the resulting binary might be _much_ less predictable than expected. Checking godbolt, current GCC and Clang versions don't optimize `p->f()` away but do replace branches containing `S s = *p` with an unconditional termination. – AnOccasionalCashew Dec 10 '20 at 04:13
  • @AnOccasionalCashew yes I'm aware, and it was irresponsible on my part to not bring it up. My favorite post on the subject is [Undefined behavior can result in time travel (among other things, but time travel is the funkiest)](https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633). – Mark Ransom Dec 10 '20 at 04:22
  • @M.M up until C++20, it wasn't possible to write `std::vector` in portable C++, it *had* to rely on insider knowledge to conform to all of it's requirements. AFAICT, similarly `std::map` since C++17 has to rely on insider knowledge to be conformant – Caleth Nov 26 '21 at 15:56