28

I recently tried the following program. It compiles, runs fine, and produces the expected output rather than a runtime error.

#include <iostream>
class demo
{
    public:
        static void fun()
        {
            std::cout<<"fun() is called\n";
        }
        static int a;
};
int demo::a=9;
int main()
{
    demo* d=nullptr;
    d->fun();
    std::cout<<d->a;
    return 0;
}

If an uninitialized pointer is used to access class and/or struct members, the behaviour is undefined, so why is it allowed to access static members through a null pointer? Is there any harm in my program?

Columbo
Destructor
  • 5
    `Is there any harm in my program?` It is still UB. – user1810087 Feb 12 '15 at 16:38
  • 12
    Undefined behavior does not mean that the code is required to crash; rather it means that anything at all is allowed to happen, the result is undefined. That is, the code could appear to work fine and as expected, it could crash, it could appear to run fine but give you the wrong result, anything at all. – wolfPack88 Feb 12 '15 at 16:38
  • 4
    Voted to reopen; the linked question addresses non-static members, not static ones. – T.C. Feb 12 '15 at 16:41
  • @T.C.: The answer's the same, though, isn't it? – wolfPack88 Feb 12 '15 at 16:42
  • @T.C. Alright, this one then? http://stackoverflow.com/questions/3498444/c-static-const-access-through-a-null-pointer – Barry Feb 12 '15 at 16:43
  • 2
    this is an interesting question - it compiles cleanly with no warnings, it calls the correct function. Is it valid syntax? Forget that d is null, what if d is valid pointer. It is surprising to see d->f() where f is a static function – pm100 Feb 12 '15 at 16:46
  • @Barry I'm not quite convinced. After the usual transformation for `->`, the object expression is `*d`, but for static members it's just evaluated and discarded. This is basically http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232; also http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_closed.html#315 – T.C. Feb 12 '15 at 16:48
  • http://stackoverflow.com/questions/325555/c-static-member-method-call-on-class-instance - good answer with quote from standard – pm100 Feb 12 '15 at 16:49
  • @pm100: Yes, it's a good Q&A. Unfortunately, a different one though. – Deduplicator Feb 12 '15 at 16:55
  • @Deduplicator well it clarifies that the syntax is valid and quotes the spec. The spec does not say that the object pointer must be valid. Does that mean its UB, ie if the spec is silent is that UB – pm100 Feb 12 '15 at 16:58
  • 3
    The biggest problem is maintainability. It should be demo::f() and demo::a, and if someone later edits the code they might actually try to use that pointer. – Kenny Ostrom Feb 12 '15 at 17:28
  • 2
    ["Somebody told me that in basketball you can't hold the ball and run. I got a basketball and tried it and it worked just fine. He obviously didn't understand basketball."](http://c-faq.com/ansi/experiment.html) – Jonathan Wakely Feb 13 '15 at 18:24

5 Answers

32

TL;DR: Your example is well-defined. Merely dereferencing a null pointer does not invoke UB.

There is a lot of debate over this topic, which basically boils down to whether indirection through a null pointer is itself UB.
The only questionable thing that happens in your example is the evaluation of the object expression. In particular, d->a is equivalent to (*d).a according to [expr.ref]/2:

The expression E1->E2 is converted to the equivalent form (*(E1)).E2; the remainder of 5.2.5 will address only the first option (dot).

*d is just evaluated:

The postfix expression before the dot or arrow is evaluated;⁶⁵ the result of that evaluation, together with the id-expression, determines the result of the entire postfix expression.

65) If the class member access expression is evaluated, the subexpression evaluation happens even if the result is unnecessary to determine the value of the entire postfix expression, for example if the id-expression denotes a static member.

Let's extract the critical part of the code. Consider the expression statement

*d;

In this statement, *d is a discarded-value expression according to [stmt.expr]. So *d is solely evaluated (see footnote 1), just as in d->a.
Hence, if the statement *d; is valid (in other words, if the evaluation of the expression *d is valid), so is your example.

Does indirection through null pointers inherently result in undefined behavior?

There is the open CWG issue #232, created over fifteen years ago, which concerns this exact question. A very important argument is raised. The report starts with

At least a couple of places in the IS state that indirection through a null pointer produces undefined behavior: 1.9 [intro.execution] paragraph 4 gives "dereferencing the null pointer" as an example of undefined behavior, and 8.3.2 [dcl.ref] paragraph 4 (in a note) uses this supposedly undefined behavior as justification for the nonexistence of "null references."

Note that the example mentioned was changed to cover modifications of const objects instead, and the note in [dcl.ref] - while still existing - is not normative. The normative passage was removed to avoid commitment.

However, 5.3.1 [expr.unary.op] paragraph 1, which describes the unary "*" operator, does not say that the behavior is undefined if the operand is a null pointer, as one might expect. Furthermore, at least one passage gives dereferencing a null pointer well-defined behavior: 5.2.8 [expr.typeid] paragraph 2 says

If the lvalue expression is obtained by applying the unary * operator to a pointer and the pointer is a null pointer value (4.10 [conv.ptr]), the typeid expression throws the bad_typeid exception (18.7.3 [bad.typeid]).

This is inconsistent and should be cleaned up.

The last point is especially important. The quote in [expr.typeid] still exists and appertains to glvalues of polymorphic class type, which is the case in the following example:

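#include <iostream>
#include <typeinfo>

int main() try {

    // Polymorphic type
    class A
    {
        virtual ~A(){}
    };

    // The operand is a glvalue of polymorphic class type, so it is evaluated
    typeid( *((A*)0) );

}
catch (const std::bad_typeid&)
{
    std::cerr << "bad_typeid\n";
}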

The behavior of this program is well-defined (an exception will be thrown and caught), and the expression *((A*)0) is evaluated as it isn't part of an unevaluated operand. Now if indirection through null pointers induced UB, then the expression written as

*((A*)0);

would be doing just that, inducing UB, which seems nonsensical when compared to the typeid scenario. If the above expression is merely evaluated as every discarded-value expression is (see footnote 1), where is the crucial difference that makes the evaluation in the second snippet UB? There is no existing implementation that analyzes the typeid-operand, finds the innermost, corresponding dereference and surrounds its operand with a check - there would be a performance loss, too.

A note in that issue then ends the short discussion with:

We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior.

I.e. the committee agreed upon this. Although the proposed resolution of this report, which introduced so-called "empty lvalues", was never adopted…

However, “not modifiable” is a compile-time concept, while in fact this deals with runtime values and thus should produce undefined behavior instead. Also, there are other contexts in which lvalues can occur, such as the left operand of . or .*, which should also be restricted. Additional drafting is required.

that does not affect the rationale. Then again, it should be noted that this issue even precedes C++03, which makes it less convincing while we approach C++17.


CWG-issue #315 seems to cover your case as well:

Another instance to consider is that of invoking a member function from a null pointer:

  struct A { void f () { } };
  int main ()
  {
    A* ap = 0;
    ap->f ();
  }

[…]

Rationale (October 2003):

We agreed the example should be allowed. p->f() is rewritten as (*p).f() according to 5.2.5 [expr.ref]. *p is not an error when p is null unless the lvalue is converted to an rvalue (4.1 [conv.lval]), which it isn't here.

According to this rationale, indirection through a null pointer per se does not invoke UB without further lvalue-to-rvalue conversions (=accesses to stored value), reference bindings, value computations or the like. (Nota bene: Calling a non-static member function with a null pointer should invoke UB, albeit merely hazily disallowed by [class.mfct.non-static]/2. The rationale is outdated in this respect.)

I.e. a mere evaluation of *d does not suffice to invoke UB. The identity of the object is not required, and neither is its previously stored value. On the other hand, e.g.

*p = 123;

is undefined since there is a value computation of the left operand, [expr.ass]/1:

In all cases, the assignment is sequenced after the value computation of the right and left operands

Because the left operand is expected to be a glvalue, the identity of the object referred to by that glvalue must be determined as mentioned by the definition of evaluation of an expression in [intro.execution]/12, which is impossible (and thus leads to UB).


1 [expr]/11:

In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value expression. The expression is evaluated and its value is discarded. […]. The lvalue-to-rvalue conversion (4.1) is applied if and only if the expression is a glvalue of volatile-qualified type and […]

Columbo
  • 4
    It does not look as if that resolution ever made it into any standard. Probably because of what it would mean for references... – Deduplicator Feb 12 '15 at 17:12
  • @Deduplicator Yeah, there were issues (no pun intended) with the resolutions. However, the conclusion drawn remains. – Columbo Feb 12 '15 at 17:15
  • 1
    If you are aware that the resolution was never made law, then why quote it as if it was? Better find something which did, especially as that one is 13+ years old (two minor and one major standard in the interim). – Deduplicator Feb 12 '15 at 17:18
  • @Deduplicator I didn't quote any resolutions. I quoted rationales. – Columbo Feb 12 '15 at 17:22
  • @T.C.: Yes, an additional pointer to there being a huge construction site. No resolution for the issue at hand though, especially none which made it into the standard ;-( – Deduplicator Feb 12 '15 at 17:23
  • @Deduplicator ... but I mentioned reference bindings, even in the same sentence. – Columbo Feb 12 '15 at 17:42
  • 2
    "Dereferencing a null pointer doesn't invoke UB without further lvalue-to-rvalue conversions (=accesses to stored value) or reference bindings" that's what some want, not what actually is. Happy with my extending the quote? – Deduplicator Feb 12 '15 at 17:46
  • Reference binding is an odd case because there the standard definitely doesn't mean what it says. What it currently has ("A reference shall be initialized to refer to a valid object or function.") would require a diagnostic every time that rule is violated, which is obviously impractical. – T.C. Feb 12 '15 at 17:47
  • @Columbo: As mentioned in my answer http://stackoverflow.com/a/28483477/6345, your CWG reference doesn't dispute that the above is UB. – Johann Gerell Feb 12 '15 at 19:12
  • 1
    "`*p` is not an error when p is null unless the lvalue is converted to an rvalue" - so `int *p = NULL; *p = 5;` is not an error? – user253751 Feb 12 '15 at 20:32
  • @immibis The "unless" bit is not the only exception, I'm afraid, and this particular example was mentioned in chat. I added an explanation further down the answer which tries to settle this case. – Columbo Feb 12 '15 at 20:49
  • *"The identity of the object is not required"* In the example of the OP or in the example of CWG 315? – dyp Feb 12 '15 at 21:17
  • @dyp In the example of the OP. But why would the identity be inherently required for a call to a member function? IMO an identity is necessary once e.g. a data member is accessed. – Columbo Feb 12 '15 at 21:30
  • I do not know what exactly "identity" means in this context. However, I can imagine you need an address for the this-pointer parameter of the member function (from an implementation's point of view). After all, that member function could be defined in another TU. Consider `f().g()` -- we need to evaluate `f()` to get the identity of the object the object-expression refers to. Not that any of this would matter, since we know there's an unresolved defect. – dyp Feb 12 '15 at 22:31
  • Well, @Deduplicator, you have a point. I adjusted my answer. Do you think it's appropriate now? – Columbo Feb 13 '15 at 18:18
  • The penultimate sentence of the discussion of CWG 232 is *"Also, there are other contexts in which lvalues can occur, such as the left operand of `.` or `.*`, which should also be restricted."*, whereas the rationale of CWG 315 states that this example should be well-formed. So either those two are contradicting, or there "is" a set of specific restriction that does not contain the example from CWG 315. It *seems* the committee thinks this issue has not been resolved (or they didn't update the status). I'm not sure what the implementers think (especially wrt UB optimizations). – dyp Feb 13 '15 at 18:47
  • @dyp So how about a "push"? E.g. via a thread in the discussion group? – Columbo Feb 13 '15 at 18:53
  • Of course you can try that :) But I would guess that the committee rather favours discussions of actual problems. So, it would be nice if there was some convincing example that requires a resolution of this issue. (For example, [Richard Smith's lambda trick](http://llvm.org/bugs/show_bug.cgi?id=20209), but it looks so much like a hack to me that it might not be convincing.) Edit: A lot of UB has become observable via constant expressions, and I think some of the UB optimizations have also been introduced since the last discussions of that topic. – dyp Feb 13 '15 at 19:20
  • 1
    "The behavior of this program is well-defined " yes, but not due to any general rule, but due to the specific and narrow exception for `typeid` you quoted. – Deduplicator Feb 13 '15 at 20:43
  • @Deduplicator And that's exactly my argument. Why is it valid to perform indirection through a null pointer only if that expression is an operand of typeid? Note that both are evaluated, but I do mention that in my answer – Columbo Feb 13 '15 at 20:46
  • 1
    Well, because the definition of typeid makes extra-provisions for exactly that case. Which means it is (unfortunately?) not generalizable. Anyway, I like it much better now. – Deduplicator Feb 13 '15 at 20:49
  • @Deduplicator I think you missed the point I was trying to make. I am not using this to make conclusions about the general case. I am trying to point out inconsistencies just as Miller did. Btw., I'll mention this another time now: The operand is evaluated. If the evaluation does not invoke UB there, it cannot on its own, do you agree? – Columbo Feb 13 '15 at 20:50
  • 1
    Yes, having an exception for `typeid` is certainly inconsistent. It just wasn't quite clear to me how you were arguing it there. Well, good job. – Deduplicator Feb 13 '15 at 20:59
  • "Even calling a member function with *d as the object argument doesn't require" - mean to complete that sentence? Also, in the case of non-static member functions, [class.mfct.non-static]/p2 is relevant. – T.C. Feb 19 '15 at 10:40
  • @T.C. [class.mfct.non-static]/p2 solely addresses object arguments that refer to some object. No object is associated with an 'empty lvalue'. Or do you think that passage still carries some weight in this discussion? – Columbo Feb 19 '15 at 13:49
  • @Columbo Well, it makes no sense to me that calling `f()` on a non-object isn't UB, but is UB on some object of the wrong type. – T.C. Feb 19 '15 at 15:08
  • @T.C. It doesn't make sense to me that the object arguments value is of any importance if `this` is not odr-used. As long as the object argument itself is a valid expression, and not used inside the member function, I see no problem whatsoever. – Columbo Feb 19 '15 at 16:36
  • @Columbo I think it's reasonable to permit implementations to diagnose calling a member function on a non-object (or object of the wrong type), even if `this` isn't odr-used, since that usually indicate some sort of programmer error. – T.C. Feb 19 '15 at 16:40
  • @T.C. Well, an Implementation is allowed to diagnose anything with a warning. Or are you talking about constant expressions, where such calls should be disallowed by making them UB? – Columbo Feb 19 '15 at 16:47
  • 1
    @Columbo It's technically impossible to diagnose such calls at compile time; making them UB allows them to be caught at run time using sanitizers. Not being able to use them in constant expressions is a nice bonus. – T.C. Feb 19 '15 at 16:50
  • @T.C. Runtime sanitizers? Never heard of that. Good point though. – Columbo Feb 19 '15 at 16:59
  • 3
    @Columbo Clang and GCC have a whole set of them. https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html, search for `-fsanitize`. – T.C. Feb 19 '15 at 17:02
  • @T.C. Whoops, totally forgot to answer, I apologize. Sanitizers are not standardized and thus allowed to diagnose anything, even well-defined statements (as long as they are inherently unreasonable - i.e. "insane"). Also, I agree that such an interpretation of [class.mfct.non-static]/p2 is inconsistent/nonsensical. However, I believe that paragraph should be removed, since aliasing is covered by §3.10 anyway, and there is no reason to restrict possible values of `this`. – Columbo May 07 '15 at 10:32
  • 2
    @Columbo A sanitizer that turns well-defined code into errors is going to be pretty annoying IMO. Regardless, I see no reason why the standard shouldn't restrict it, if permitting it only allows "insane" code. – T.C. May 07 '15 at 16:30
  • @T.C.: If everyone who thinks a construct has, or should have, defined behavior agrees what that behavior should be, that would suggest to me that the behavior should be defined unless doing so would impose non-trivial expense or create an actual problem. The fact that some people may not see a use for it, or think the construct is meaningless, insane, distasteful, or whatever, is hardly a good reason to deny the construct to those who would find it useful. – supercat Jul 18 '18 at 18:23
4

From the C++ Draft Standard N3337:

9.4 Static members

2 A static member s of class X may be referred to using the qualified-id expression X::s; it is not necessary to use the class member access syntax (5.2.5) to refer to a static member. A static member may be referred to using the class member access syntax, in which case the object expression is evaluated.

And in the section about object expression...

5.2.5 Class member access

4 If E2 is declared to have type “reference to T,” then E1.E2 is an lvalue; the type of E1.E2 is T. Otherwise, one of the following rules applies.

— If E2 is a static data member and the type of E2 is T, then E1.E2 is an lvalue; the expression designates the named member of the class. The type of E1.E2 is T.

Based on the last quoted paragraph of the standard, the expressions:

  d->fun();
  std::cout << d->a;

work because they both designate the named member of the class regardless of the value of d.

R Sahu
  • Where in those quotes does it allow `d` not to point to any object? – Deduplicator Feb 12 '15 at 17:13
  • 1
    "If E2 is a static data member ... the expression designates the named member." It says it right there, and it makes sense because the pointer is completely irrelevant. – Kenny Ostrom Feb 12 '15 at 17:19
  • 1
    @KennyOstrom: I don't see any mandate to ignore UB invoked by the expression `E1` in that quote, sorry. – Deduplicator Feb 12 '15 at 17:20
  • @Deduplicator Where in the standard does it actually say dereferencing a null pointer, by itself, *is* undefined behavior? – T.C. Feb 12 '15 at 17:23
  • @Deduplicator, My understanding: `d->fun()` is converted to `(*d).fun()`. However, that is only at compile time. The expression `(*d)` is never evaluated at run time since `d->fun()` is already resolved to `demo::fun()` at compile time. – R Sahu Feb 12 '15 at 17:24
  • 1
    @RSahu: If that was the case, they would have designated it as an "unevaluated context", which they explicitly did not. – Deduplicator Feb 12 '15 at 17:25
  • @Deduplicator, could that be a defect of omission? – R Sahu Feb 12 '15 at 17:26
  • 1
    @RSahu: Making it unevaluated would open a humongous can of worms, because making a function static or not would drastically and unexpectedly change code, especially in templates. And allowing "empty lvalues" is something they didn't do because that would also have far-reaching unfortunate consequences. – Deduplicator Feb 12 '15 at 17:29
  • 4
    @RSahu The LHS of `.` needs to be evaluated even if it's static, or `g().f()` might not evaluate `g()`. – T.C. Feb 12 '15 at 17:42
  • 2
    Even though the expression designates the named member of the class, that doesn't "bypass" the evaluation of `E1` . A more stark example, `g()->a` where `g` divides by zero for example, and then returns a null pointer – M.M Nov 17 '19 at 22:45
4

runs fine and produces expected output instead of any runtime error.

That's a basic assumption error. What you are doing is undefined behavior, which means that your claim for any kind of "expected output" is faulty.

Addendum: Note that, while there is a CWG defect report (#315) that is closed as "in agreement" on not making the above UB, it relies on the positive closing of another CWG defect (#232) that is still active, and hence none of it has been added to the standard.

Let me quote a part of a comment from James McNellis to an answer to a similar Stack Overflow question:

I don't think CWG defect 315 is as "closed" as its presence on the "closed issues" page implies. The rationale says that it should be allowed because "*p is not an error when p is null unless the lvalue is converted to an rvalue." However, that relies on the concept of an "empty lvalue," which is part of the proposed resolution to CWG defect 232, but which has not been adopted.

Community
Johann Gerell
  • 3
    @Columbo: Now if you would show that those resolutions ever made it into the standard, you would have a point. – Deduplicator Feb 12 '15 at 17:15
  • @Columbo: I added a blurb addendum from James McNellis that clarifies why your answer doesn't dispute that it's UB. – Johann Gerell Feb 12 '15 at 19:11
  • Now this is the only correct answer. Shame I cannot upvote it again. – Deduplicator Feb 13 '15 at 00:30
  • Is there any reason why a *quality* implementation should ever care about the instance in this scenario? I suppose it might be fair to warn that code might get broken by compiler designers who take pride in finding "clever" ways to avoid making their compilers do anything not mandated by the Standard. On the other hand, the Standard only defines a "conforming" implementation, rather than an "implementation whose usefulness isn't undermined by obtuseness"; the fact that some particular behavior would be allowable by the former doesn't mean the latter could behave likewise. – supercat Jul 14 '18 at 16:53
1

The expressions d->fun() and d->a both cause evaluation of *d ([expr.ref]/2).

The complete definition of the unary * operator from [expr.unary.op]/1 is:

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.

For the expression d there is no "object or function to which the expression points" . Therefore this paragraph does not define the behaviour of *d.

Hence the code is undefined by omission, since the behaviour of evaluating *d is not defined anywhere in the Standard.

M.M
  • 1
    @HolyBlackCat That is correct but my answer depends on the text "the object or function to which the expression points" – M.M Nov 17 '19 at 22:45
0

What you are seeing here is what I would consider an ill-conceived and unfortunate design choice in the specification of the C++ language and many other languages that belong to the same general family of programming languages.

These languages allow you to refer to static members of a class using a reference to an instance of the class. The actual value of the instance reference is of course ignored, since no instance is required to access static members.

So, in d->fun(); the compiler uses the d pointer only during compilation to figure out that you are referring to a member of the demo class, and then it ignores it. No code is emitted by the compiler to dereference the pointer, so the fact that it is going to be NULL during runtime does not matter.

So, what you see happening is in perfect accordance to the specification of the language, and in my opinion the specification suffers in this respect, because it allows an illogical thing to happen: to use an instance reference to refer to a static member.

P.S. Most compilers in most languages are actually capable of issuing warnings for that kind of stuff. I do not know about your compiler, but you might want to check, because the fact that you received no warning for doing what you did might mean that you do not have enough warnings enabled.

Mike Nakis
  • Your proposed change would possibly break existing code. `template f(T &t) { t.g(); } struct StillWorking { void g() {} }; struct NowBroken { static void g(); } }; f(StillWorking()); f(NowBroken());` – Christian Hackl Feb 12 '15 at 17:02
  • 4
    "It should be impossible to use an instance reference to refer to a static member." That ship has sailed a long, long time ago. – T.C. Feb 12 '15 at 17:04
  • P.S.: I meant to write an example with `const&` (mine won't compile anyway), but the point stands. – Christian Hackl Feb 12 '15 at 17:05
  • @T.C. yes, it has sailed, but am I not entitled to have the opinion that it should never have been this way? – Mike Nakis Feb 12 '15 at 17:49
  • @ChristianHackl would you be satisfied if I had written "It should have been impossible" instead of "It should be impossible" ? – Mike Nakis Feb 12 '15 at 17:50
  • @MikeNakis: I'd still not be sure if such a restriction would be justified in the face of generic code like the one I posted (the kind of which is not uncommon with standard algorithm functors, e.g. `std::copy_if`) . P.S: FWIW, I did not downvote your answer. – Christian Hackl Feb 12 '15 at 17:58
  • A static member is still a member, so `this->foo` is OK whether it's a static member or non-static member. This is an intentional design choice, not a bug. – Jonathan Wakely Feb 13 '15 at 18:29
  • Oh, I think I get it now, I think I know what will please all you guys. How about, if I replace "a bug in the specification" with "an ill-conceived and unfortunate design choice in the specification". Yeah, that should get you covered. – Mike Nakis Feb 13 '15 at 18:42
  • Clear and concise answer to OP's question. +1 from me. – ManuelH Oct 29 '15 at 17:06