33

In the following code a member function set() is called on a model, which is a null pointer. This would be undefined behavior. However, the parameter of the member function is a result of another function call that checks whether the model is a null pointer and throws in that case. Is it guaranteed that estimate() will always be called before the model is accessed or is it still Undefined Behaviour (UB)?

#include <iostream>
#include <memory>
#include <vector>


struct Model
{
    void set(int x)
    {
        v.resize(x);
    }

    std::vector<double> v;
};


int estimate(std::shared_ptr<Model> m)
{
    return m ? 3 : throw std::runtime_error("Model is not set");
}

int main()
{
    try
    {
        std::shared_ptr<Model> model; // null pointer here
        model->set(estimate(model));
    }
    catch (const std::runtime_error& e)
    {
        std::cout << e.what();
    }

    return 0;
}
Jason
  • 36,170
  • 5
  • 26
  • 60
mentalmushroom
  • 2,261
  • 1
  • 26
  • 34
  • 6
    Well the `estimate` call is guaranteed to be called first. Which means the exception will be thrown before the actual null-pointer dereference happens. Is the code still UB even when the code causing the UB does not execute? That's an interesting question... :) – Some programmer dude May 09 '23 at 08:33
  • 1
    Is your question purely out of curiosity or is it based on real code? I wouldn't organize my code relying on the evaluation order... – Mickaël C. Guimarães May 09 '23 at 08:34
  • If `model` is an argument to `estimate`, how do you want the function to be called *before the argument is accessed* ? Or what do you mean with "accessed" ? – Yves Daoust May 09 '23 at 08:36
  • 6
    I think it's perfectly reasonable to write code based on the fact that function arguments will be evaluated before the function itself. – john May 09 '23 at 08:36
  • @MickaëlC.Guimarães It is based on a real code. I didn't want to split the call into two lines and it made me curious whether this is safe. – mentalmushroom May 09 '23 at 08:36
  • @mentalmushroom It's safe. Languages which delay evaluation until the last possible moment (so called lazy evaluation) do exist, but C++ is not one of them. – john May 09 '23 at 08:38
  • @YvesDaoust accessing `model` is ok, it a valid `shared_ptr`, only dereferencing it is not ok because it does not point to a valid pointee – 463035818_is_not_an_ai May 09 '23 at 08:40
  • 1
    @463035818_is_not_a_number: hence my comment. – Yves Daoust May 09 '23 at 08:41
  • @YvesDaoust I agree that the question could be phrased better. The text refereing to `model` as if it would be a raw pointer is also a tiny bit confusing – 463035818_is_not_an_ai May 09 '23 at 08:42
  • 1
    @463035818_is_not_a_number Would it make a difference if there were a raw pointer? – mentalmushroom May 09 '23 at 08:57
  • 5
    FWIW, address sanitizer says `->` is evaluated anyway https://godbolt.org/z/n13P18M3G. Also, using raw pointers (which should basically be the same issue here) gives the same diagnosis https://godbolt.org/z/TPT7Y89cG – cigien May 09 '23 at 09:25
  • @mentalmushroom I suppose not, but thats part of the answer. If your code would be problematic, then it would make a difference if `model` was a raw pointer. edit: only now I see the answer. It does make a (tiny) difference – 463035818_is_not_an_ai May 09 '23 at 09:29
  • @john: if `model->set()` were a virtual function call, there's some dereferencing work to get to the actual function pointer. Of course, overlapping that work with setting up the args would still be possible per the as-if rule in cases where arg setup didn't maybe throw, even if the standard did define that args were evaluated first. So the actual wording in the standard seems somewhat backwards in making this UB and unsafe on paper. Perhaps they were thinking of cases where the postfix-expression is non-trivial, like `fptr[++i]( 4*i )` to iterate through an array of function pointers. – Peter Cordes May 10 '23 at 03:17

3 Answers3

28

This is still undefined behavior (UB) as per expr.compound:

The postfix-expression is sequenced before each expression in the expression-list and any default argument. The initialization of a parameter, including every associated value computation and side effect, is indeterminately sequenced with respect to that of any other parameter.

(emphasis mine)

This means that the postfix expression model->set is sequenced before the expression estimate(model) in the expression-list. And since model is null pointer, the precondition of std::shared_ptr::operator-> is violated and hence this leads to UB.


Jason
  • 36,170
  • 5
  • 26
  • 60
  • 1
    So if I get this right since `model->` is equivalent to `(*model).` it means that `model` is still dereferenced and so this is still undefined behavior. – Val May 09 '23 at 09:27
  • 1
    It's not quite clear how UB follows from this statement. The postfix expression before the arrow is evaluated, which results in a null pointer. Together with the id-expression it determines the result of the entire expression. Ok, but does it dereference the null pointer to determine it? – mentalmushroom May 09 '23 at 10:25
  • 3
    @mentalmushroom if `set` is `virtual` the pointer has to actually be dereferenced to dispatch the call based on the dynamic type of the object. This case is not singled out in the standard (even though a non-`virtual` function *could* be dispatched based only on the static type of `model`), therefore the general case is that the pointer *has* to be dereferenceable. – Quentin May 09 '23 at 13:45
  • 6
    @mentalmushroom Your use of `std::shared_ptr` makes this pretty clear-cut, because the `std::shared_ptr::operator->` overload has a precondition per library specification that the pointer isn't null. The call to _that_ function therefore already has undefined behavior, regardless of the evaluation of `->set` on the result. Whether the same would be true with a raw null pointer is a different question. – user17732522 May 09 '23 at 14:46
  • @user17732522 I've added some second part in my answer that uses [expr.ref](https://timsong-cpp.github.io/cppwp/n4868/expr.compound#expr.ref-1). Basically to my understanding of [expr.ref]: *"the postfix expression `model` before the arrow is evaluated and the result of that evaluation(which is `nullptr`) together with the id-expression `operator->` is used to determine the result of the entire postfix expression."* This leads to UB as the precondition is violated. Is the conclusion and analysis using [expr.ref] correct? – Jason May 09 '23 at 15:05
  • @user17732522 That's true, but is it really called in this case? The rule says "the postfix expression BEFORE the arrow is evaluated". It does not clearly say whether the id-expression is evaluated. Or does the word "determines" imply that it is evaluated? – mentalmushroom May 09 '23 at 17:13
  • @mentalmushroom The _postfix-expression_ in the quote above when applied to your example would be the _postfix-expression_ of the _function call expression_ (see the context of the quote), so it is the whole of `model->set`. Per https://eel.is/c++draft/over.match.oper#2 the `model->` part is equivalent to `model.operator->()`, which already constitutes the call to `operator->()` as part of the evaluation of `model->set`. How the actual member function `set` relates is irrelevant. – user17732522 May 09 '23 at 17:29
  • @Jason What's written in [expr.compound] explicitly applies only to the built-in operators unless otherwise stated (https://eel.is/c++draft/expr#pre-3.sentence-1). I think you should rather quote [over.match.oper] instead of [expr.ref] if you want to describe behavior of overloaded `->`. – user17732522 May 09 '23 at 17:35
  • @mentalmushroom I meant `model.operator->()->` in the above and also I should reference https://eel.is/c++draft/over.match.oper#12 as well. – user17732522 May 09 '23 at 17:39
  • @user17732522 I agree that [expr.compound] is for built in operators unless other wise stated. But [expr.pre] also says that : *"Overloaded operators obey the rules for syntax and **evaluation order** specified in [expr.compound]"*. And since [expr.ref#2] that I quoted in second part of my answer, talks about evaluation and so should be applicable, isn't it? – Jason May 10 '23 at 04:10
  • @Jason That's a note. The normative part is in https://eel.is/c++draft/over.match.oper#2.sentence-4. It refers back to [expr.compound] only for the evaluation order of the operands. That `a` in `(a).operator->( )` is evaluated (by application of [expr.ref] to `.`) is in my opinion not really relevant. The UB is due to the call expression `postfix-expression()` where `postfix-expression` is `(a).operator->()`. – user17732522 May 10 '23 at 09:22
  • @user17732522 I think you missed a `->` at the end of your last comment. In particular, `postfix-expression()` should have been `postfix-expression->()` . Basically, since `model->set(estimate(model))` is the same as `model.operator->()->set(estimate(model))`, so according to [expr.compound] the postfix expression `model.operator->()->set` is sequenced before the expression in the expression list. This leads to UB due to precondition being violated. – Jason May 10 '23 at 11:45
  • @Jason I meant "_where postfix-expression is `(a).operator->`"_, but I don't think we are really disagreeing about anything. – user17732522 May 10 '23 at 14:23
10

To my understanding this is undefined behavior at least from C++17:

  1. In a function-call expression, the expression that names the function is sequenced before every argument expression and every default argument.

As I interpret this, it actually guarantees that model->set is evaluated before any argument and thus invokes undefined behavior. It does not matter whether or not model is a raw pointer.

nielsen
  • 5,641
  • 10
  • 27
  • 1
    But `model->set` names a member function and not a function so this doesn't seem to apply. The question title also explicitly suggests that OP is interested in the member function case and is already aware of non member function case. – Val May 09 '23 at 09:32
  • 1
    @Val So a member function is not a function? What do you base that statement on? – nielsen May 09 '23 at 09:36
  • 2
    Function call and member function call are two different things. – Val May 09 '23 at 09:39
  • 5
    @Val This is cppreference, not the Standard. This text is written for readability, not ultimate precision. The [actual rule](https://timsong-cpp.github.io/cppwp/n4868/expr.compound#expr.call-8) says that the *postfix-expression* is sequenced before the argument initializations. The rule cited from cppreference does apply here. – j6t May 09 '23 at 10:18
  • 4
    @Val Even in the Standard I believe that the word "function" covers member functions as well as non-member functions and explicit distinction is made when applicable. Generally speaking, a "member function call" is a special case of a "function call". – nielsen May 09 '23 at 11:01
  • "_model->set is evaluated_": That's obviously UB for the `std::shared_ptr` case as explained in the other answer, but it is non-obvious whether it is UB for the raw pointer case. Generally, dereferencing of a null pointer itself is supposed to be allowed and I don't see anything obvious in https://eel.is/c++draft/expr.ref that would make it UB. There are open CWG issues on the exact limitations. – user17732522 May 09 '23 at 14:57
  • @user17732522 "dereferencing of a null pointer" gives you an invalid lvalue. If you try to take the address, you should be safe getting back a null pointer. Just about anything else (in particular lvalue-to-rvalue conversion, but also member access, assignment, etc) is UB. – Ben Voigt May 09 '23 at 21:52
  • @BenVoigt That's supposed to be the idea, but https://eel.is/c++draft/expr.unary.op#1.sentence-3 still doesn't handle this special case and so doesn't define the result of indirection at all and the best thing that I could find in [expr.ref] about whether or not `model->set` would then have undefined behavior is https://eel.is/c++draft/expr.ref#8, which isn't actually covering the case if read literally. Other than that, assuming indirection itself was allowed, I am not what would make the member access itself UB. The call surely would be, but it is never evaluated in OP's example. – user17732522 May 09 '23 at 21:59
  • @user17732522: The horrible wording in https://eel.is/c++draft/expr.prim.id.general#3.1 may be part of the problem. "expression refers to the member's class" is meaningless garbage. If it said either "the static type of the expression" or "the dynamic type of the object to which the expression refers", it would be much clearer. – Ben Voigt May 09 '23 at 22:10
  • @user17732522 You are right the Standard is remarkably unclear about this. It deserves a (recent) question by itself, but for short: the result of `model->set` when `model` is a null pointer is afaik not defined in the Standard and thus falls under https://eel.is/c++draft/defns.undefined. Another way to see it, if a compiler implemented arbitrary behavior in this case, could it be said to be non-compliant because of that? – nielsen May 10 '23 at 07:49
  • @nielsen I agree with the conclusion. It's just that I think that part in particular was the non-obvious part that OP was interested in, even if they made the mistake of also introducing UB via precondition violation of `std::shared_ptr::operator->`. – user17732522 May 10 '23 at 09:10
7

[expr.call]/7:

The postfix-expression is sequenced before each expression in the expression-list and any default argument.

In this case, this means that model->set is evaluated before estimate(model).

Since model is a shared_ptr<Model>, model->set uses shared_ptr's overloaded operator->, which has the following precondition ([util.smartptr.shared.obs]/5):

Preconditions: get() != nullptr.

Violating this precondition results in undefined behavior ([structure.specifications]/3.3).

duck
  • 1,455
  • 2
  • 8