89

Let's imagine we have a struct for holding 3 doubles with some member functions:

struct Vector {
  double x, y, z;
  // ...
  Vector &negate() {
    x = -x; y = -y; z = -z;
    return *this;
  }
  Vector &normalize() {
     double s = 1./sqrt(x*x+y*y+z*z);
     x *= s; y *= s; z *= s;
     return *this;
  }
  // ...
};

This is a little contrived for simplicity, but I'm sure you agree that similar code is out there. The methods allow you to conveniently chain, for example:

Vector v = ...;
v.normalize().negate();

Or even:

Vector v = Vector{1., 2., 3.}.normalize().negate();

Now if we provided begin() and end() functions, we could use our Vector in a new-style for loop, say to loop over the 3 coordinates x, y, and z (you can no doubt construct more "useful" examples by replacing Vector with e.g. String):

Vector v = ...;
for (double x : v) { ... }

We can even do:

Vector v = ...;
for (double x : v.normalize().negate()) { ... }

and also:

for (double x : Vector{1., 2., 3.}) { ... }

However, the following (it seems to me) is broken:

for (double x : Vector{1., 2., 3.}.normalize()) { ... }

While it seems like a logical combination of the previous two usages, I think this last usage creates a dangling reference while the previous two are completely fine.

  • Is this correct and Widely appreciated?
  • Which part of the above is the "bad" part, that should be avoided?
  • Would the language be improved by changing the definition of the range-based for loop such that temporaries constructed in the for-expression exist for the duration of the loop?
iammilind
  • 68,093
  • 33
  • 169
  • 336
ndkrempel
  • 1,906
  • 14
  • 18
  • For some reason I recall a very similar question being asked before, forgot what it was called though. – Pubby May 15 '12 at 03:31
  • I consider this a language defect. The life of temporaries is not extended to the entire body of the for-loop, but only for setup of the for loop. It's not just the range syntax which suffers, the classic syntax does as well. In my opinion the life of temporaries in the init statement should extend for the full life of the loop. – edA-qa mort-ora-y May 15 '12 at 03:54
  • 1
    @edA-qamort-ora-y: I tend to agree that there is a slight language defect lurking in here, but I think it's specifically the fact that lifetime extension happens implicitly whenever you directly bind a temporary to a reference, but not in any other situation - this seems like a half-baked solution to the underlying problem of temporary lifetimes, although that's not to say it's obvious what a better solution would be. Perhaps an explicit 'lifetime extension' syntax when constructing the temporary, which makes it last until the end of the current block - what do you think? – ndkrempel May 15 '12 at 04:17
  • @edA-qamort-ora-y: ... this amounts to the same thing as binding the temporary to a reference, but has the advantage of being more explicit for the reader that 'lifetime extension' is occurring, inline (in an expression, rather than requiring a separate declaration), and not requiring you to name the temporary. – ndkrempel May 15 '12 at 04:24
  • 1
    possible duplicate of [temporary object in range-based for](http://stackoverflow.com/questions/10153658/temporary-object-in-range-based-for) – ildjarn May 15 '12 at 15:41
  • @ildjarn: That is indeed a specific instance of the problem I'm describing. Amazing that it's already occurred in real code! – ndkrempel May 15 '12 at 16:38
  • Such concerns are worth raising. Somewhat similar feature in C# caused public outcry that lead to a breaking change, see http://stackoverflow.com/questions/5438307/detailed-explanation-of-variable-capture-in-closures and http://stackoverflow.com/questions/8898925/is-there-a-reason-for-cs-reuse-of-the-variable-in-a-foreach – Yulia V Sep 09 '14 at 19:01

3 Answers3

64

Is this correct and Widely appreciated?

Yes, your understanding of things is correct.

Which part of the above is the "bad" part, that should be avoided?

The bad part is taking an l-value reference to a temporary returned from a function, and binding it to an r-value reference. It is just as bad as this:

auto &&t = Vector{1., 2., 3.}.normalize();

The temporary Vector{1., 2., 3.}'s lifetime cannot be extended because the compiler has no idea that the return value from normalize references it.

Would the language be improved by changing the definition of the range-based for loop such that temporaries constructed in the for-expression exist for the duration of the loop?

That would be highly inconsistent with how C++ works.

Would it prevent certain gotchas made by people using chained expressions on temporaries or various lazy-evaluation methods for expressions? Yes. But it would also be require special-case compiler code, as well as be confusing as to why it doesn't work with other expression constructs.

A much more reasonable solution would be some way to inform the compiler that the return value of a function is always a reference to this, and therefore if the return value is bound to a temporary-extending construct, then it would extend the correct temporary. That's a language-level solution though.

Presently (if the compiler supports it), you can make it so that normalize cannot be called on a temporary:

struct Vector {
  double x, y, z;
  // ...
  Vector &normalize() & {
     double s = 1./sqrt(x*x+y*y+z*z);
     x *= s; y *= s; z *= s;
     return *this;
  }
  Vector &normalize() && = delete;
};

This will cause Vector{1., 2., 3.}.normalize() to give a compile error, while v.normalize() will work fine. Obviously you won't be able to do correct things like this:

Vector t = Vector{1., 2., 3.}.normalize();

But you also won't be able to do incorrect things.

Alternatively, as suggested in the comments, you can make the rvalue reference version return a value rather than a reference:

struct Vector {
  double x, y, z;
  // ...
  Vector &normalize() & {
     double s = 1./sqrt(x*x+y*y+z*z);
     x *= s; y *= s; z *= s;
     return *this;
  }
  Vector normalize() && {
     Vector ret = *this;
     ret.normalize();
     return ret;
  }
};

If Vector was a type with actual resources to move, you could use Vector ret = std::move(*this); instead. Named return value optimization makes this reasonably optimal in terms of performance.

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
  • Thank you. I appreciated your two alternative suggestions for dealing with this issue. – ndkrempel May 15 '12 at 03:52
  • 1
    The thing that might make this more of a "gotcha" is that the new for loop is syntactically hiding the fact that reference binding is going on under the covers - i.e. it's a lot less blatant than your "just as bad" examples above. That's why it seemed plausible to suggest the extra lifetime extension rule, just for the new for loop. – ndkrempel May 15 '12 at 03:57
  • To further belabour the point: I don't think you could have the same problem with an old-style for loop without *explicitly* doing some binding of temporaries to references - which would hopefully set off alarm bells in your head. – ndkrempel May 15 '12 at 04:00
  • 1
    @ndkrempel: Yes, but if you're going to propose a language feature to fix this (and therefore have to wait until 2017 at least), I would prefer if it were more comprehensive, something that could solve the temporary extension problem *everywhere*. – Nicol Bolas May 15 '12 at 04:03
  • Quite right! Do you know of any proposals or ideas for such a comprehensive thing? – ndkrempel May 15 '12 at 04:05
  • @ndkrempel: I haven't seen any. [The committee keeps a list of the various proposals that are submitted.](http://www.open-std.org/JTC1/SC22/WG21/docs/papers/) – Nicol Bolas May 15 '12 at 04:17
  • 3
    +1. On the last approach, rather than `delete` you could provide an alternative operation that returns an rvalue: `Vector normalize() && { normalize(); return std::move(*this); }` (I believe that the call to `normalize` inside the function will dispatch to the lvalue overload, but someone should check it :) – David Rodríguez - dribeas May 15 '12 at 13:39
  • 3
    I have never seen this `&`/`&&` qualification of methods. Is this from C++11 or is this some (maybe widespread) proprietary compiler extension. Gives interresting possibilities. – Christian Rau May 15 '12 at 14:08
  • 1
    @ChristianRau: It's new to C++11, and analogous to C++03 "const" and "volatile" qualifications of non-static member functions, in that it is qualifying "this" in some sense. g++ 4.7.0 does not support it however. – ndkrempel May 15 '12 at 16:35
  • 1
    Overloading it so that in the rvalue case, "normalize" returns a prvalue seems to be an acceptable solution. Pity that you cannot access the value category of the object expression in a member function call to determine the return type automatically without overloads. – Johannes Schaub - litb May 17 '12 at 12:34
  • What about initializer lists? Are an exceptional case? a for like `for( int i : {1,2,3,4} )` is valid. Is not that initializer_list an rvalue? – Manu343726 Jul 15 '14 at 14:54
  • 1
    Rather than deleting the rvalue overload I'd make it returning the vector by value. It's three members after all. And for bigger objects you should be able to move out the resources so its cheap anyway. – gigabytes May 26 '15 at 17:29
25

for (double x : Vector{1., 2., 3.}.normalize()) { ... }

That is not a limitation of the language, but a problem with your code. The expression Vector{1., 2., 3.} creates a temporary, but the normalize function returns an lvalue-reference. Because the expression is an lvalue, the compiler assumes that the object will be alive, but because it is a reference to a temporary, the object dies after the full expression is evaluated, so you are left with a dangling reference.

Now, if you change your design to return a new object by value rather than a reference to the current object, then there would be no issue and the code would work as expected.

David Rodríguez - dribeas
  • 204,818
  • 23
  • 294
  • 489
  • 1
    Would a `const` reference extend the lifetime of the object in this case? – David Stone May 15 '12 at 03:32
  • 5
    Which would break the clearly desired semantics of `normalize()` as a mutating function on an existing object. Thus the question. That a temporary has an "extended lifespan" when used for the specific purpose of an iteration, and not otherwise, is I think a confusing misfeature. – Andy Ross May 15 '12 at 03:32
  • Your suggestion in the last paragraph is an API change that impacts other code (e.g. v.normalize(); as a standalone statement). So are you saying the interface design that allows mutating methods to be chained is itself a flawed design? – ndkrempel May 15 '12 at 03:33
  • @AndyRoss: Is it not that the "extended lifespan" applies whenever a temporary is directly bound to a reference, rather than iteration specifically? – ndkrempel May 15 '12 at 03:34
  • In this last case, the usage of Vector is flawed than allowing the mutating methods to be chained. – Jagannath May 15 '12 at 03:39
  • 2
    @AndyRoss: Why? *Any* temporary bound to an r-value reference (or `const&`) has its lifetime extended. – Nicol Bolas May 15 '12 at 03:42
  • @BenVoigt: There is no problem in using the range-based for with a temporary, the problem is with using an *lvalue* that is actually a reference to a temporary. The difference is that the compiler can do its *magic* to extend the lifetime of the temporary if it *knows* that it is an *rvalue*, but it cannot if it sees an *lvalue* (how can the compiler know whether the reference is to that temporary or to another completely different object). As a matter of fact, boost::foreach library already dealt with this without compiler support! – David Rodríguez - dribeas May 15 '12 at 03:45
  • 2
    @ndkrempel: Still, not a limitation of the range-based for loop, the same issue would come if you bind to a reference: `Vector & r = Vector{1.,2.,3.}.normalize();`. Your design has that limitation, and that means that either you are willing to return by value (which might make sense in many circumstances, and more so with *rvalue-references* and *move*), or else you need to handle the problem at the place of call: create a proper variable, then use it in the for loop. Also note that the expression `Vector v = Vector{1., 2., 3.}.normalize().negate();` creates *two* objects... – David Rodríguez - dribeas May 15 '12 at 03:49
  • @DavidRodríguez-dribeas: Indeed this is an accurate description of the language semantics. My concern is more for end-users who aren't necessarily well-versed in the intricacies of lifetime extensions of temporaries etc. Could it not be considered a flaw of the language if it allows users so easily to fall into the above dangling-reference trap, when it seems a natural combination of two things that are separately valid? The insidious part is that the for-loop is hiding the reference binding that is going on... – ndkrempel May 15 '12 at 03:51
  • ... if you are willing to have the extra copy, add it: `for( double x : Vector( Vector{1.,2.,3.}.normalize() ) )` --which would be equivalent to that example. In both cases, you need to use your design *knowing* what the operations mean and the limitations. – David Rodríguez - dribeas May 15 '12 at 03:51
  • @ndkrempel: Again, if you are concern, reconsider your design. The design you propose offers simple chaining (i.e. a more stream lined syntax for method chaining) and the cost is that it is easier to misuse the feature and cause undefined behavior. Consider the tradeoffs and either change the interface or be very explicit with the documentation... – David Rodríguez - dribeas May 15 '12 at 03:56
  • @ndkrempel: I have always considered the fact that the lifetime was extended for `const&` but not for `&` a very awkward "commitee-like" decision. It ressemble a compromise. I would have preferred to settle for no extension at all, and making it impossible to call a method on a temporary for consistency. This would perhaps entail a bit more work, but there would not be any corner case. – Matthieu M. May 15 '12 at 06:53
  • @MatthieuM.: I think the lifetime is extended for any type of reference, const or not. (I'm going by N3291, section 12.2/5). Ah but you won't be able to bind to non-const & I suppose. So perhaps you should distinguish between the binding rules for temporaries and the lifetime extension rules. – ndkrempel May 15 '12 at 12:15
  • @MatthieuM.: ...I guess I should say "binding rules for rvalues" vs "lifetime extension rules for temporaries" – ndkrempel May 15 '12 at 12:28
  • @MatthieuM.: Allowing non-const references to extend the lifetime of the temporary would not help here either. The problem is that the reference is not *bound* to the temporary, but rather to a reference returned from a function. The compiler cannot *guess* that the returned object will be the `*this`. If `normalize()` returned a `const&`, the same issue would appear. Also note that the language forbids binding a non-const lvalue reference to a temporary, so that type of binding could not possibly extend the lifetime. – David Rodríguez - dribeas May 15 '12 at 13:30
  • @ndkrempel: 12.2/5 might be slightly confusing. It says *any* reference, but the rest of the language forbids you from binding a non-const lvalue reference to a temporary, so the combination actually means bound by const reference (C++03) or possibly *rvalue-reference* (C++11). If you read the whole section you will also realize that a reference that is the return of a function does not extend the lifetime, and that is the particular case we are talking about. – David Rodríguez - dribeas May 15 '12 at 13:32
  • @DavidRodríguez-dribeas: actually, I am all for not extending lifetime *at all* and not being able to use methods on temporaries. This would clean up the mess of arbitrary decisions regarding when it may bind, or not, when the lifetime is extended (or not) and the distinction between methods and functions (which is rather strange when good coding style advocate using free-functions !!). Lifetime automatic extension introduce many bugs that are not so easy to detect. It would make the code slightly less "pretty", but much sounder. – Matthieu M. May 15 '12 at 13:40
  • @MatthieuM. I am not so sure that the approach you mention would not be much worse. Consider a type for which you overload operators: `a+b+c` if `operator+` is a member method, then with your approach `operator+` could not be called on the result of `a+b` (if it was a member function). Once that is allowed the issue in this question is back (regardless of whether you remove the lifetime extension). – David Rodríguez - dribeas May 15 '12 at 15:10
  • @DavidRodríguez-dribeas: `operator+` should then be a free-function that takes its elements by value. Problem solved ;) *Sarcastic, but really binary operators are better coded as free-functions anyway* – Matthieu M. May 15 '12 at 15:15
  • @MatthieuM.: :) The issue is then the cost of `operator+` on potentially large objects (where the two arguments would be copied and a third object created for the return). I agree that this can be a source of confusion and errors, but the alternatives may be worse. The intermediate alternative (current idiomatic solution) of passing the arguments by const reference requires the possibility of binding a const-reference to a temporary. I don't think that this requires lifetime extension, so compilers could warn then the reference is bound in an expression that would yield to a dangling reference – David Rodríguez - dribeas May 15 '12 at 15:35
  • @DavidRodríguez-dribeas: Actually, the language does not forbid you from binding a non-const lvalue reference to a *temporary*, it forbids you from binding a non-const lvalue reference to an *rvalue*. The very point that you're making to MatthieuM is that the language doesn't know what is a temporary and what isn't in general. – ndkrempel May 15 '12 at 15:41
  • @DavidRodríguez-dribeas: Unless there are two distinct usages of the word 'temporary' here - one meaning 'a temporary that the compiler knows is a temporary' as opposed to a temporary object in general. But I thought the latter usage was the correct one? – ndkrempel May 15 '12 at 15:45
  • Could be time to move this to chat – Kev May 15 '12 at 15:49
  • 1
    @DavidRodríguez-dribeas: the problem with binding to const-reference is this: `T const& f(T const&);` is completely fine. `T const& t = f(T());` is completely fine. And then, in another TU you discover that `T const& f(T const& t) { return t; }` and you cry... If `operator+` operates on values, it is **safer**; then the compiler may optimize the copy out (Want Speed ? Pass by Values), but that's a bonus. The only binding of temporaries I would allow is binding to r-values references, but functions should then return values for safety and rely on Copy Elision / Move Semantics. – Matthieu M. May 15 '12 at 15:50
  • @MatthieuM.: *Want speed? pass by value* makes sense when you *need* to copy, but in many cases you do not need to copy, and then passing by value makes unnecessary copies. That is, the idiomatic `opX` would be `T opX( T l, T const & r ) { l X= r; return l; }`, but note that the *efficient* version does not take all elements by value. Also, the compiler might not be able to optimize in most cases, as that would change the calling convention and would break the ABI within a single program! – David Rodríguez - dribeas May 15 '12 at 16:07
  • @ndkrempel: I was not precise enough, and yet the statement holds: you cannot bind a non-const reference to an rvalue, and while you can bind it to an lvalue that might refer to a temporary, those cases are exempt of extending the lifetime of the temporary. The only case where the lifetime can be extended is when binding a `const&`. I have gone through this part of the standard in the past, and I recommend that you do if you still think that you can *legally* (i.e. no VS extensions) extend the lifetime with anything other than a const reference. – David Rodríguez - dribeas May 15 '12 at 16:10
  • @DavidRodríguez-dribeas: I never claimed you could legally extend the lifetime other than with a const reference. (In fact, you can - with a non-const rvalue reference.) I was just clarifying the usage of "temporary" vs "rvalue". – ndkrempel May 15 '12 at 16:29
  • @ndkrempel: I don't want to argue but this is from one of your comments: "*I think the lifetime is extended for any type of reference, const or not.*" which is the reason for the whole *you cannot extend it with a non-const reference* argument. Yes, I left rvalue-references aside, but still, non-const lvalue references are not an option (and the same paragraph that you mention was present in C++03) – David Rodríguez - dribeas May 15 '12 at 19:16
  • @DavidRodríguez-dribeas: You omitted the following sentence from your quote "Ah but you won't be able to bind to non-const & I suppose". (At the time, MatthieuM was conflating binding with lifetime extension, which didn't make the conversation very clear.) Besides, how do you think I managed to deduce "I think this last usage creates a dangling reference" in the original question, if I didn't understand the mechanics of lifetime extension? – ndkrempel May 15 '12 at 23:15
  • @DavidRodríguez-dribeas: The key point about lifetime extension that noone seems to have stated explicitly, is that it's not to do with being a "temporary" or being an "rvalue", it specifically occurs *when you bind a temporary to a reference at the point of construction of the temporary* - since that's the only time the compiler understands the ownership of the temporary. This was always clear to me, and apparently to you too, but perhaps not everyone else. – ndkrempel May 16 '12 at 00:00
  • @DavidRodríguez-dribeas: And the key issue here is that **point of construction** part. The const/non-const issue is a bit of a red herring - the language could perfectly well allow you to bind rvalues to non-const&, and the issue described in my question would still occur. – ndkrempel May 16 '12 at 00:12
  • @ndkrempel: Right, because the issue is, as I pointed out, with the design. Your design allows for easy method chaining at the cost of allowing that pitfall. If you compile with VS it will allow you to bind with a non-const reference, but as I already said, that is not the issue, but the fact that the compiler will not extend the lifetime if the reference is bound in the return statement of a function §12.2p5/3, but I am repeating myself here. Also note that in your code, the reference that is bound to the *temporary* is the return statement, not the calling site that bounds that reference. – David Rodríguez - dribeas May 16 '12 at 00:56
4

IMHO, the second example is already flawed. That the modifying operators return *this is convenient in the way you mentioned: it allows chaining of modifiers. It can be used for simply handing on the result of the modification, but doing this is error-prone because it can easily be overlooked. If I see something like

Vector v{1., 2., 3.};
auto foo = somefunction1(v, 17);
auto bar = somefunction2(true, v, 2, foo);
auto baz = somefunction3(bar.quun(v), 93.2, v.qwarv(foo));

I wouldn't automatically suspect that the functions modify v as a side-effect. Of course, they could, but it would be confusing. So if I was to write something like this, I would make sure that v stays constant. For your example, I would add free functions

auto normalized(Vector v) -> Vector {return v.normalize();}
auto negated(Vector v) -> Vector {return v.negate();}

and then write the loops

for( double x : negated(normalized(v)) ) { ... }

and

for( double x : normalized(Vector{1., 2., 3}) ) { ... }

That is IMO better readable, and it's safer. Of course, it requires an extra copy, however for heap-allocated data this could likely be done in a cheap C++11 move operation.

leftaroundabout
  • 117,950
  • 5
  • 174
  • 319
  • Thanks. As usual, there are many choices. One situation where your suggestion may not be viable is if Vector is an array (not heap allocated) of 1000 doubles, for example. A trade-off of efficiency, ease-of-use, and safety-of-use. – ndkrempel May 15 '12 at 15:34
  • 2
    Yes, but it's seldom useful to have structures with size > ≈100 on the stack, anyway. – leftaroundabout May 15 '12 at 15:46