16

Thanks to some segmentation faults and warnings in valgrind, I found that this code is incorrect and has some sort of dangling reference in the for-range loop.

#include<numeric>
#include<vector>

auto f(){
    std::vector<std::vector<double>> v(10, std::vector<double>(3));
    iota(v[5].begin(), v[5].end(), 0);
    return v;
}

int main(){
    for(auto e : f()[5])
        std::cout << e << std::endl;
    return 0;
}

It looks as if the begin and end is taken from a temporary and lost in the loop.

Of course, a way around is to do

    auto r = f()[5];
    for(auto e : r)
        std::cout << e << std::endl;

However, I wonder exactly why for(auto e : f()[5]) is an error and also if there is a better way around or some way to design f or the even the container (std::vector) to avoid this pitfall.

With iterator loops is more obvious why this problem happens (begin and end come from different temporary objects)

for(auto it = f()[5].begin(); it != f()[5].end(); ++it)

But in a for-range loop, as in the first example, it seems very easy to make this mistake.

songyuanyao
  • 169,198
  • 16
  • 310
  • 405
alfC
  • 14,261
  • 4
  • 67
  • 118
  • What is the purpose of this code: what are you trying to achieve with it? Because if it's only purpose is jagged array initialization, there are better methods. – JHBonarius Jul 20 '18 at 07:07
  • 2
    Yes, and there's a proposal for C++20 to partly address this issue: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0614r1.html – bipll Jul 20 '18 at 07:12
  • @JHBonarius, what is "jagged array initialization"? – alfC Dec 07 '18 at 19:25
  • Initialization of an array of arrays... – JHBonarius Dec 08 '18 at 20:54

2 Answers2

14

I wonder exactly why for(auto e : f()[5]) is an error

I'll just answer this part. The reason is that range-based for statements are just syntactic sugar for, approximately:

{
    auto&& __range = f()[5]; // (*)
    auto __begin = __range.begin(); // not exactly, but close enough
    auto __end = __range.end();     // in C++17, these types can be different
    for (; __begin != __end; ++__begin) {
        auto e = *__begin;
        // rest of body
    }
}

Take a look at that first line. What happens? operator[] on a vector returns a reference into that object, so __range is bound to that internal reference. But then the temporary goes out of scope at the end of the line, destroying all of its internals, and __range is immediately a dangling reference. There is no lifetime extension here, we are never binding a reference to a temporary.

In the more normal case, for(auto e : f()), we'd bind __range to f() directly, which is binding a reference to a temporary, so that temporary would get its lifetime extended to the lifetime of the reference, which would be the full for statement.

To add more wrinkles, there are other cases where indirect binding like this would still do lifetime extension. Like, say:

struct X {
    std::vector<int> v;
};
X foo();

for (auto e : foo().v) {
    // ok!
}

But rather than try to keep track of all of these little cases, it's much better to, as songyuanyao suggests, use the new for statement with initializer... all the time:

for (auto&& range = f(); auto e : range[5]) {
    // rest of body
}

Although in a way this gives a false sense of security, since if you did it twice, you'd still have the same issue...

for (auto&& range = f().g(); auto e : range[5]) {
    // still dangling reference
}
Barry
  • 286,269
  • 29
  • 621
  • 977
  • Would the syntactic sugar be "better" (for this problem) if it had `auto const& __range = f()[5]; // (*)`? – alfC Jul 20 '18 at 11:05
  • @alfC Nope. Same outcome, just you get a const lvalue reference instead of a non-const rvalue reference. You definitely want `auto&&` there, otherwise you'd only get `const` access. – Barry Jul 20 '18 at 11:37
  • ok, I was thinking that `... const&` would extend the lifetime. As for const access, that depends on the semantics of `.begin()` in principle. In practice you are right. – alfC Jul 20 '18 at 11:47
  • I question "it's much better to [use the new initializer feature]". The very clutter-free loop (which is, I think, the entire point of range-based-for loops) has suddenly become rather cluttered, and we've lost a lot of what range-based-for previously gained, in my opinion. As for "range-based-for is just syntactic sugar for... "; understood, and I am hoping it will be changed in the future to being syntactic sugar for . – Don Hatch Apr 16 '20 at 15:10
9

Note that using a temporary as the range expression directly is fine, its lefetime will be extended. But for f()[5], what f() returns is the temporary and it's constructed within the expression, and it'll be destroyed after the whole expression where it's constructed.

From C++20, you can use init-statement for range-based for loop to solve such problems.

(emphasis mine)

If range_expression returns a temporary, its lifetime is extended until the end of the loop, as indicated by binding to the rvalue reference __range, but beware that the lifetime of any temporary within range_expression is not extended.

This problem may be worked around using init-statement:

for (auto& x : foo().items()) { /* .. */ } // undefined behavior if foo() returns by value
for (T thing = foo(); auto& x : thing.items()) { /* ... */ } // OK

e.g.

for(auto thing = f(); auto e : thing[5])
    std::cout << e << std::endl;
songyuanyao
  • 169,198
  • 16
  • 310
  • 405
  • Cool. Can initial statements be used in normal for loops as well? – alfC Jul 20 '18 at 07:08
  • well yes, but not to predefine auxiliary variable of arbitrary type. – alfC Jul 20 '18 at 07:15
  • @alfC I got what you mean; it seems impossible for normal for loop. – songyuanyao Jul 20 '18 at 07:35
  • For example, it could accept (with modifications to the language), something similar to lambda captures, e.g. `for[thing = foo()](auto it = thing.begin();}; it != thing.end(); ++it)`. – alfC Jul 20 '18 at 07:40
  • @alfC Yes; and you might want to make a proposal for it. :) – songyuanyao Jul 20 '18 at 07:48
  • @alfC for(*init*; *check*; *inc*) allows *init* to be a single [simple declaration ](https://en.cppreference.com/w/cpp/language/declarations), of whatever type you like. It need not relate to *check* or *inc* – Caleth Jul 20 '18 at 08:33
  • @Caleth, yes, but you cannot initialize two variables with different types in `init`. For example the counter/iterator (`thing.begin()`) and some auxiliary variable (`thing = f()`). – alfC Jul 20 '18 at 10:09
  • Why is `for (auto& x : foo().items()) { /* .. */ }` undefined behavior. If `items()` is returning something by value, say it is a function returning a `std::vector items(){....}`, then wouldn't that vector have its lifetime extended per the lifetime extension rule? Even though `foo()` may be temporary it seems that in this case it is fine as long as `items()` is returned by value. Unless I am missing something. – cogle Jun 19 '21 at 01:40
  • @cogle You're right. The point here is that `itmes()` returns by-reference; the returned reference from the temporary returned by `foo()` becomes dangled after the temporary gets destroyed. Just like in OP's question, `operator[]` returns by-rereference too. If `items()` returns by-value, as you said it'll be fine. – songyuanyao Jun 19 '21 at 09:48