8

Consider this code, which is slightly modified from here:

#include <iostream>

void foo() {
    int i;
    static auto f = [&i]() { std::cout << &i << "\n";};
    f();
}

int main() {
    foo();
    foo();
}

The lambda f is initialized only on the first call. During the second call, the captured variable has ceased to exist: the lambda holds a dangling reference, but only prints its address. gcc reports no obvious issue and the output looks OK:

0x7ffc25301ddc
0x7ffc25301ddc
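The initialize-once behaviour can be made visible without relying on a dangling capture. The following is only a sketch: `init_count` and the by-reference parameter `x` are names assumed for illustration and are not part of the code above. The initializer of the static closure runs on the first call only, and the variable is passed in as an argument on each call instead of being captured.

```cpp
#include <cassert>

int init_count = 0;  // hypothetical counter: how often the static initializer ran

int foo() {
    int i = 0;
    // The initializer (including the init-capture) is evaluated only the
    // first time control passes through this declaration.
    static auto f = [id = ++init_count](const int& x) {
        (void)&x;   // x refers to the caller's live argument, not to a capture
        return id;  // always the value stored during the one-time initialization
    };
    return f(i);
}
```

Calling `foo()` twice returns 1 both times and leaves `init_count` at 1, which confirms that the closure object is created only once.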

Is it undefined behavior to take the address of a dangling reference, or is it ok?

For a very similar example gcc (-Wall -Werror -pedantic -O3) produces a warning:

#include <iostream>

auto bar() {
    int i;
    return [&i]() {std::cout << &i << "\n"; };
}

int main() {
    bar()();
}

warning:

<source>:5:14: error: address of stack memory associated with local variable 'i' returned [-Werror,-Wreturn-stack-address]
    return [&i]() {std::cout << &i << "\n"; };
             ^
<source>:5:14: note: captured by reference here
    return [&i]() {std::cout << &i << "\n"; };

Of course, the fact that gcc compiles the first example and produces the expected(?) output while warning about the second does not mean a thing. Where in the standard can I find whether using the address of a dangling reference is fine or not?

PS: I suppose the answer is somewhere in [basic.life]. I have browsed it several times, but I have a hard time seeing what applies and what it is trying to tell me.

Evg
463035818_is_not_an_ai
  • FWIW, it seems to me that it's an oversight that capturing a local in a static lambda, like you do here, doesn't produce at least a warning. – 500 - Internal Server Error Sep 13 '21 at 21:12
  • https://timsong-cpp.github.io/cppwp/n4868/basic.stc.general#4 is silent about what happens to references, so I'd say it is underspecified – Language Lawyer Sep 13 '21 at 21:43
  • @500-InternalServerError yes, I also think it's an oversight. Actually I was trying to provoke the optimizer to produce some unexpected output, but that warning was the "best" I was able to get. – 463035818_is_not_an_ai Sep 14 '21 at 08:05

2 Answers

1

I believe this is poorly specified, but may be implementation-defined.

The question and the other answer presume that i is a dangling reference. That presumes that it is a reference at all. But that's not correct!

It is notable that a reference capture is not a reference. The standard explicitly and intentionally says that a reference capture may not result in non-static data members of the closure type. [expr.prim.lambda/12]:

It is unspecified whether additional unnamed non-static data members are declared in the closure type for entities captured by reference.

That's why the rewriting of entity names only happens for copy captures. [expr.prim.lambda/11]:

Every id-expression within the compound-statement of a lambda-expression that is an odr-use of an entity captured by copy is transformed into an access to the corresponding unnamed data member of the closure type.

The same is not true of reference captures. The id-expression i within the lambda body refers to the original entity. It is not, as one might reasonably assume, a non-static member of the closure type which acts as an int&.

As far as I can tell, this dates back to some rewording in N2927 before C++11. Prior to that, during standardization, reference captures apparently did result in closure type members and did trigger a rewrite in the body just as copy captures. The change was intentional.
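A minimal sketch of the difference between the two capture kinds, assuming nothing beyond the wording quoted above (the function name `closure_layout_demo` is made up for illustration). A copy capture is guaranteed to add a data member, so the closure is at least as large as an `int`. For a reference capture no layout claim is portable, but naming `i` in the body still designates the original variable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns the size of the by-reference closure,
// which the standard leaves unspecified.
std::size_t closure_layout_demo() {
    int i = 0;
    auto by_copy = [i] { return i; };    // closure type has an unnamed int member
    auto by_ref  = [&i] { return &i; };  // whether a member exists is unspecified

    // Guaranteed: the copy capture is stored inside the closure object.
    static_assert(sizeof(by_copy) >= sizeof(int),
                  "a copy capture occupies storage in the closure");

    // While i is alive, the name i in the body designates the original entity.
    assert(by_ref() == &i);

    return sizeof(by_ref);  // commonly the size of a pointer, but not required
}
```

Printing the returned size on a given compiler shows what that implementation chose; the standard does not fix the value.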

So... the lambda body names an object i of type int which on the second invocation is not only outside its lifetime, but the storage has also been released.

With that in mind, let's try to infer if that's okay.

The standard explicitly allows using the name outside lifetime but before storage re-use. [basic.life/7]:

after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. For an object under construction or destruction, see [class.cdtor]. Otherwise, such a glvalue refers to allocated storage ([basic.stc.dynamic.allocation]), and using the properties of the glvalue that do not depend on its value is well-defined.

That doesn't actually apply, because here storage is released. However, for the case where storage has not been released, you can infer that the committee generally intends uses of the name that do not depend on the object's value to be OK. In practice, that mostly means avoiding the lvalue-to-rvalue conversion.
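The situation that [basic.life/7] does cover (lifetime ended, storage still allocated) can be sketched as follows. `Widget` and the function name are hypothetical and serve only as illustration. The object is destroyed explicitly while its storage remains in place, and the glvalue is then used in a way that does not depend on its value.

```cpp
#include <cassert>
#include <new>

struct Widget {
    int value = 42;
    ~Widget() {}  // user-provided, so ending the lifetime is an explicit step
};

void use_after_lifetime_before_release() {
    alignas(Widget) unsigned char storage[sizeof(Widget)];
    Widget* p = new (storage) Widget;  // lifetime of the Widget begins
    Widget& w = *p;

    p->~Widget();  // lifetime ends; the storage is neither reused nor released

    // Taking the address does not depend on the value of the object.
    void* addr = &w;
    assert(addr == static_cast<void*>(storage));

    // By contrast, reading w.value here would be an access to an object
    // outside its lifetime and therefore undefined behavior.
}
```

The case in the question differs in one respect: there the storage of `i` has been released as well, which is exactly the gap discussed here.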

The standard also explicitly invalidates pointers on storage release. [basic.stc.general/4]:

When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.

We don't have a pointer. Of note, references aren't "zapped", but we don't have a reference either.
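For contrast, here is a sketch of a case where an actual pointer is zapped, using dynamic storage. The function name is made up for illustration. The comments mark which uses fall into which category of the quoted paragraph; only the well-defined ones are executed.

```cpp
#include <cassert>

int read_then_release() {
    int* p = new int(7);
    int v = *p;   // fine: p is a valid pointer to a live object
    delete p;     // end of storage duration: p now holds an invalid pointer value

    // *p;        // indirection through an invalid pointer value: undefined
    // delete p;  // passing it to a deallocation function again: undefined
    // Any other use of the old value (copying, comparing, printing) would be
    // implementation-defined according to the quoted wording.

    p = nullptr;  // giving p a new value does not use the invalid one
    assert(p == nullptr);
    return v;
}
```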

So, how do we put this together?

Is naming i alone a problem? It is explicitly allowed to name i after its lifetime but before storage release. I cannot find any prohibition against naming i after storage release. It must refer to the same object, which is outside its lifetime. In other words, the rules say i is an lvalue representing some object, and they also say that continues after the object lifetime. They do not say it stops at storage release.

Is using but not accessing i a problem? By taking the address, we do not trigger lvalue-to-rvalue conversion, and we do not "access" i. I cannot find a prohibition. The address operator ([expr.unary.op/3]) says it will return the address of the designated object, which is the object the lvalue names.

What is the result of &i? The language about pointer zapping could be read to mean that the result, which is a pointer representing the address of storage which was released, must be an invalid pointer value.

Can we print &i? The language on invalid pointer values is clear that indirection and deallocation are undefined, but everything else is implementation-defined.

So... it may be implementation-defined.

Jeff Garrett
  • _The language about pointer zapping could be read to mean that the result, which is a pointer representing the address of storage which was released, must be an invalid pointer value._ You mean that `&i` first produces pointer to object representing the address of released storage and this value is then immediately replaced with an invalid pointer value? – Language Lawyer Sep 15 '21 at 06:46
  • &i has type pointer to int and "points to the designated object" so it's a pointer representing the address in that region of storage. "the values of all pointers representing the address of any part of that region of storage become invalid pointer values"... So you could read that as the value needs to be the invalid pointer value after the zap, even though the pointer itself did not exist at the time of the zap. Imagine if you had a pointer value stored in a uintptr_t at the time of the zap. Should a pointer recreated from it have an invalid pointer value? I don't know for sure. – Jeff Garrett Sep 15 '21 at 12:35
  • I read «**When** the end of the duration of a region of storage is reached» (in contrast to «**After**») as meaning that the «zap» only applies to pointers that existed at the time of storage release. But this may be reading too much into the word choice. A possible fix is to say that when the storage is released, all entities (variables and NSDMs) associated with the objects in the storage are no longer associated with them. So `i` won't be an lvalue denoting the object anymore and using `i` in an expression would be just UB. – Language Lawyer Sep 15 '21 at 13:02
0

Yes, it is undefined behavior, because applying the & address-of operator to a reference retrieves the address of the object being referred to, which in your example does not exist anymore as it has gone out of scope and been destroyed. You can't take the address of an object that doesn't exist.

Remy Lebeau
  • Could you show wording that makes objects «not existing»? https://timsong-cpp.github.io/cppwp/n4868/intro.object#1.sentence-2 describes how objects are created, and there is wording saying how their lifetime ends, but they still exist when it ends. When do objects stop existing? – Language Lawyer Sep 13 '21 at 21:44
  • "*how their lifetime ends, but they still exist when it ends*" - no, they don't. Once an object has gone out of scope and been destroyed, it no longer exists. The *memory* it occupied may still exist, but the object itself is gone. – Remy Lebeau Sep 13 '21 at 22:28
  • 2
    _no, they don't. Once an object has gone out of scope and been destroyed, it no longer exists._ I don't see a reference to the standard in your comment. (And, BTW, objects don't have scope.) – Language Lawyer Sep 13 '21 at 22:39
  • Using a reference initialized to an object after that object has been destroyed is not always forbidden (https://eel.is/c++draft/basic.life#8), so this answer is incomplete. – Jeff Garrett Sep 14 '21 at 20:36