Undefined Behavior with the C++0x Closure: II

Question

I find the use of the C++0x closure perplexing. My initial report, and the subsequent one, have generated more confusion than explanations. Below I will show you troublesome examples, and I hope to find out why there is an undefined behavior in the code. All the pieces of the code pass the gcc 4.6.0 compiler without any warning.

Program No. 1: It Works

#include <iostream>
int main(){
    auto accumulator = [](int x) {
        return [=](int y) -> int { 
            return x+y;
        }; 
    };
    auto ac=accumulator(1);
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
}

The output meets the expectations:

2 2 2

2. Program No. 2: Closure, Works Fine

#include <iostream>
int main(){
    auto accumulator = [](int x) {
        return [&](int y) -> int { 
            return x+=y;
        }; 
    };
    auto ac=accumulator(1);
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
}

The output is:

4 3 2

7 6 5

10 9 8

Program 3: Program No. 1 with std::function, Works Fine

#include <iostream>
#include <functional>     // std::function

int main(){

    typedef std::function<int(int)> fint2int_type;
    typedef std::function<fint2int_type(int)> parent_lambda_type;

    parent_lambda_type accumulator = [](int x) -> fint2int_type{
        return [=](int y) -> int { 
            return x+y;
        }; 
    };

    fint2int_type ac=accumulator(1);

    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
}

The output is:

2 2 2

Program 4: Program No. 2 with std::function, Undefined Behavior

#include <iostream>
#include <functional>     // std::function

int main(){

    typedef std::function<int(int)> fint2int_type;
    typedef std::function<fint2int_type(int)> parent_lambda_type;

    parent_lambda_type accumulator = [](int x) -> fint2int_type{
        return [&](int y) -> int { 
            return x+=y;
        }; 
    };

    fint2int_type ac=accumulator(1);

    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
}

The first run of the program gives:

4 3 2

12364812 12364811 12364810

The second run of the same program:

4 3 2

1666060 1666059 1666058

The third one:

4 3 2

2182156 2182155 2182154

How does my use of the std::function break the code? why do Programs No.1 - 3 work well, and Program No. 4 is correct when calling ac(1) thrice(!)? Why does Program No. 4 get stuck on the next three cases as if the variable x had been captured by value, not reference. And the last three calls of ac(1) are totally unpredictable as if any reference to x would be lost.

Arzar · Accepted Answer · 2011-04-06T15:32:09.683

9

I hope to find out why there is an undefined behavior in the code

Every time I deal with complex and intricated lambda, I feel it more easier to do first the translation into function-object form. Because lambdas are just syntactic sugar for function-object and for each lambda there is a one-to-one mapping with a corresponding function-object. This article explain really well how to do the translation : http://blogs.msdn.com/b/vcblog/archive/2008/10/28/lambdas-auto-and-static-assert-c-0x-features-in-vc10-part-1.aspx

So for example, your program no 2 :

#include <iostream>
int main(){
    auto accumulator = [](int x) {
        return [&](int y) -> int { 
            return x+=y;
        }; 
    };
    auto ac=accumulator(1);
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
}

would be approximately translate by the compiler into this one :

#include <iostream>

struct InnerAccumulator
{
    int& x;
    InnerAccumulator(int& x):x(x)
    {
    }
    int operator()(int y) const
    {
        return x+=y;
    }
};

struct Accumulator
{
    InnerAccumulator operator()(int x) const
    {
        return InnerAccumulator(x); // constructor
    }
};


int main()
{
    Accumulator accumulator;
    InnerAccumulator ac = accumulator(1);
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
}

And now, the problem become quite obvious :

InnerAccumulator operator()(int x) const
{
   return InnerAccumulator(x); // constructor
}

Here the constructor of InnerAccumulator will take a reference to x, a local variable which will die as soon as you exit the operator() scope. So yes, you just get a plain good old undefined behavior as you suspected.

edited Apr 06 '11 at 15:32

answered Apr 06 '11 at 13:25

Arzar

13,657
3
25
25

Thank you for your explanation, at least I got the direction to work on. Trouble is, the demonstration you provide does not explain why Program 2 works fine. It implements the lexical scoping, where x lives as a state within the scope of main as it is supposed to. The problem is somewhere with the use of the std::function in Program 4, I was hoping to get some information about its internals in this particular case, but this seems to be harder than it seems. But I appreciate your answer nonetheless, and I will keep looking for the explanation. – Apr 06 '11 at 20:21
4

"the demonstration you provide does not explain why Program 2 works fine". That's **undefined behavior** for you. Program No. 2.2 use a dangling reference. It can do absolutely everything : crash immediately, crash later, throw an exception, give random result (like in Program 4), launch nuclear missiles turning the world into radioactive wasteland, etc. Even worse, it can also work ! But it's just luck, other compiler will show different behavior. For example my code with function-object "work" with GCC 4.6 (it print 4 3 2 7 6 5 10 9 8) but crash with MSVC10. (you get an access violation). – Arzar Apr 06 '11 at 23:30
I define "undefined" when the result is unpredictable. Program No. 2 passes the compiler and produces the expected outcome as x lives within the scope of main. This looks unusual for a person who is used to the block scope, but for the lexical scope programmer this is a default behavior. I would not dispute your remark that this is an example of an unsafe programming, but this is a matter of style. – Apr 07 '11 at 00:19
2

The dangling reference will "work" as long as nothing in the program messes with the storage area previously occupied by the value. In the case of program 4, std::function does some things with the stack that mess with the area pointed to by the dangling reference, but it has nothing to do with std::function as Johannes Dahlström's answer shows lots of things can cause this to happen. Most importantly: Just because it "works" and does the same thing every time does NOT mean it is not Undefined Behavior. – SoapBox Apr 07 '11 at 00:29
2

@Rusty: Unfortunately, you don't get to define what "undefined behaviour" means, as it is a technical term defined in the language standard. The standard says certain semantic constructs cause undefined behaviour, and if your program includes those constructs, the program **has a bug**. This has emphatically **nothing** to do about different programming styles, it has to do with whether your program is buggy or not! Even if it seems to "work fine" with one compiler, with certain compiler options, and when the stars are right, nothing guarantees that it will keep "working fine" tomorrow. – JohannesD Apr 07 '11 at 00:33

JohannesD · Answer 2 · 2011-04-07T08:41:01.137

Let's try something entirely innocent-looking:

#include <iostream>
int main(){
    auto accumulator = [](int x) {
        return [&](int y) -> int { 
            return x+=y;
        }; 
    };
    auto ac=accumulator(1);

    //// Surely this should be a no-op? 
    accumulator(666);
    //// There are no side effects and we throw the result away!

    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
    std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl; 
}

Tada:

669 668 667 
672 671 670 
675 674 673

Of course, this is not guaranteed behaviour either. Indeed, with optimizations enabled, gcc will eliminate the accumulator(666) call figuring it's dead code, and we again get the original results. And it is entirely within its rights to do so; in a conforming program, removing the call would indeed not affect the semantics. But in the realm of undefined behaviour, anything may happen.

EDIT

auto ac=accumulator(1);

std::cout << pow(2,2) << std::endl;

std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;
std::cout << ac(1) << " " << ac(1) << " " << ac(1) << " " << std::endl;

Without optimizations enabled, I get the following:

4
1074790403 1074790402 1074790401 
1074790406 1074790405 1074790404 
1074790409 1074790408 1074790407

With optimizations enabled,

Again, C++ does not and cannot provide true lexical closures where the lifetime of local variables would get extended beyond their original scope. That would entail bringing garbage collection and heap-based locals to the language.

This is all rather academic, though, as capturing x by copy makes the program well-defined and to work as expected:

auto accumulator = [](int x) {
    return [x](int y) mutable -> int { 
        return x += y;
    }; 
};

I understand your contention and this all very valuable to know, but what exactly is undefined in this program? You reinitialize x with the sum of the first seven squared primes, and from now on within the scope of main it accumulates the unity the way it is supposed to. If someone sends me the code, and I can tell what it is doing without even running it, and after I run the code with repetitions, I still see that it does what I expect it to do, then how can the program behavior be undefined?! Perhaps my view is too narrow, but your example does not convince me. Program 4 does though... — , Apr 07 '11 at 00:40
You seem thoroughly confused about how function parameters are supposed to work. Certainly in no other language with lexical closures can you "reinitialize" captured variables like that... JavaScript doesn't work like that. LISP doesn't work like that. Local variables of separate calls to the same function are completely distinct and cannot affect each other in *any way*. — JohannesD, Apr 07 '11 at 08:29
I would think that in theory all of the calls to `operator <<` should have the same effect as the `pow()` function or the `accumulator(666)` call, screwing up the storage that contains `x` for at least the second and third lines of code. I wonder how he got so lucky that it didn't. — Ken Bloom, Apr 08 '11 at 20:08

score 1 · Answer 3 · answered Apr 06 '11 at 13:10

1

Well, references become dangling when the referent goes away. You have a very fragile design if object A has a reference to some part of object B, unless object A in some way can guarantee the lifetime of object B (for instance, when A holds a shared_ptr to B anyway, or both are in the same scope).

References in lambda's are no magical exception. If you plan to return a reference to x+=y, you'd better make sure that x lives long enough. Here it's the argument int x initialized as part of the call accumulator(1). The lifetime of a function argument ends when the function returns.

answered Apr 06 '11 at 13:10

MSalters

173,980
10
155
350

I am delighted to receive your help, your comment is valuable and to the point. But regarding your last sentence, I must say this: The life time of a function argument may not end when the function returns, that is the main paradox of the [lexical scope](http://pierrespring.com/2010/05/11/function-scope-and-lexical-scoping/). The supporting evidence for this phenomenon is the Program 2. Pr. 4 behaves just the way you would describe. I like your comment about the fragile d., it seems when writing closures in c++ one needs to constantly think of their true internal rep. which can't be ignored. – Apr 06 '11 at 20:44
1

"Seems to work fine" is *never* adequate evidence that something is not undefined behaviour, because "seems to work fine" is one of the possible behaviours in undefined-behaviour land. As I said before, the lifetime of stack variables *is not* and *cannot* be extended in C++ -- how could it? Just try to call some other function between the calls to accumulator and ac and see your captured x get overwritten with something that does not belong. – JohannesD Apr 06 '11 at 23:14
Your opinion is appreciated and deeply valued by me, believe me. But the only practical way to see who is right is to try different compilers. And trouble is, at this time we do not completely trust them either. But I hope this will be resolved soon anyway. Otherwise there is no evidence to add anything in favor of one or the other view. The real issue that bothers me is that after a certain experience I do not really want to use lambdas in anything more than the usual cosmetic adjustments of the STL code because as you can see, there already appear dark corners in such simple cases. – Apr 07 '11 at 00:53
@Rusty James: The life time of a C++ function argument positively ends when the function returns. Quoting a Javascript article isn't relevant to your argument. Most C++ compilers don't try to _enforce_ this rule, but it's still a rule. It's like speeding: the fact that your car can do 150 doesn't mean that it's allowed, and the police will not always fine you either. – MSalters Apr 07 '11 at 07:45
@Rusty James: This has **nothing** to do with opinions, either! It's very clearly stated *in the language standard*, and I can quote relevant chapter and verse if you so wish. How it is so hard a concept to grasp that the standard **very clearly** defines the rules of the language, and you are very clearly **breaking** one of them? The fact that you can often get away with speeding on a motorway does not change the fact that speeding is illegal. – JohannesD Apr 07 '11 at 08:34

Undefined Behavior with the C++0x Closure: II

Program No. 1: It Works

2. Program No. 2: Closure, Works Fine

Program 3: Program No. 1 with std::function, Works Fine

Program 4: Program No. 2 with std::function, Undefined Behavior

3 Answers3

EDIT

Linked