Is There a Reason Standard Algorithms Take Lambdas by Value?

Question

So I asked a question here: Lambda Works on Latest Visual Studio, but Doesn't Work Elsewhere to which I got the response, that my code was implementation defined since the standard's 25.1 [algorithms.general] 10 says:

Unless otherwise specified, algorithms that take function objects as arguments are permitted to copy those function objects freely. Programmers for whom object identity is important should consider using a wrapper class that points to a noncopied implementation object such as reference_wrapper<T>

I'd just like a reason why this is happening? We're told our whole lives to take objects by reference, why then is the standard taking function objects by value, and even worse in my linked question making copies of those objects? Is there some advantage that I don't understand to doing it this way?

also see: http://stackoverflow.com/questions/34825552/passing-function-objects-into-std-algorithms-by-reference — NathanOliver, Dec 12 '16 at 19:49
[sizeof on a stateless lambda returns 1](http://coliru.stacked-crooked.com/a/ef6ae126ba68338b) — jaggedSpire, Dec 12 '16 at 19:50
What's the cost of copying a lambda that contains no data? After optimisations, nothing at all. After copy elision optimisation, the cost of copying even a complex functor is likely to be zero. — Richard Hodges, Dec 12 '16 at 19:52
*"We're told our whole lives to take objects by reference"* Big objects that is. You won't see me telling people to pass `int`s etc by `const&`. — Baum mit Augen, Dec 12 '16 at 19:53
@BaummitAugen I mean we're told not to copy `string`s by value and that's like what a pointer and a `size_t`? A lambda could be a lot bigger than that depending upon it's capture block. — Jonathan Mee, Dec 12 '16 at 20:00
@RichardHodges If you check the question that spawned this one you'll see that copy elision isn't possible in this case. If you think it would be helpful I can include such an example in the question. — Jonathan Mee, Dec 12 '16 at 20:02
@JonathanMee copying a string results in allocating new memory and copying the contents of the string buffer, unless Small String Optimization or Copy On Write are in effect, though IIRC COW strings are no longer allowed... — jaggedSpire, Dec 12 '16 at 20:02
There are many times when it's advantageous to take something by value. When it's small, when you know you're going to make a copy anyway, etc... The standard's language allows the implementation to do this when it's better than taking reference. — Edward Strange, Dec 12 '16 at 20:03
@JonathanMee *"that's like what a pointer and a `size_t`"* That's not what's getting copied though. What's getting copied is the array the pointer points to, and that's (potentially) big. *"A lambda could be a lot bigger than that"* Yeah, but it usually isn't. The easier inlining on the other hand is a real advantage. — Baum mit Augen, Dec 12 '16 at 20:07
@JonathanMee I think one of the original intentions of predicates is that they are simple, stateless function objects. If the standard does not say this then it probably ought to. IMHO. — Richard Hodges, Dec 12 '16 at 20:10
@BaummitAugen I was thinking of small `string` optimization, but you're right that was a bad example, we're told to copy by reference to prevent the copy of the string in memory. — Jonathan Mee, Dec 12 '16 at 20:21
@RichardHodges But that was the whole motivation of lambdas right, to expand functionality of standard algorithms? Shouldn't have the algorithms at least have come along with? — Jonathan Mee, Dec 12 '16 at 20:22
@CrazyEddie I thought when an object was larger than the size of a pointer it no longer made sense to copy by value. Your statement implies that would still be considered small and should be copped by value. — Jonathan Mee, Dec 12 '16 at 20:25
*"I thought when an object was larger than the size of a pointer it no longer made sense to copy by value."* I don't think that's right. Passing by reference carries more cost than just a pointer copy by change in semantics (namely by introducing potential aliasing). That in turn impacts optimization. — Baum mit Augen, Dec 12 '16 at 20:33
@JonathanMee Assuming that a copied argument indeed results in copied memory (this is not always the case), there is still the argument about which is quicker - a one-off copy of a chunk of memory or continuous repeated indirection through a reference to (possibly constantly changing) data. It's a bit like the discussion about which is faster, a linear vector search or a map. Counterintuitively it's almost always the linear vector search that's faster because there's no pointer indirection and memory is contiguous. — Richard Hodges, Dec 12 '16 at 20:37

Yakk - Adam Nevraumont · Accepted Answer · 2016-12-12T20:41:37.837

std assumes function objects and iterators are free to copy.

std::ref provides a method to turn a function object into a pseudo-reference with a compatible operator() that uses reference instead of value semantics. So nothing of large value is lost.

If you have been taught all your life to take objects by reference, reconsider. Unless there is a good reason otherwise, take objects by value. Reasoning about values is far easier; references are pointers into any state anywhere in your program.

The conventional use of references, as a pointer to a local object which is not referred to by any other active reference in the context where it is used, is not something someone reading your code nor the compiler can presume. If you reason about references this way, they don't add a ridiculous amount of complexity to your code.

But if you reason about them that way, you are going to have bugs when your assumption is violated, and they will be subtle, gross, unexpected, and horrible.

A classic example is the number of operator= that break when this and the argument refer to the same object. But any function that takes two references or pointers of the same type has the same issue.

But even one reference can break your code. Let's look at sort. In pseudo-code:

void sort( Iterator start, Iterator end, Ordering order )

Now, let's make Ordering a reference:

void sort( Iterator start, Iterator end, Ordering const& order )

How about this one?

std::function< void(int, int) > alice;
std::function< void(int, int) > bob;
alice = [&]( int x, int y ) { std:swap(alice, bob); return x<y; };
bob = [&]( int x, int y ) { std:swap(alice, bob); return x>y; };

Now, call sort( begin(vector), end(vector), alice ).

Every time < is called, the referred-to order object swaps meaning. Now this is pretty ridiculous, but when you took Ordering by const&, the optimizer had to take into account that possibility and rule it out on every invokation of your ordering code!

You wouldn't do the above (and in fact this particular implementation is UB as it would violate any reasonable requisites on std::sort); but the compiler has to prove you didn't do something "like that" (change the code in ordering) every time it follows order or invokes it! Which means constantly reloading the state of order, or inlining and proving you did nonesuch insanity.

Doing this when taking by-value is an order of magnitude harder (and basically requires something like std::ref). The optimizer has a function object, it is local, and its state is local. Anything stored within it is local, and the compiler and optimizer know who exactly can modify it legally.

Every function you write taking a const& that ever leaves its "local scope" (say, called a C library function) can not assume the state of the const& remained the same after it got back. It must reload the data from wherever the pointer points to.

Now, I did say pass by value unless there is a good reason. And there are many good reasons; your type is very expensive to move or copy, for example, is a great reason. You are writing data to it. You actually want it to change as you read it each time. Etc.

But the default behavior should be pass-by-value. Only move to references if you have a good reason, because the costs are distributed and hard to pin down.

`your type is very expensive to move or copy, for example, is a great reason` where does one draw the line? how about a big POD struct of other PODs, none of which allocate on heap? — johnbakers, Dec 12 '16 at 20:31
@johnbakers when the users are bitching and profiling has proven beyond all doubt that it's your copy, and not your IO or your inefficient algorithm, or cache exhaustion or... (etc) that's the bottleneck. i.e. almost never. — Richard Hodges, Dec 12 '16 at 20:32
@johnbakers You should extremely rarely pass around big POD structs of other POD structs. This is going to be in a fraction of a fraction of a fraction of your code base; and if your function is depending on the state of that entire POD struct, your code is already screwed. If it isn't, you should not be passing that POD struct. Global data by any other name. So now we are down to a fraction^4 of your code base. With something that happens that rarely (once per mega-line of code), you can afford to profile and discuss and determine if it is worth it. — Yakk - Adam Nevraumont, Dec 12 '16 at 20:39

score 0 · Answer 2 · answered Dec 12 '16 at 20:28

I'm not sure I have an answer for you, but if I have got my object lifetimes correct I think this is portable, safe and adds zero overhead or complexity:

#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>


// @pre f must be an r-value reference - i.e. a temporary
template<class F>
auto resist_copies(F &&f) {
    return std::reference_wrapper<F>(f);
};

void removeIntervals(std::vector<double> &values, const std::vector<std::pair<int, int>> &intervals) {
    values.resize(distance(
            begin(values),
            std::remove_if(begin(values), end(values),
                           resist_copies([i = 0U, it = cbegin(intervals), end = cend(intervals)](const auto&) mutable 
    {
        return it != end && ++i > it->first && (i <= it->second || (++it, true));
    }))));
}


int main(int argc, char **args) {
    // Intervals of indices I have to remove from values
    std::vector<std::pair<int, int>> intervals = {{1,  3},
                                                  {7,  9},
                                                  {13, 13}};

    // Vector of arbitrary values.
    std::vector<double> values = {4.2, 6.4, 2.3, 3.4, 9.1, 2.3, 0.6, 1.2, 0.3, 0.4, 6.4, 3.6, 1.4, 2.5, 7.5};
    removeIntervals(values, intervals);
    // intervals should contain 4.2,9.1,2.3,0.6,6.4,3.6,1.4,7.5

    std:
    copy(values.begin(), values.end(), std::ostream_iterator<double>(std::cout, ", "));
    std::cout << '\n';
}

Is There a Reason Standard Algorithms Take Lambdas by Value?

2 Answers2

Linked