I would like some information on how to correctly think about C++11 closures and std::function
in terms of how they are implemented and how memory is handled.
Although I don't believe in premature optimisation, I do have a habit of carefully considering the performance impact of my choices while writing new code. I also do a fair amount of real-time programming, e.g. on microcontrollers and for audio systems, where non-deterministic memory allocation/deallocation pauses are to be avoided.
Therefore I'd like to develop a better understanding of when to use or not use C++ lambdas.
My current understanding is that a lambda with no captured closure is exactly like a C callback. However, when the environment is captured either by value or by reference, an anonymous object is created on the stack. When a value-closure must be returned from a function, one wraps it in std::function
. What happens to the closure memory in this case? Is it copied from the stack to the heap? Is it freed whenever the std::function
is freed, i.e., is it reference-counted like a std::shared_ptr
?
I imagine that in a real-time system I could set up a chain of lambda functions, passing B as a continuation argument to A, so that a processing pipeline A->B
is created. In this case, the A and B closures would be allocated once. Although I'm not sure whether these would be allocated on the stack or the heap. However in general this seems safe to use in a real-time system. On the other hand if B constructs some lambda function C, which it returns, then the memory for C would be allocated and deallocated repeatedly, which would not be acceptable for real-time usage.
In pseudo-code, a DSP loop, which I think is going to be real-time safe. I want to perform processing block A and then B, where A calls its argument. Both these functions return std::function
objects, so f
will be a std::function
object, where its environment is stored on the heap:
auto f = A(B); // A returns a function which calls B
// Memory for the function returned by A is on the heap?
// Note that A and B may maintain a state
// via mutable value-closure!
for (t=0; t<1000; t++) {
y = f(t)
}
And one which I think might be bad to use in real-time code:
for (t=0; t<1000; t++) {
y = A(B)(t);
}
And one where I think stack memory is likely used for the closure:
freq = 220;
A = 2;
for (t=0; t<1000; t++) {
y = [=](int t){ return sin(t*freq)*A; }
}
In the latter case the closure is constructed at each iteration of the loop, but unlike the previous example it is cheap because it is just like a function call, no heap allocations are made. Moreover, I wonder if a compiler could "lift" the closure and make inlining optimisations.
Is this correct? Thank you.