This was already touched in Why C++ lambda is slower than ordinary function when called multiple times? and C++0x Lambda overhead But I think my example is a bit different from the discussion in the former and contradicts the result in the latter.
On the search for a bottleneck in my code I found a recusive template function that processes a variadic argument list with a given processor function, like copying the value into a buffer.
template <typename T>
void ProcessArguments(std::function<void(const T &)> process)
{}
template <typename T, typename HEAD, typename ... TAIL>
void ProcessArguments(std::function<void(const T &)> process, const HEAD &head, const TAIL &... tail)
{
process(head);
ProcessArguments(process, tail...);
}
I compared the runtime of a program that uses this code together with a lambda function as well as a global function that copies the arguments into a global buffer using a moving pointer:
int buffer[10];
int main(int argc, char **argv)
{
int *p = buffer;
for (unsigned long int i = 0; i < 10E6; ++i)
{
p = buffer;
ProcessArguments<int>([&p](const int &v) { *p++ = v; }, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
}
}
compiled with g++ 4.6 and -O3 measuring with the tool time takes more than 6 seconds on my machine while
int buffer[10];
int *p = buffer;
void CopyIntoBuffer(const int &value)
{
*p++ = value;
}
int main(int argc, char **argv)
{
int *p = buffer;
for (unsigned long int i = 0; i < 10E6; ++i)
{
p = buffer;
ProcessArguments<int>(CopyIntoBuffer, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
}
return 0;
}
takes about 1.4 seconds.
I do not get what is going on behind the scenes that explains the time overhead and am wondering if I can change something to make use of lambda functions without paying with runtime.