Consider a slightly modified Fibonacci function, used to compare the performance of lambdas vs. plain functions, each with and without captures:
size_t fibFn(size_t n) {
    if (n <= 1) { return n; }
    return fibFn(n - 1) + fibFn(n - 2) + 1 + 2;
    //                                 ^~~~~~~
    // Modified so that there is something outside
    // this function to capture.
}
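For reference, the four variants being compared could look roughly like the sketch below. This is not the exact benchmark code: the globals `gA` and `gB`, the self-passing trick for the recursive lambdas, and the exact bodies are all assumptions made for illustration. Only the variant names come from the results.

```cpp
#include <cstddef>

// Hypothetical globals standing in for "something outside this function".
static std::size_t gA = 1;
static std::size_t gB = 2;

// Plain function that reads outside state (assumed meaning of "fn").
std::size_t fn(std::size_t n) {
    if (n <= 1) { return n; }
    return fn(n - 1) + fn(n - 2) + gA + gB;
}

// Plain function with the constants written inline ("fnNoCapture").
std::size_t fnNoCapture(std::size_t n) {
    if (n <= 1) { return n; }
    return fnNoCapture(n - 1) + fnNoCapture(n - 2) + 1 + 2;
}

// Recursive lambda capturing two locals by value ("lambda").
// The lambda receives itself as `self` so it can recurse (C++14).
std::size_t lambda(std::size_t n) {
    std::size_t a = 1, b = 2;
    auto fib = [a, b](auto&& self, std::size_t k) -> std::size_t {
        if (k <= 1) { return k; }
        return self(self, k - 1) + self(self, k - 2) + a + b;
    };
    return fib(fib, n);
}

// Recursive lambda with an empty capture list ("lambdaNoCapture").
std::size_t lambdaNoCapture(std::size_t n) {
    auto fib = [](auto&& self, std::size_t k) -> std::size_t {
        if (k <= 1) { return k; }
        return self(self, k - 1) + self(self, k - 2) + 1 + 2;
    };
    return fib(fib, n);
}
```

All four compute the same value for a given `n`, so any timing difference comes from how the compiler handles each form, not from the arithmetic.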
When I ran this in Quickbench with Clang 10.0, I got a reasonable result:
fnNoCapture < lambdaNoCapture < fn << lambda
Just as I was about to conclude that a lambda with a capture block is extremely slow, I ran the same benchmark with GCC 10.1 and the result was almost completely inverted:
lambdaNoCapture > fnNoCapture >> fn > lambda
How is this possible? Is it because the two compilers implement lambdas in different ways?
EDIT: Even so, it makes no sense to me that a lambda with captures can be so much faster than one without. As I see it, the best a compiler can do is turn a capturing lambda into a non-capturing one (e.g. by inlining the captured variables), where possible.
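To make that intuition concrete, here is a minimal sketch of what a capturing lambda roughly corresponds to: an object with one data member per capture and an `operator()`. The struct name `AddClosure` and the helper functions are hypothetical, and the actual closure type a compiler generates is unnamed and may differ in detail.

```cpp
#include <cstddef>

// Hand-written stand-in for the closure type of
// [a, b](std::size_t v) { return v + a + b; }.
struct AddClosure {
    std::size_t a;
    std::size_t b;
    std::size_t operator()(std::size_t v) const { return v + a + b; }
};

// Capturing lambda: a and b are stored inside the closure object.
std::size_t addWithLambda(std::size_t x) {
    std::size_t a = 1, b = 2;
    auto add = [a, b](std::size_t v) { return v + a + b; };
    return add(x);
}

// The same computation using the hand-written struct.
std::size_t addWithStruct(std::size_t x) {
    AddClosure add{1, 2};
    return add(x);
}

// Non-capturing lambda: the constants are part of the body itself.
std::size_t addNoCapture(std::size_t x) {
    auto add = [](std::size_t v) { return v + 1 + 2; };
    return add(x);
}
```

If the compiler can see that the captured values are constants, it is free to fold them into the body, at which point the first two forms should reduce to the third.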