How fast are lambda functions in GCC

Question

Having toyed around a bit with C++0x Lambda Expression in G++, I was wondering as to how well the performance will be in general/specific situations compared to alternative ways without using lambda functions.

Does anybody know a more or less comprehensive discussion of lambda expression performance or situations in which one should avoid them despite more comfort while developing?

I believe that Stephan T Lavavej said in one of his webisodes that the STL algorithms internally (in MSVC10) don't use lambdas for performance reasons, so I imagine that if you are going to reuse a certain functor frequently, you might be better off writing a separate function or class. — Kerrek SB, Jun 15 '11 at 13:44
@Kerrek, that's probably compiler performance, not runtime performance. Each lambda creates an anonymous type, so using them in STL would explode the number of symbols generated even more. — MSN, Jun 15 '11 at 21:04
@MSN: Quite possibly. I'll have to rewatch and take note of what he says exactly. — Kerrek SB, Jun 15 '11 at 21:06
@Kerrek, I haven't heard what was said, but Standard library in VS2010 doesn't come from MS, it's Dinkumware, so it's not their call to make. Moreover, I think that library supports both C++98 and some C++0x, I think it would be strange to create unnecessary code duplication. I also can't think of any facilities in that library that would benefit from lambdas. — Gene Bushuyev, Jun 15 '11 at 22:56
@Gene: You should check out his series "Advanced STL". It's all about the internals. They license their implementation from Dinkumware, but they're making significant modifications. And VS10 has a totally rewritten implementation that uses C++0x features thoroughly. As I've [discovered elsewhere](http://stackoverflow.com/questions/5952602/using-tr1-libraries-in-gcc-and-msvc/5952703#5952703), MSVC++ has no "dialect" switch in the way GCC has. — Kerrek SB, Jun 15 '11 at 22:58
@Kerrek, @Gene : Indeed, many modifications had to be made just to get the standard library to compile cleanly and behave with `/clr`. — ildjarn, Jun 15 '11 at 23:22

score 29 · Accepted Answer · answered Jun 15 '11 at 13:10

29

If you consider the old-school way of defining a struct with operator () as an alternative way then there isn't gonna be any difference because lambdas are pretty much equivalent to that. Just syntactically more convenient. Let's wait for a more complete answer...

answered Jun 15 '11 at 13:10

Armen Tsirunyan

130,161
59
324
434

9

This pretty much *is* the complete answer. – Puppy Jun 15 '11 at 21:04
@Puppy Idk, g++ can return a plain function pointer from a lambda that has no capture; so it may define an anonymous function somewhere and return a pointer to that rather than a `struct` with `operator()`. – RamblingMad Jul 02 '15 at 04:33

Eli Barzilay · Answer 2 · 2015-07-02T01:19:25.807

When encountering lambda expression for the first time, many people get the vague impression that there's some runtime compilation magic happening to create these functions. Specifically, if you have a function that returns a newly made function as its result, it would seem that the returned function is "created" every time the surrounding function is called. But this is wrong -- a lambda expression (and this is true in any language) contains some code that can be compiled just like any other code, and it all happens statically, without any cost that needs to be left for the runtime.

The only issue is what happens with variables that are closed over, but that does not preclude such compilation -- to create a closure, you just pair up the closure data (these variables) with a pointer to the statically compiled code. The implication of that in terms of performance is that there should be no loss of performance at all -- closed variables or not. Even with closed over variables there is no cost -- any other way to approach whatever problem you're facing would require packaging up those values in some way, so the cost of allocation is the same regardless of how you keep it (explicitly, or implicitly in closed over variables). If an alternative solution doesn't need to package some values, then there wouldn't be any need for closing over them with closures too. This is really the same as with local allocation needed to execute the code -- which would obviously be the same regardless of whether the code comes from a closure with its local scope or from some other scope that would need the same kind of local state.

Again, this is all stuff that holds in any language with closures, and there is no reason for C++ code to suffer from some performance issues where no other language does. One oddity in the C++ lambda expressions is the need to specify which variables you close over, whereas in most other languages you just get everything closed over by default. This would seem like it gives C++ code some edge in having greater control over how much stuff needs to be packages with the closure -- but that's something that is very easy for a compiler to do automatically, without explicit annotations. It leads to one of the most common things that compilers of functional languages do -- "lambda lifting" -- where a function is effectively lifted to the toplevel, avoiding the need to create closures at runtime if they're not needed. For example, if you write (using some JS-like pseudo code):

function foo(x) {
  return function(y) { return y+3; }
}

then it's easy (for a compiler as well as for a human) to see that the returned function does not depend on x, and the compiler can now lift it, as if you wrote:

function generated_name1234(y) { return y+3; }
function foo(x) {
  return generated_name1234;
}

Similar techniques are used when only some values are closed over.

But this is diverging. The bottom line is that lambda expressions are not something that incurs some performance penalty -- closed variables or not.

(As for comparing lambda expressions with using operator(), I'm not sure what most C++ compilers will do, but lambdas should be faster since there is no runtime dispatch that is needed for any method call. Even if lambdas are implemented as anonymous classes with a () operator, the above techniques can apply in that case too, meaning that the dispatch machinery can be compiled away, which would mean that it shouldn't have additional costs too, making it similar to a special case where the anonymous class is trivial to the point of efficient compilation.)

In C++ there is no runtime dispatch for any member function not declared `virtual`, and it is _extremely_ rare to declare `operator()` `virtual` in my experience. Point being, lambdas will very rarely be faster (but never slower) than handwritten functors. — ildjarn, Jun 15 '11 at 21:31
Right, this is putting in concrete terms what I meant when I said that lambda expressions could be a specific case of `()` that can be compiled efficiently. In any case, the real main point is that lambdas should be as fast as any other function. — Eli Barzilay, Jun 15 '11 at 21:47
Did you intend to say that the returned function does not depend on x? It explicitly uses y to calculate its return value. It does not, however, use x, which would make sense as to why we don't need to define the lambda's body in the function scope that returns it. I may be missing something here, but I'm fairly sure you meant to say it doesn't depend on x. — Gurgadurgen, Jul 01 '15 at 15:21

lurscher · Answer 3 · 2011-06-15T17:53:19.490

i don't see any design reason why closures should be lesser performers than equivalent function with the same number and size of passed parameters

even closures capturing all context variables should be able to optimize away only the context variables actually being used in the lambda.

specific context variables (either captured by value or by reference) will need some storage initialised at instantiation time, which happens at the point the lambda is first found during execution. But this storage doesn't need to be heap, stack allocation is perfectly fine.

a lambda is exactly the same as a regular function, the only difference is entirely sintactical; it is defined inside other functions, and can capture some external vars, which are compiled as an additional context parameter. the context parameter might have a POD-like initialization at the point where the lambda is defined.

if a specific compiler (i.e: g++ or clang) behave in conflict with the above, its a warning sign of a bad implementation. clang has the ability to easily extend optimization passed by design, so any such shortcomings should be easier to address in the long run, compared to say, g++

the bottom line is if you don't use context variables, a lambda is (should be) totally indistinguishable from a regular free function to the compiler

score 3 · Answer 4 · answered Dec 16 '14 at 16:49

3

We're starting to avoid using lambdas in some cases (gaming environment) because the closure created (if it has captured values) has an associated new/delete to hold any captured values. While in many environments this is not an issue, we're in the business of shaving off microseconds at a time to get the best performance. Lambdas (those with enclosed variables) featured highly on our profiling and were among the first to go.

answered Dec 16 '14 at 16:49

Ian

433
3
6

*"has an associated new/delete to hold any captured values"* Could you elaborate on that part? I don't quite see why a lambda should use dynamic storage duration / new+delete. (`std::function` typically uses heap allocations, but should have a small object optimization). – dyp Dec 16 '14 at 16:56
the new/aloccation couldn't be from the closure *per se*. Can you add more details? – Martin Ba Dec 16 '14 at 17:02
If the lambda captures any values, it needs somewhere to store them. I believe there may be some implementation specific small area for the oft-used 'this' capture, but much more than that will require a heap allocation at the point of capture. – Ian Feb 18 '15 at 14:44
Was this something to do with capturing larger objects by value? – Dave E May 18 '21 at 09:16

score 2 · Answer 5 · answered Jun 15 '11 at 22:47

Just as others mentioned, there is no reason why compiler generated closures resulting from lambdas should be any different in performance than the hand written ones. And in order to verify it, I just changed the lambdas used in a parser with hand written classes. They all contain only few lines of tight code and executed millions of times, so every change in performance would be immediately noticeable. The result -- exactly the same execution time. So no, there is no difference in performance.

How fast are lambda functions in GCC

5 Answers5