C# Lambda performance issues/possibilities/guidelines

Question

I'm testing performance differences using various lambda expression syntaxes. If I have a simple method:

public IEnumerable<Item> GetItems(int point)
{
    return this.items.Where(i => i.IsApplicableFor(point));
}

then there's some variable lifting going on here related to point parameter because it's a free variable from lambda's perspective. If I would call this method a million times, would it be better to keep it as it is or change it in any way to improve its performance?

What options do I have and which ones are actually feasible? As I understand it is I have to get rid of free variables so compiler won't have to create closure class and instantiate it on every call to this method. This instantiation usually takes significant amount of time compared to non-closure versions.

The thing is I would like to come up with some sort of lambda writing guidelines that would generally work, because it seems I'm wasting some time every time I write a heavily hit lambda expression. I have to manually test it to make sure it will work, because I don't know what rules to follow.

Alternative method

& example console application code

I've also written a different version of the same method that doesn't need any variable lifting (at least I think it doesn't, but you guys who understand this let me know if that's the case):

public IEnumerable<Item> GetItems(int point)
{
    Func<int, Func<Item, bool>> buildPredicate = p => i => i.IsApplicableFor(p);
    return this.items.Where(buildPredicate(point));
}

Check out Gist here. Just create a console application and copy the whole code into Program.cs file inside namespace block. You will see that the second example is much much slower even though it doesn't use free variables.

A contradictory example

The reason why I would like to construct some lambda best usage guidelines is that I've met this problem before and to my surprise that one turned out to be working faster when a predicate builder lambda expression was used.

Now explain that then. I'm completely lost here because it may as well turn out I won't be using lambdas at all when I know I have some heavy use method in my code. But I would like to avoid such situation and get to the bottom of it all.

Edit

Your suggestions don't seem to work

I've tried implementing a custom lookup class that internally works similar to what compiler does with a free variable lambda. But instead of having a closure class I've implemented instance members that simulate a similar scenario. This is the code:

private int Point { get; set; }
private bool IsItemValid(Item item)
{
    return item.IsApplicableFor(this.Point);
}

public IEnumerable<TItem> GetItems(int point)
{
    this.Point = point;
    return this.items.Where(this.IsItemValid);
}

Interestingly enough this works just as slow as the slow version. I don't know why, but it seems to do nothing else than the fast one. It reuses the same functionality because these additional members are part of the same object instance. Anyway. I'm now extremely confused!

I've updated Gist source with this latest addition, so you can test for yourself.

Have you profiled your code and determined that this is where your bottleneck is? +1 anyway for an interesting question :-) — Cameron, Nov 29 '11 at 02:38
This question seems somewhat related: http://stackoverflow.com/questions/1928636/how-do-closures-work-behind-the-scenes-c. I'm not sure how I'd implement your example in a way that's not substantially the same to what the C# compiler does. — millimoose, Nov 29 '11 at 02:41
Bottleneck is here, because that's all I do in my test. I execute this function a million times and there's significant difference compared to the second implementation (see edit above). — Robert Koritnik, Nov 29 '11 at 02:45
The overhead here has more to do with enumerating and calling a delegate than it does with capturing a local value for the delegate implementation to reference. If this is really a bottleneck, the best micro-optimizations actually involve using arrays and integer-based indexing. Personally, I prefer the readability of the extension methods and lambda expressions, as the performance differences don't show up until you've got deeply nested inner loops, at which point you may want to be looking for a better algorithm. — Dan Bryant, Nov 29 '11 at 02:47
@RobertHarvey: Can be as much as 4-5 times... Check Gist and execute it yourself. If times come out too small, increase `IterationCount` constant acordingly so iterative method executes about 1sec per cycle. — Robert Koritnik, Nov 29 '11 at 02:53
@RobertHarvey: Provided example should actually run about 8 times slower... How I wish Jon Skeet was up at this hour, but he probably sleeps (UK time)... — Robert Koritnik, Nov 29 '11 at 03:04
@RobertK [Related to your last comment](http://meta.stackexchange.com/questions/555/why-does-jon-skeet-never-sleep) =) — Josh Darnell, Nov 29 '11 at 03:24
@DanBryant: Have you executed the code that can be copied from Gist and I linked in my question? Is that significant performance difference? The point is this is not about algorithm but about lambda expressions and their optimisation to perform as best as possible... I have my superfast O(1) algorithm alternative to Interval tree... But that's not the point here. — Robert Koritnik, Nov 29 '11 at 03:38
If you look at my answer, you'll see that your benchmark is measuring object creation and GC overhead. The overhead of creating lambdas is a bit more than for iterators (used in methods using `yield`), but in your case the lambdas themselves are faster to execute. — Gabe, Nov 29 '11 at 07:05
You probably meant `Func> buildPredicate = d => i => i.IsApplicableFor(d);`? Note `point` replaced by `d`. — Henrik, Nov 29 '11 at 15:42
@leppie: There was a typo where I used `point` in the lambda instead of `d` that was passed in initially. — Robert Koritnik, Nov 29 '11 at 15:58
@leppie: Can you elaborate a bit more on this? Do you mean that `p` is free from the perspective of the inner lambda (generated one) but it's definitely not free from the builder's lambda perspective. Or there is some total misconception from my side. — Robert Koritnik, Nov 29 '11 at 20:15
@RobertKoritnik: `p` is free in the inner lambda. The rest of it is irrelevant :) — leppie, Nov 30 '11 at 05:03

Jim Mischel · Answer 1 · 2011-11-29T04:29:20.580

What makes you think that the second version doesn't require any variable lifting? You're defining the Func with a Lambda expression, and that's going to require the same bits of compiler trickery that the first version requires.

Furthermore, you're creating a Func that returns a Func, which bends my brain a little bit and will almost certainly require re-evaluation with each call.

I would suggest that you compile this in release mode and then use ILDASM to examine the generated IL. That should give you some insight into what code is generated.

Another test that you should run, which will give you more insight, is to make the predicate call a separate function that uses a variable at class scope. Something like:

private DateTime dayToCompare;
private bool LocalIsDayWithinRange(TItem i)
{
    return i.IsDayWithinRange(dayToCompare);
}

public override IEnumerable<TItem> GetDayData(DateTime day)
{
    dayToCompare = day;
    return this.items.Where(i => LocalIsDayWithinRange(i));
}

That will tell you if hoisting the day variable is actually costing you anything.

Yes, this requires more code and I wouldn't suggest that you use it. As you pointed out in your response to a previous answer that suggested something similar, this creates what amounts to a closure using local variables. The point is that either you or the compiler has to do something like this in order to make things work. Beyond writing the pure iterative solution, there is no magic you can perform that will prevent the compiler from having to do this.

My point here is that "creating the closure" in my case is a simple variable assignment. If this is significantly faster than your version with the Lambda expression, then you know that there is some inefficiency in the code that the compiler creates for the closure.

I'm not sure where you're getting your information about having to eliminate the free variables, and the cost of the closure. Can you give me some references?

I was curious, so I looked at the IL for the simple closure case. It emits a simple class that has a public field 'int point'. The calling method instantiates the 'magic class', sets the point value once and creates a Func delegate. It then calls Where. The delegate body is literally just the contents of the lambda expression, only using the point field on the class. There is nothing that could be done faster manually here (using an outside class) and, in fact, the auto-generated mechanism probably gets a miniscule performance boost from using a public field. — Dan Bryant, Nov 29 '11 at 05:03
Quote: *"I'm not sure where you're getting...*: I've checked compiled source of creating a free variable-less lambda expression where compiler barely creates a public static method instead of closure class and instantiating it which is the case when there're free variables used. Try it out yourself and see. That's the main reason why **I want to convert a lambda with free variables to one with bound ones only**. — Robert Koritnik, Nov 29 '11 at 15:36
I've actually implemented the very same thing, but also eliminating the need for the lambda that you used. method reference is enough. But you know what? Speed is just as slow as with the slow version. Check last version on [Gist](https://gist.github.com/1403144). — Robert Koritnik, Nov 29 '11 at 16:37

Kirk Broadhurst · Answer 2 · 2011-11-29T04:55:27.570

Your second method runs 8 times slower than the first for me. As @DanBryant says in comments, this is to do with constructing and calling the delegate inside the method - not do do with variable lifting.

Your question is confusing as it reads to me like you expected the second sample to be faster than the first. I also read it as the first is somehow unacceptably slow due to 'variable lifting'. The second sample still has a free variable (point) but it adds additional overhead - I don't understand why you'd think it removes the free variable.

As the code you have posted confirms, the first sample above (using a simple inline predicate) performs jsut 10% slower than a simple for loop - from your code:

foreach (TItem item in this.items)
{
    if (item.IsDayWithinRange(day))
    {
        yield return item;
    }
}

So, in summary:

The for loop is the simplest approach and is "best case".
The inline predicate is slightly slower, due to some additional overhead.
Constructing and calling a Func that returns Func within each iteration is significantly slower than either.

I don't think any of this is surprising. The 'guideline' is to use an inline predicate - if it performs poorly, simplify by moving to a straight loop.

I don't understand why `point` in second example is a free variable since it's passed into the lambda as a parameter rather than being used inside of it. That to my opinion is a bound variable. But I'm well aware of being wrong here. They didn't teach us lambdas that long ago when I was a student. That's the main reason I want to understand them better. — Robert Koritnik, Nov 29 '11 at 12:55
I expected second example to be faster yes. Check my edited answer or go directly to [this question](http://stackoverflow.com/q/8214055/75642) where it turned out that a predicate builder lambda made code snappy. I expected the same here but was greatly disappointed. — Robert Koritnik, Nov 29 '11 at 13:04

Gabe · Answer 3 · 2011-11-29T18:17:17.487

I profiled your benchmark for you and determined many things:

First of all, it spends half its time on the line return this.GetDayData(day).ToList(); calling ToList. If you remove that and instead manually iterate over the results, you can measure relative the differences in the methods.

Second, because IterationCount = 1000000 and RangeCount = 1, you are timing the initialization of the different methods rather than the amount of time it takes to execute them. This means your execution profile is dominated by creating the iterators, escaping variable records, and delegates, plus the hundreds of subsequent gen0 garbage collections that result from creating all that garbage.

Third, the "slow" method is really slow on x86, but about as fast as the "fast" method on x64. I believe this is due to how the different JITters create delegates. If you discount the delegate creation from the results, the "fast" and "slow" methods are identical in speed.

Fourth, if you actually invoke the iterators a significant number of times (on my computer, targetting x64, with RangeCount = 8), "slow" is actually faster than "foreach" and "fast" is faster than all of them.

In conclusion, the "lifting" aspect is negligible. Testing on my laptop shows that capturing a variable like you do requires an extra 10ns every time the lambda gets created (not every time it is invoked), and that includes the extra GC overhead. Furthermore, while creating the iterator in your "foreach" method is somewhat faster than creating the lambdas, actually invoking that iterator is slower than invoking the lambdas.

If the few extra nanoseconds required to create delegates is too much for your application, consider caching them. If you require parameters to those delegates (i.e. closures), consider creating your own closure classes such that you can create them once and then just change the properties when you need to reuse their delegates. Here's an example:

public class SuperFastLinqRangeLookup<TItem> : RangeLookupBase<TItem>
    where TItem : RangeItem
{

    public SuperFastLinqRangeLookup(DateTime start, DateTime end, IEnumerable<TItem> items)
        : base(start, end, items)
    {
        // create delegate only once
        predicate = i => i.IsDayWithinRange(day);
    }

    DateTime day;
    Func<TItem, bool> predicate;

    public override IEnumerable<TItem> GetDayData(DateTime day)
    {
        this.day = day; // set captured day to correct value
        return this.items.Where(predicate);
    }
}

Of course lambda becomes very fast when you increase `RangeCount`, because you eliminate the need for creating a lambda several times. All it does afterwards is actually iterating over items. — Robert Koritnik, Nov 29 '11 at 12:58
How do you expect me to cache a lambda? I did put the predicate builder (as in second example) outside of method scope and into the class itself but there was no difference (static or instance variable). I'm also confused because *predicate builder* made [my other very similar code](http://stackoverflow.com/q/8214055/75642) much faster. **I'm now very confused** because what I thought I knew I know now I didn't at all. And still don't. — Robert Koritnik, Nov 29 '11 at 13:07
`ToList()` is called in both cases and is hence not relevant what fraction of time is spent there. It's the same amount of time in both cases. The only difference is the `foreach` loop vs lambda expression with free variable. Because this free variable prevents lambda caching. Compiler instead creates a closure class and instantiates a new instance every time it calls the `GetDayData` method. **That's the slowdown** that's related to code difference. Equal parts are irrelevant. — Robert Koritnik, Nov 29 '11 at 15:49
@RobertKoritnik: I added an example at the end of my post demonstrating how to cache the lambda. It's just as fast as "foreach" when `RangeCount=1` and as fast as all the other lambdas when `RangeCount` is something higher. — Gabe, Nov 29 '11 at 18:44
This is somehow similar to the last class I added to (Gist)[https://gist.github.com/1403144] except that I don't use any lambda expressions at all but rather private implemented class methods. I would like to ask you to look at my `NoLinqRangeLookup` class (I know it should be called `NoLambda...` but nevermind). My class that doesn't use any lambdas is slower than yours that does. That puzzles me lots. I will have to check its compiled source but you may shed some light on this. Hopefully. — Robert Koritnik, Nov 29 '11 at 20:21
The difference between your class and mine is that mine creates a delegate once in its constructor and saves it in a field of the class, while yours creates a new delegate every time. Remember, a lambda is just a delegate, so there's no speed difference between them; the difference is just in the syntax. — Gabe, Nov 29 '11 at 20:33

Tony Tanzillo · Answer 4 · 2012-02-21T07:37:00.293

When a LINQ expression that uses deferred execution executes within the same scope that encloses the free variables it references, the compiler should detect that and not create a closure over the lambda, because it's not needed.

The way to verify that would be by testing it using something like this:

public class Test
{
   public static void ExecuteLambdaInScope()
   {
      // here, the lambda executes only within the scope
      // of the referenced variable 'add'

      var items = Enumerable.Range(0, 100000).ToArray();

      int add = 10;  // free variable referenced from lambda

      Func<int,int> f = x => x + add;

      // measure how long this takes:
      var array = items.Select( f ).ToArray();  
   }

   static Func<int,int> GetExpression()
   {
      int add = 10;
      return x => x + add;  // this needs a closure
   }

   static void ExecuteLambdaOutOfScope()
   {
      // here, the lambda executes outside the scope
      // of the referenced variable 'add'

      Func<int,int> f = GetExpression();

      var items = Enumerable.Range(0, 100000).ToArray();

      // measure how long this takes:
      var array = items.Select( f ).ToArray();  
   }

}

How would the compiler know that the lambda is executed within the scope of `add`? — Oskar Berggren, Mar 02 '13 at 18:38