
I made a function to recursively find the first or default item that fits a condition (first code block).

Resharper suggested I replace a few lines with a single LINQ line (second code block).

I was wondering whether the Resharper suggestion would give me the same performance and the same memory footprint, so I made a performance test (third code block). The result is what I expected, but why is the difference so large?

8156 milliseconds
Laure
23567 milliseconds
Laure LINQ

Where does that difference come from? Why aren't the results the same, or at least closer?

public static T RecursiveFirstOrDefault<T>(this T item, Func<T, IEnumerable<T>> childrenSelector, Predicate<T> condition)
    where T : class // Hierarchy implies class. Don't need to play with "default()" here.
{
    if (item == null)
    {
        return null;
    }

    if (condition(item))
    {
        return item;
    }

    foreach (T child in childrenSelector(item))
    {
        T result = child.RecursiveFirstOrDefault(childrenSelector, condition);
        if (result != null)
        {
            return result;
        }
    }

    return null;
}

But Resharper suggested converting the foreach block to a LINQ query, as follows:

public static T RecursiveFirstOrDefaultLinq<T>(this T item, Func<T, IEnumerable<T>> childrenSelector, Predicate<T> condition)
    where T : class // Hierarchy implies class. Don't need to play with "default()" here.
{
    if (item == null)
    {
        return null;
    }

    if (condition(item))
    {
        return item;
    }

    // Resharper change:
    return childrenSelector(item).Select(child => child.RecursiveFirstOrDefaultLinq(childrenSelector, condition)).FirstOrDefault(result => result != null);
}

Test:

private void ButtonTest_OnClick(object sender, RoutedEventArgs e)
{
    VariationSet varSetResult;
    Stopwatch watch = new Stopwatch();

    varSetResult = null;
    watch.Start();
    for(int n = 0; n < 10000000; n++)
    {
        varSetResult = Model.VariationRef.VariationSet.RecursiveFirstOrDefault((varSet) => varSet.VariationSets,
            (varSet) => varSet.Name.Contains("Laure"));
    }
    watch.Stop();
    Console.WriteLine(watch.ElapsedMilliseconds.ToString() + " milliseconds");
    Console.WriteLine(varSetResult.Name);

    watch.Reset();

    varSetResult = null;
    watch.Start();
    for(int n = 0; n < 10000000; n++)
    {
        varSetResult = Model.VariationRef.VariationSet.RecursiveFirstOrDefaultLinq((varSet) => varSet.VariationSets,
            (varSet) => varSet.Name.Contains("Laure"));
    }
    watch.Stop();
    Console.WriteLine(watch.ElapsedMilliseconds.ToString() + " milliseconds");
    Console.WriteLine(varSetResult.Name + " LINQ");

}
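As a side note, the two timing runs above share one `Stopwatch`, which makes it easy to forget the `Reset()` between them. A small helper along these lines (a sketch; `Measure` is not part of the original code, and the label parameter is illustrative) gives each run a fresh stopwatch:

```csharp
using System;
using System.Diagnostics;

static class Benchmark
{
    // Hypothetical helper: each measurement gets its own Stopwatch via
    // StartNew(), so there is no shared timer to forget to reset.
    public static long Measure(string label, int iterations, Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            action();
        }
        watch.Stop();
        Console.WriteLine(watch.ElapsedMilliseconds + " milliseconds (" + label + ")");
        return watch.ElapsedMilliseconds;
    }
}

// Usage sketch (VariationSet and Model are from the original test):
// Benchmark.Measure("Laure", 10000000,
//     () => Model.VariationRef.VariationSet.RecursiveFirstOrDefault(
//         varSet => varSet.VariationSets,
//         varSet => varSet.Name.Contains("Laure")));
```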

That's all I have time for today. For reference, the test environment: x86, Release build, 12-core machine, Windows 7, .NET Framework 4.5.

My conclusion:

In my case the non-LINQ version is ~3x faster. Readability is better with LINQ, but who cares when the code lives in a library, where you should only need to remember what it does and how to call it (in this case; not as a general rule)? LINQ is almost always slower than a well-coded method. I would personally favor:

  • LINQ: where performance is not really an issue (most cases), in project-specific code
  • Non-LINQ: where performance is an issue in project-specific code, or in a library, where the code should be stable and fixed, its usage well documented, and callers should not need to dig inside.
Eric Ouellet
  • The underlying compiler compiles linq code dramatically different from foreach code, that's where your difference comes from. – C Bauer Sep 03 '14 at 18:05
  • Don't you have to reset the timer to get the correct count for the second test iteration? – entropic Sep 03 '14 at 18:05
  • @entropic, that's better now... I get better performance in the non-LINQ version, which is what I expected. I'm modifying my question... – Eric Ouellet Sep 03 '14 at 18:08
  • IEnumerable.FirstOrDefault can take a predicate. For instance, `"abcd".FirstOrDefault(f => f > 'b')` returns 'c'. – user2023861 Sep 03 '14 at 18:10
  • Can you label your tests with the outputs and methods you are running? – jamesSampica Sep 03 '14 at 18:19
  • See also: http://stackoverflow.com/questions/14893924/for-vs-linq-performance-vs-future – Magnus Sep 03 '14 at 18:21
  • Rather than try to mix a whole bunch of operations together you should maintain a separation of concerns. Create a method to traverse a tree-based structure and flatten it, you already have a method to filter a sequence to items matching a condition, and you already have a method to get the first item in a sequence or a default value. You can then compose those three operations if you *really* need to, although it's not exactly adding a whole lot once you've built each building block. – Servy Sep 03 '14 at 18:27
  • @Magnus, thanks a lot, it helped me. In fact, if I hadn't made my initial mistake of forgetting to reset my Stopwatch in between, I probably wouldn't have posted this question. – Eric Ouellet Sep 04 '14 at 13:22
  • @Servy, nice advice. I'm not sure how easy it is to do properly without creating a new collection, which would take a lot of additional memory. Using yield, it would probably be possible. I'll think about it. I like your idea. – Eric Ouellet Sep 04 '14 at 13:26
  • Microsoft's Ix-Main package has an `Expand` method which recursively flattens a sequence like this. Highly recommend Ix. – Cory Nelson Sep 04 '14 at 14:06
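A sketch of the composition Servy and Cory Nelson describe (names here are illustrative, not from the original post): flatten the hierarchy lazily with `yield`, then apply the existing `FirstOrDefault` overload. Because iterators are lazy, traversal still stops at the first match; note that this explicit-stack version visits siblings in reverse order compared to the recursive method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TreeExtensions
{
    // Lazily flattens a hierarchy depth-first using an explicit stack,
    // so no intermediate collection of all results is ever materialized.
    public static IEnumerable<T> Flatten<T>(
        this T item, Func<T, IEnumerable<T>> childrenSelector)
        where T : class
    {
        if (item == null)
        {
            yield break;
        }

        var stack = new Stack<T>();
        stack.Push(item);
        while (stack.Count > 0)
        {
            T current = stack.Pop();
            yield return current;
            foreach (T child in childrenSelector(current))
            {
                stack.Push(child); // siblings come back out in reverse order
            }
        }
    }
}

// Usage sketch: laziness means the tree is only walked until a match is found.
// var match = root.Flatten(x => x.Children)
//                 .FirstOrDefault(x => x.Name.Contains("Laure"));
```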

1 Answer


Here are some reasons there's a difference between non-LINQ and LINQ code performance:

  1. Every call to a method has some performance overhead. Information has to be pushed onto the stack, the CPU has to jump to a different instruction line, etc. In the LINQ version you are calling into Select and FirstOrDefault, which you are not doing in the non-LINQ version.
  2. There is overhead, both time and memory, when you create a Func<> to pass into the Select method. The memory overhead, when multiplied many times as you do in your benchmark, can lead to the need to run the garbage collector more often, which can be slow.
  3. The Select LINQ method you are calling produces an object that represents its return value. This also adds a little memory consumption.
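To make point 2 concrete, here is roughly what the C# compiler generates for the lambda passed to Select (a simplified sketch; the real generated names look like `<>c__DisplayClass0_0`): a closure class capturing `childrenSelector` and `condition`, instantiated on every call.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for what the compiler emits for
//   child => child.RecursiveFirstOrDefaultLinq(childrenSelector, condition)
// The two captured parameters become fields on a heap-allocated class.
sealed class DisplayClass<T> where T : class
{
    public Func<T, IEnumerable<T>> childrenSelector;
    public Predicate<T> condition;

    public T Invoke(T child)
    {
        return child.RecursiveFirstOrDefaultLinq(childrenSelector, condition);
    }
}

// Each call to RecursiveFirstOrDefaultLinq then effectively does:
//   var closure = new DisplayClass<T>            // heap allocation
//   {
//       childrenSelector = childrenSelector,
//       condition = condition
//   };
//   var selector = new Func<T, T>(closure.Invoke); // second allocation
// Ten million iterations of this is real garbage-collector pressure.
```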

why is the difference so large?

It's actually not that large. True, the LINQ version takes roughly three times as long, but honestly you're talking about only being able to complete this entire recursive operation 400 times in a millisecond. That's not slow, and you're unlikely to ever notice the difference unless this is an operation you're doing all the time in a high-performance application like a video game.

StriplingWarrior
  • First... thanks a lot :-). About "1.", I agree it has an impact, but I suspect it's not the biggest part of the time. About "2", I'm not sure I understand, because as I see it, it's the same in both cases. About "3", I think you probably put your finger on it, because it needs to create a new object, which is significant relative to the rest of the code, especially since (in my test case) each hierarchy object has a very fast condition to verify. Thanks a lot! – Eric Ouellet Sep 04 '14 at 13:37
  • I made another correction that nobody really noticed, and it was a huge bug: in my non-LINQ version, I was calling the LINQ version. The results are updated as well, and now it is almost 3 times faster when not using LINQ, which matches what I read in the link Magnus posted in the comments on my question. – Eric Ouellet Sep 04 '14 at 13:51
  • @EricOuellet: Regarding #2: Every time you call `.Select()`, you are passing in a delegate (`child => child.RecursiveFirstOrDefaultLinq(childrenSelector, condition)`), which closes over two variables. Even though you never say "`new`", that delegate is instantiated automatically, basically the same as if you had a class with two fields on it. So the creation of an object, plus the necessity of garbage-collecting that object later, will have a performance impact. – StriplingWarrior Sep 04 '14 at 16:02
  • You are totally right, I missed that. It should indeed have a very big impact. Thank you very much for pointing that out :-) ! – Eric Ouellet Sep 04 '14 at 17:43