2

I want to sort a list of objects using a value that can take some time to compute. For now I have code like this:

public IEnumerable<Foo> SortFoo(IEnumerable<Foo> original)
{
    return foos.OrderByDescending(foo => CalculateBar(foo));
}

private int CalculateBar(Foo foo)
{
    //some slow process here
}

The problem with the above code is that it will call calculate the value several times for each item, which is not good. The possible optimization is to use cached value (maybe a dictionary), but it will mean that SortFoo will have to clear the cache after each sorting (to avoid memory leak, and I do want the value to be recalculated on each SortFoo call).

Is there a cleaner and more elegant solution to this problem?

Louis Rhys
  • 34,517
  • 56
  • 153
  • 221
  • Memoizing the value would probably save you some time. What about sorting the items on insertion (Using a linked list for example)? – Alex Jul 11 '12 at 10:32
  • You should use `SortFoo` as the last step (if possible) in your query, so that calculation is reduced to a minimum. – Tim Schmelter Jul 11 '12 at 10:37
  • @Alex We don't control the insertion, because we receive the list from the caller code.. Yes I was thinking about memoization (using cached value), but how to make it clean and memoryleak-safe? (see my second paragraph) – Louis Rhys Jul 11 '12 at 10:39
  • @TimSchmelter what do you mean? – Louis Rhys Jul 11 '12 at 10:39
  • @LouisRhys: Don't: `hugeFooList.SortFoo().Where(f => f.SomeBool).Take(5)` but `hugeFooList.Where(f => f.SomeBool).Take(5).SortFoo()`. The order [can affect performance](http://stackoverflow.com/questions/10110013/order-of-linq-extension-methods-does-not-affect-performance). – Tim Schmelter Jul 11 '12 at 10:44
  • 3
    @Louis: The `OrderBy`/`OrderByDescending` methods already have the exact behaviour that you're asking for. `CalculateBar` will only be called *once* for each element of the sequence; the calculated key is then cached and re-used for the duration of that sort. (I'm not sure if this is guaranteed and/or documented anywhere, or whether it's just an implementation detail.) – LukeH Jul 12 '12 at 02:32

2 Answers2

7

It appears that .OrderBy() is already optimized for slow keySelectors.

Based on the following, .OrderBy() seems to cache the result of the keySelector delegate you supply it.

var random = new Random(0);
var ordered = Enumerable
    .Range(0, 10)
    .OrderBy(x => {
        var result = random.Next(20);
        Console.WriteLine("keySelector({0}) => {1}", x, result);
        return result;
    });
Console.WriteLine(String.Join(", ", ordered));

Here's the output:

keySelector(0) => 14
keySelector(1) => 16
keySelector(2) => 15
keySelector(3) => 11
keySelector(4) => 4
keySelector(5) => 11
keySelector(6) => 18
keySelector(7) => 8
keySelector(8) => 19
keySelector(9) => 5
4, 9, 7, 3, 5, 0, 2, 1, 6, 8

If it were running the delegate once per comparison, I'd see more than just one invocation of my keySelector delegate per item.

ESV
  • 7,620
  • 4
  • 39
  • 29
3

Because each item is compared against other items multiple times in a sort, you can cheaply cache the computation at least one-per-item. If you're often running the calculation against the same values, Memoizing the function would be your best bet,

public IEnumerable<Foo> SortFoo(IEnumerable<Foo> original)
{
    return foos
        .Select(f => new { Foo = f, SortBy = CalculateBar(f) })
        .OrderByDescending(f=> f.SortBy)
        .Select(f => f.Foo);
}

This will reduce the calculations to once per item

Dave Bish
  • 19,263
  • 7
  • 46
  • 63
  • Can you explain a bit what you're doing with `new { Foo = f, SortBy = CalculateBar(f) }` ? Anonymous type is a bit new to me. – Louis Rhys Jul 11 '12 at 12:21
  • 1
    Basically, it's making a new anonymous object, to temporarily hold the computed value, and a pointer to the initial object. This means the computed value will only be calculated once - as all sorting requires an element it compared to multiple times. Once the sorting is done, this temp object is throw away, and the original object returned. – Dave Bish Jul 11 '12 at 13:29
  • See other answer this would be actively detrimental since LINQ the does the same thing, so you're duplicating object creation. – Brondahl Jul 31 '23 at 14:51