
I have a big collection where I need to get the newest item based on two properties. The first step is ordering the list based on the date property. This is all fine and pretty quick.

Then I group the ordered list by two properties and take the first item from each group.

var one = Fisks.Where(s => s.Havn.Id == 1).OrderByDescending(s => s.Date);
var two = one.GroupBy(s => new { s.Arter.Name, s.Sort });
var three = two.Select(s => s.FirstOrDefault());

This works, but it is really slow on the large collection. How can I avoid using GroupBy but still get the same result?

Thanks!

  • Try adding indexes to your tables in the database to increase the performance of these kinds of operations. – Hamid Pourjam Jun 14 '15 at 17:17
  • I think you are attributing the unacceptable performance to a random element of the query (the GroupBy). Capture the execution plan using Profiler and the Showplan XML event. – usr Jun 14 '15 at 17:19
  • I assume you understand the execution of these is deferred? Are you sure it's `GroupBy` that's slow? – Charles Mager Jun 14 '15 at 17:21
  • @usr I've tried to test my way to where it's slow, and pretty much as I remember from my limited use of SQL queries, GroupBy is really slow. But where can I capture it? Some links? – Henrik Klarup Jun 14 '15 at 17:27
  • https://www.google.com/webhp?complete=1&hl=en&gws_rd=cr,ssl&ei=#complete=1&hl=en&q=sql+server+Profiler++Showplan+XML+event GroupBy is not slow by itself. Maybe you should make yourself more familiar with query tuning in general. – usr Jun 14 '15 at 17:30
  • If you're working against SQL, try using LINQPad for query optimization. Sorting before grouping is usually a bad idea, even if it gets fixed up by LINQ's generated SQL and SQL Server's optimizer. When grouping with LINQ, try to have the selection in the Group statement. – mbx Jun 14 '15 at 18:09
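
As the comments note, execution of these queries is deferred and the slow part may not be GroupBy itself; the first thing to check is the SQL the provider actually sends. A minimal sketch of how to capture it, assuming the query runs through a LINQ to SQL DataContext (the MyDataContext type and the db variable are placeholders, not from the question):

// Assumes LINQ to SQL (System.Data.Linq); adjust names to your real model.
using (var db = new MyDataContext())
{
    db.Log = Console.Out; // echoes every SQL statement LINQ to SQL generates

    var query = db.Fisks
        .Where(s => s.Havn.Id == 1)
        .OrderByDescending(s => s.Date)
        .GroupBy(s => new { s.Arter.Name, s.Sort })
        .Select(g => g.FirstOrDefault());

    // Or grab the command text directly, e.g. to analyse it in SSMS or Profiler:
    Console.WriteLine(db.GetCommand(query).CommandText);

    var result = query.ToList(); // the database is hit here, not in the lines above
}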

2 Answers


Using LINQ only for the first step and then taking the first ones in a loop gives you more control over the process and avoids grouping altogether:

// Sort by the group keys first, then newest-first within each group, so that
// all items of one (Arter.Name, Sort) combination are adjacent and the first
// one encountered is the newest.
var query = Fisks
    .Where(f => f.Havn.Id == 1)
    .OrderBy(f => f.Arter.Name)
    .ThenBy(f => f.Sort)
    .ThenByDescending(f => f.Date);
var list = new List<Fisk>();
foreach (Fisk fisk in query) {
    if (list.Count == 0) {
        list.Add(fisk);
    } else {
        // A new group starts whenever the key differs from the item added last.
        Fisk last = list[list.Count - 1];
        if (fisk.Sort != last.Sort || fisk.Arter.Name != last.Arter.Name) {
            list.Add(fisk);
        }
    }
}
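
If the filtered rows fit comfortably in memory, a variant of the same idea is to keep only the newest item per key in a dictionary while streaming through the data once, which needs no sorting at all. A sketch under that in-memory assumption (it also assumes Sort is a simple value such as an int; adjust the key type to your model):

// One pass, no ordering, no GroupBy: remember the newest Fisk per (Name, Sort) key.
// Requires System.Collections.Generic and System.Linq.
var newest = new Dictionary<Tuple<string, int>, Fisk>();
foreach (var fisk in Fisks.Where(f => f.Havn.Id == 1))
{
    var key = Tuple.Create(fisk.Arter.Name, fisk.Sort);
    Fisk current;
    if (!newest.TryGetValue(key, out current) || fisk.Date > current.Date)
    {
        newest[key] = fisk; // first item for this key, or newer than the one we had
    }
}
List<Fisk> list = newest.Values.ToList();
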
Olivier Jacot-Descombes

Generally I advise against ordering before doing something that may destroy that order (as GroupBy can do in the SQL generated by LINQ to SQL). Also, only order the data you are actually going to use. You can improve query performance by limiting the selection to the required fields/properties. You can fiddle around with this sample and use your real backend instead:

var Fisks = new[] {
    new { Havn = new { Id = 1 }, Date = DateTime.MinValue,            Arter = new { Name = "A" }, Sort = 1, Title = "A1" },
    new { Havn = new { Id = 1 }, Date = DateTime.MinValue.AddDays(1), Arter = new { Name = "A" }, Sort = 1, Title = "A2" },
    new { Havn = new { Id = 1 }, Date = DateTime.MinValue,            Arter = new { Name = "B" }, Sort = 1, Title = "B1" },
    new { Havn = new { Id = 1 }, Date = DateTime.MinValue.AddDays(2), Arter = new { Name = "B" }, Sort = 1, Title = "B2" },
    new { Havn = new { Id = 1 }, Date = DateTime.MinValue.AddDays(2), Arter = new { Name = "B" }, Sort = 1, Title = "B3" },
};
var stopwatch = Stopwatch.StartNew();
var one = Fisks.Where(s => s.Havn.Id == 1).OrderByDescending(s => s.Date);
var two = one.GroupBy(s => new { s.Arter.Name, s.Sort });
var three = two.Select(s => s.FirstOrDefault());
var answer = three.ToArray();
stopwatch.Stop();
stopwatch.ElapsedTicks.Dump("elapsed Ticks");
answer.Dump();

stopwatch.Restart();
answer = Fisks
    .Where(f => f.Havn.Id.Equals(1))
    .GroupBy(s => new { s.Arter.Name, s.Sort }, (k, g) => new {
        s = g.OrderByDescending(s => s.Date).First() // TOP 1 -> quite fast
    })
    .Select(g => g.s)
    .OrderByDescending(s => s.Date) // only fully order results
    .ToArray();
stopwatch.Stop();
stopwatch.ElapsedTicks.Dump("elapsed Ticks");
answer.Dump();

If you're working against any SQL Server you should check the generated SQL in LINQPad. You don't want to end up with an N+1 query. Having an index on Havn.Id and Fisks.Date might also help.
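
If only a few fields are needed downstream, projecting them before the GroupBy keeps the generated SQL narrow, as suggested above. A sketch of that idea, assuming you do not need the full Fisk entity afterwards (the column choice here is illustrative):

// Select only the columns the grouping and the result actually need.
var slim = Fisks
    .Where(f => f.Havn.Id == 1)
    .Select(f => new { f.Arter.Name, f.Sort, f.Date, f.Title });

// Newest row per (Name, Sort), working on the narrow projection.
var newestPerGroup = slim
    .GroupBy(x => new { x.Name, x.Sort },
             (k, g) => g.OrderByDescending(x => x.Date).First())
    .ToArray();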

mbx
  • Is there a cool way of checking the runtime of the query in LINQPad? Not the time, but the big O. – Henrik Klarup Jun 14 '15 at 19:55
  • The time the query takes to run using the last one you provided is five seconds slower than the one I had, so it does not appear to make a difference :) – Henrik Klarup Jun 14 '15 at 19:58
  • You should really really check the generated SQL and tweak the query. You could also copy the SQL and use the SSMS query analyser. – mbx Jun 14 '15 at 20:02
  • You can estimate the big O from the generated SQL - if you get some base query and multiple queries accessing by id, you ran into the n+1 trap. – mbx Jun 14 '15 at 20:03
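
If the generated SQL does show that pattern (one base query plus one extra query per row for Arter or Havn) and the backend is LINQ to SQL, the navigation properties can be loaded eagerly in the same round trip. A sketch; the entity and property names come from the question, and db is a placeholder DataContext:

// DataLoadOptions lives in System.Data.Linq and must be set before the first query runs.
var loadOptions = new DataLoadOptions();
loadOptions.LoadWith<Fisk>(f => f.Arter); // fetch Arter together with each Fisk
loadOptions.LoadWith<Fisk>(f => f.Havn);  // likewise for Havn
db.LoadOptions = loadOptions;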