1

I have a fairly simple sales query that can potentially return over 100k rows in any given request. The (simplified) linq query is as follows:

var query = from adi in _repo.AccountingDocumentItems

select new Sales
{
    TotalInclusive = adi.TotalInclusive,
    Employees = adi.Employees.Select(x => x.FirstName + " " + x.LastName)
};

Because a sale can be assigned to multiple employees, and I need this info - I'm pulling a IEnumerable string of names here so that I can later join them with a comma.

The resulting query looks like the below, note the last line where it orders by AccountingDocumentItemId - this only happens when I pull the IEnumerable:

SELECT 
[Project1].[AccountingDocumentItemId] AS [AccountingDocumentItemId], 
[Project1].[C1] AS [C1], 
[Project1].[TotalInclusive] AS [TotalInclusive], 
[Project1].[C3] AS [C2], 
[Project1].[C2] AS [C3]
FROM ( SELECT 
    [Extent1].[AccountingDocumentItemId] AS [AccountingDocumentItemId], 
    [Extent1].[TotalInclusive] AS [TotalInclusive], 
    1 AS [C1], 
    CASE WHEN ([Join1].[AccountingDocumentItemId] IS NULL) THEN CAST(NULL AS varchar(1)) ELSE [Join1].[FirstName] + N' ' + [Join1].[LastName] END AS [C2], 
    CASE WHEN ([Join1].[AccountingDocumentItemId] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C3]
    FROM  [dbo].[AccountingDocumentItems] AS [Extent1]
    LEFT OUTER JOIN  (SELECT [Extent2].[AccountingDocumentItemId] AS [AccountingDocumentItemId], [Extent3].[FirstName] AS [FirstName], [Extent3].[LastName] AS [LastName]
        FROM  [dbo].[AccountingDocumentItemEmployees] AS [Extent2]
        INNER JOIN [dbo].[Employees] AS [Extent3] ON [Extent3].[EmployeeId] = [Extent2].[EmployeeId] ) AS [Join1] ON [Extent1].[AccountingDocumentItemId] = [Join1].[AccountingDocumentItemId]
)  AS [Project1]
ORDER BY [Project1].[AccountingDocumentItemId] ASC, [Project1].[C3] ASC

Generally this is fine, it executes quite well but the sort in that query gets worse the more columns I attempt to retrieve. So the issue is that SQL Server is having to perform a very expensive Sort operation, especially when the number of rows starts hitting over 100k.

I'd like to avoid doing this, but I can't see any other way to retrieve the employees per sale. I did try grouping by AccountingDocumentItemId and then grouping in C# itself, but the performance to do this on 100k+ rows is terrible.

How can I restructure this query to avoid the sort, but still return an IEnumerable of employee names?

Gary
  • 742
  • 8
  • 20
  • " I did try grouping by AccountingDocumentItemId..." you say, how did that query look? – Magnus Jun 13 '18 at 09:32
  • Hi Magnus, that one looked a lot better and had no sorting applied. But grouping in memory took about 60s, whereas the original query can handle it all in about 10s. – Gary Jun 13 '18 at 10:56
  • "I'm pulling a IEnumerable string of names here so that I can later join them with a comma." => is there any reason you would NOT want that joining to happen on the server? Have you tried replacing adi.Employees.Select(x => x.FirstName + " " + x.LastName) with an aggregate line and see how it translates that? – tolanj Jun 13 '18 at 11:14

2 Answers2

0

The sort is there to improve performance of materializing the data. That is because the data is returned in format :

TotalInclusiveA, NameA
TotalInclusiveA, NameB
TotalInclusiveA, NameC
TotalInclusiveB, NameD
...

By sorting this list by TotalInclusive, it becomes trivial to group them by totalInclusive and turn the names into list.

You have two options. First option is to change the query so the grouping is one on the client. But you said you tried this and it was slow (obviously if you are pulling 100 thousands of rows).

Second option is to join the names into single string on the client. But this is not trivial. There seems to be option to do it on MS SQL 2017, but older versions might require some hacky implementation. And there might not be way to do it with plain LINQ.

Euphoric
  • 12,645
  • 1
  • 30
  • 44
  • "Second option is to join the names into single string on the client." => presume you mean 'on the server' :) – tolanj Jun 13 '18 at 10:51
  • So based on comments, it sounds like there isn't really anything wrong with my original EF query - and it is faster than joining data in memory - so I guess I am kinda stuck with it. – Gary Jun 13 '18 at 11:00
0

Another option would be to query the db once for each collection and than group join them together in memory. Something like this:

var query1 = _repo.AccountingDocumentItems.Select(x => new {x.TotalInclusive, x.Id}).AsEnumerable();
var query2 = _repo.Employees.Where(x => x.AccountingDocumentItems.Any()).Select(x => new {x.FirstName, x.LastName, x.AccId }).AsEnumerable();

var result = from x in query1
             join y in query2 on x.Id equals y.AccId into g
select new Sales
{
    TotalInclusive = x.TotalInclusive,
    Employees = g.Select(x => x.FirstName + " " + x.LastName)
}

If it has better performance or wore you are going to have to test.

Magnus
  • 45,362
  • 8
  • 80
  • 118
  • I would argue the "join" operation would have same performance hit as doing the sort in database. – Euphoric Jun 13 '18 at 10:29
  • That's what's strange, I did try the above, but the performance hit of joining in memory was really poor. Over 60s, whereas EF manages it in under 10s. – Gary Jun 13 '18 at 10:57
  • @Euphoric I guess that depends on how many columns is in the sort expression. But if this is slower than go with the EF query. If you want to optimize that have a look t the execution plan and introduce indices where appropriate. – Magnus Jun 13 '18 at 11:56