0

I have an entity containing several navigation properties and I need to load three of them. The initial Linq query I wrote is:

await _context.EventParticipant
              .Include(x => x.Employee)
              .Include(x => x.ExcuseDescription)
              .Include(x => x.RegistrationMethod)
              .Where(x => x.EventId == eventId)
              .ToListAsync();

This led to massive performance issues, causing the query to seemingly never finish when there were about 300 rows to fetch. The first thing I took a look at was the query generated by LINQ which was this:

SELECT 
    [e].[EmployeeNumber], [e].[EventId], [e].[ExcuseDescriptionId], 
    [e].[IsInvited], [e].[RegistrationMethodId], 
    [v].[employee_number], [v].[company_email], 
    [v].[department_name], [v].[employee_email], 
    [v].[employeecard_rfid], [v].[firstname], [v].[lastname], 
    [v].[status], [v].[mandant], [v].[upn], [v].[userid], 
    [e0].[Id], [e0].[Description], [r].[Id], [r].[Description]
FROM 
    [EventParticipant] AS [e]
INNER JOIN 
    [V_Employee] AS [v] ON [e].[EmployeeNumber] = [v].[employee_number]
LEFT JOIN 
    [ExcuseDescription] AS [e0] ON [e].[ExcuseDescriptionId] = [e0].[Id]
LEFT JOIN 
    [RegistrationMethod] AS [r] ON [e].[RegistrationMethodId] = [r].[Id]
WHERE 
    [e].[EventId] = @__eventId_0

Now, this query runs in SSMS in less than a second, so my next guess was that the problem is caused by entity tracking, but adding AsNoTracking() before materializing the entities led to no improvements.

Interestingly enough, the debug output would show that the query was executed after about 16 seconds (which is still extremely slow), but the entities would still never materialize. The query does load a few extra columns so the next thing I tried was to select only the columns that I need, so I ended up with the following:

await _context.EventParticipant
              .Where(x => x.EventId == eventId).AsNoTracking()
              .Select(participant => new EventParticipantViewModel()
                                     {
                                          EventId = eventId,
                                          EmployeeNumber = participant.EmployeeNumber,
                                          Department = participant.Employee.DepartmentName,
                                          FirstName = participant.Employee.Firstname,
                                          LastName = participant.Employee.Lastname,
                                          IsInvited = participant.IsInvited,
                                          ExcuseDescription = participant.ExcuseDescription.Description ?? null,
                                          RegisterMethod = participant.RegistrationMethod.Description ?? null,
                                      })
              .ToListAsync();

Now, this generated the following query:

SELECT @__eventId_0 AS [EventId], [e].[EmployeeNumber], [v].[department_name] AS [Department], [v].[firstname] AS [FirstName], [v].[lastname] AS [LastName], [e].[IsInvited], COALESCE([e0].[Description], NULL) AS [ExcuseDescription], COALESCE([r].[Description], NULL) AS [RegisterMethod]
  FROM [EventParticipant] AS [e]
  INNER JOIN [V_Employee] AS [v] ON [e].[EmployeeNumber] = [v].[employee_number]
  LEFT JOIN [ExcuseDescription] AS [e0] ON [e].[ExcuseDescriptionId] = [e0].[Id]
  LEFT JOIN [RegistrationMethod] AS [r] ON [e].[RegistrationMethodId] = [r].[Id]
  WHERE [e].[EventId] = @__eventId_0

Although the query itself is not much different than the original one, the performance impact is immense: the query above executes and entities materialize in less than 2 seconds with the same 300-ish rows!

Could anyone please explain why is this happening? What kind of magic does the EF Core do in the background that would impact the performance like this?

Gert Arnold
  • 105,341
  • 31
  • 202
  • 291
mobearette
  • 377
  • 10
  • 26
  • 3
    Questions about performance really need the details of the table schema and the indexes/keys on the tables, plus the actual [execution plans](https://www.brentozar.com/pastetheplan). When you tried the query in SSMS how did you supply the `eventId` parameter? Do you have a covering index(es) for the seconds query? – Stu Aug 25 '22 at 20:15
  • It will not take so much time to check my [solution](https://stackoverflow.com/a/73077746/10646316) for solving Parameter Sniffing – Svyatoslav Danyliv Aug 25 '22 at 20:39
  • It's not EF per se, but more likely you just ended up with a different plan. That's why we need to see them. – Charlieface Aug 26 '22 at 01:50
  • I have provided the 'eventId' like this: DECLARE @__eventId_0 uniqueidentifier; SET @__eventId_0 = 'B8B2E4CC-1C5F-4219-9171-2BEC24EF875C'; Here is the execution plan generated by the original query https://www.brentozar.com/pastetheplan/?id=ryUj2181o – mobearette Aug 26 '22 at 06:46
  • And here is the execution plan of the second, performant query: https://www.brentozar.com/pastetheplan/?id=rkE2akLko – mobearette Aug 26 '22 at 06:53
  • `v_Employee` seems to be a view that pulls data from a llinked server or openquery etc. You would appear to have no indexes on `EventParticipant`, what happens if you define a clustered index on `EmployeeNumber` ? – Stu Aug 26 '22 at 08:25
  • I just tried doing that, but the server won't let me define a clustered index on that particular column. But, if the index was the problem, wouldn't I get the same performance issue when executing the query in SSMS, not only with EF? – mobearette Aug 26 '22 at 10:02
  • 1
    Just a nonclustered index on EmployeeNumber include IsInvited? The issue is *usually* one of a cached plan / parameter sniffing; I don't use Linq but you might try experimenting with *option recompile*, this [question](https://stackoverflow.com/questions/2423271/how-to-force-recompile-of-execution-plan-of-a-linq-to-sql-query) shows how. – Stu Aug 26 '22 at 10:16

2 Answers2

1

As @Stu mentioned, it's hard to say for sure not knowing the indexes involved or the amount of child collection records involved, but realize that .Include essentially does a select * (using all columns from the child table) which likely is pulling a different index than your custom projection. The custom projection may be using a covering index and thus increasing your performance.

The other thing you didn't mention is which version of EF (core I'm assuming) you are using. Some versions have generated drastically different queries from the same LINQ expression and some of them could result in better or worse performance. For example, earlier versions of EF core might split this query into multiple lazy loaded expressions or injects order by on the child table Ids which if not included in the indexes could lead to performance issues (particularly with unique identifier Ids).

Jim Wooley
  • 10,169
  • 1
  • 25
  • 43
  • I am using EF Core 6.0.7. Linq expression generates only one query. ReigstrationMethod and ExcuseDescription both have unique identifier IDs. Both of these tables have only 4 rows and consist of just the ID and Description. – mobearette Aug 26 '22 at 06:39
1

In both queries most of the time is spent waiting for your linked server. And plan changes for distributed queries can be very painful. So while you haven't captured the actual execution plan for any really long-running execution, it's probably caused by your linked server.

        <WaitStats>
          <Wait WaitType="OLEDB" WaitTimeMs="939" WaitCount="16" />
        </WaitStats>

First, do you really have to use a linked server? Can't you just copy the data to the local database?

If you do use linked server, try to never join remote tables with local tables. You it's more reliable to load a temp table with data from your linked server and then join that to your local tables.

David Browne - Microsoft
  • 80,331
  • 6
  • 39
  • 67
  • Yes, that seems to be a reasonable explanation, though the difference in the execution times when running the same query directly in SSMS and from EF Core is still baffling. Is there any way I can capture the execution plan when the query is run by EF Core? As for the copying of data:; I already suggested that, but our DBA swears there shouldn't be any performance issues with the amount of data that we have. I will try to see if we could simply copy the data from the views into a table, at least to do some tests and see how it would affect the performance. – mobearette Aug 29 '22 at 06:19
  • Turn on Query Store to capture all the plans and runtime stats for the query. https://learn.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store?view=sql-server-ver16 – David Browne - Microsoft Aug 29 '22 at 13:45