0

I've run into a bit of a performance problem. I got the following database model. Model

I wish to select all entries in Entity2 that doesn't have a foreign key in Entity1, in other words the navigation property must be null.

I've come up with the following LINQ query context.Entity2Set.Where(x => x.Entity1 == null); and it works, however it's slow. So I was wondering if there was a better (faster) way (other then indexing) to select the entries.

In the end I want to select the entries from Entity4, so performance is an issue.

Snæbjørn
  • 10,322
  • 14
  • 65
  • 124
  • 1
    Hovering over the linq query while in debug mode will allow you to see the actual SQL that is going to be generated and executed. Looking at that might give you some insight in what you Linq query is actually doing so as to help with your performance concern. – Chris Feb 16 '13 at 15:22
  • 1
    That should be a fast query. Sounds like you need an index on the foreign key column. – John Saunders Feb 16 '13 at 15:27
  • Why do you not want indexes? – Mathieu Guindon Feb 16 '13 at 17:43
  • How many records do you have in each table? 10's, 100's, 1000's, 10,000's, 100,000's, 1,000,000's, more? – qujck Feb 16 '13 at 17:58
  • @retailcoder It's not that I don't want indexes, I just wanted to know if there was another way to improve the query. So I said it to avoid the default index answer :) – Snæbjørn Feb 17 '13 at 13:09
  • @qujck Entity1-3 would grow over time, so I think it would be safe to assume in the 1m range. Where as Entity4 is almost static so that would be in the 100's – Snæbjørn Feb 17 '13 at 13:11
  • CodeFirst/Model first ? 1:1 Entity1 and Entity2 with foreign key. If so, Then code first we expect Entity1 and Entity2 should share the same primary key. Was a N:1 actaully generated ? Or did you add fluent API such as hasMany? Any way if the ID field in question isnt in an index. Then full table scan is likely outcome. – phil soady Feb 17 '13 at 14:03
  • http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-plan please add the execution plan to Question. Then the DB freaks can add value :-) – phil soady Feb 17 '13 at 14:04

2 Answers2

2

You are querying for the principal in the relationship (Entity2), not the dependent (Entity1). This means that the foreign key is in table Entity1 and the generated SQL query isn't just a query for a column value in table Entity2. In order to get the desired result the SQL query must join (by LEFT OUTER JOIN) the two tables.

If it were a one-to-many relationship your query is like: "Fetch all orders that have no order items." This can't be achieved by inspecting the Order table alone, because the foreign key is in table OrderItem and the query must join the two tables. Indeed in this example the query's performance could be improved by indexing the foreign key column in table OrderItem.

Unfortunely in your example you can't improve your query by indexing a column anymore because you have a one-to-one relationship (not one-to-many). Since EF's one-to-one relationships are always shared primary key associations the foreign key in table Entity1 is its primary key Id at the same time.

That means that your foreign key column is already indexed - namely by the clustered primary key index in table Entity1.

The generated SQL query is like so:

SELECT 
[Extent1].[Id] AS [Id], 
[Extent1].[OtherColumn] AS [OtherColumn]
FROM   [dbo].[Entity2] AS [Extent1]
LEFT OUTER JOIN [dbo].[Entity1] AS [Extent2] ON [Extent1].[Id] = [Extent2].[Id]
LEFT OUTER JOIN [dbo].[Entity1] AS [Extent3] ON [Extent2].[Id] = [Extent3].[Id]
WHERE [Extent3].[Id] IS NULL

As you can see the tables are joined by the indexed Id columns. The second and redundant LEFT OUTER JOIN is a flaw in Entity Framework's SQL generation that was in effect for EF < 5.0. I am not sure if it is solved in EF >= 5.0. I don't know if it really matters for the query performance because the line joins the table with itself by the same column, so I'd hope the SQL query optimizer will recognize it and ignore the line in the execution plan.

In essence: Your LINQ query is fine and you can't improve the performance, even not by adding any additional index to a column.

However, what I would suggest is to find out what is really slow. Is it indeed the SQL query, or is it translation from the LINQ query to SQL, or is it perhaps object materialization of the returned result set, etc.? The first step I would do is pasting the generated SQL query into SQL Server Management Studio and see how the performance is there. Even though you would not change the LINQ query there might be other opportunities to improve the performance.

Slauma
  • 175,098
  • 59
  • 401
  • 420
0

You've said

without indexes

but for this problem that would be a mistake. An index will find all null references almost instantaneously.

qujck
  • 14,388
  • 4
  • 45
  • 74
  • There are no `NULL` references because the FK is in `Entity1`, not in `Entity2`. – Slauma Feb 17 '13 at 16:25
  • The questions says the navigation property must be null. So there must be nulls. – qujck Feb 17 '13 at 16:55
  • The navigation property used in the LINQ query is not the navigation property that represents the foreign key side, it's the inverse property. (You can conclude it from the `0..1` to `1` in the diagram, i.e. `Entity2` is the principal.) It is `null` if there *doesn't exist* a record in table `Entity1` that refers to table `Entity2`, not when the FK is `NULL`. (In this case, because it's one-to-one relationship where the FK is the PK, it can never be `NULL` anyway.) – Slauma Feb 17 '13 at 17:19