10

With the following table structure (extraneous columns removed)

create table [Events]
(
    ID int not null identity,
    Name nvarchar(128) not null,
    constraint PK_Events primary key(ID)
)

create table [Donations]
(
    ID int not null identity,
    EventID int not null,
    Amount decimal(10, 2) not null,

    constraint PK_Donations primary key(ID),
    constraint FK_Donations_Events foreign key(EventID) references [Events](ID) on update no action on delete no action
)

I use the following Linq-to-Entities queries:

// 1
ents.Donations.Where(d => d.Amount > 25.0m && d.Event.Name.Contains("Run")).ToList();

// 2
ents.Donations.Include("Event").Where(d => d.Amount > 25.0m).ToList();

// 3
ents.Donations.Include("Event").Where(d => d.Amount > 25.0m && d.Event.Name.Contains("Run")).ToList();

Produces (from an SQL Profiler):

-- 1
SELECT 
[Extent1].[ID] AS [ID], 
[Extent1].[EventID] AS [EventID], 
[Extent1].[Amount] AS [Amount]
FROM  [dbo].[Donations] AS [Extent1]
INNER JOIN [dbo].[Events] AS [Extent2] ON [Extent1].[EventID] = [Extent2].[ID]
WHERE ([Extent1].[Amount] > 25.0) AND ([Extent2].[Name] LIKE N'%Run%')

-- 2
SELECT 
[Extent1].[ID] AS [ID], 
[Extent1].[EventID] AS [EventID], 
[Extent1].[Amount] AS [Amount], 
[Extent2].[ID] AS [ID1], 
[Extent2].[Name] AS [Name]
FROM  [dbo].[Donations] AS [Extent1]
INNER JOIN [dbo].[Events] AS [Extent2] ON [Extent1].[EventID] = [Extent2].[ID]
WHERE [Extent1].[Amount] > 25.0

-- 3
SELECT 
[Extent1].[ID] AS [ID], 
[Extent1].[EventID] AS [EventID], 
[Extent1].[Amount] AS [Amount], 
[Extent3].[ID] AS [ID1], 
[Extent3].[Name] AS [Name]
FROM   [dbo].[Donations] AS [Extent1]
INNER JOIN [dbo].[Events] AS [Extent2] ON [Extent1].[EventID] = [Extent2].[ID]
LEFT OUTER JOIN [dbo].[Events] AS [Extent3] ON [Extent1].[EventID] = [Extent3].[ID]
WHERE ([Extent1].[Amount] > 25.0) AND ([Extent2].[Name] LIKE N'%Run%')

Why in the 3rd query, does it generate a LEFT OUTER JOIN on the Events table a second time? While the query produces correct results, it seems odd, why cannot EF / LINQ re-use [Extent2] in the SELECT and WHERE clause, and why is it a LEFT OUTER JOIN?

I'm using Visual Studio 2010 sp1 .NET 4 and I am connecting to Sql Server 2008 Express.

Matthew
  • 24,703
  • 9
  • 76
  • 110

2 Answers2

7

The left join would be to ensure no rows are missing from the Donations table in the case that a donation points to an event that does not exist. They don't want the Include keyword to have the side effect of causing rows to be missing from the original table so they must be using a left join for safety.

With regards to including the table twice this is probably just a limitation of EF. You mention it twice in your query and it's not smart enough to do the optimisation.

I have to say that if you want to optimise SQL then write SQL, don't bother with EF. What you are doing could be compared to decompiling C# and asking why the assembler doesn't have a certain optimisation. If you use EF then shut your eyes to what SQL it produces :-)

MikeKulls
  • 2,979
  • 2
  • 25
  • 30
  • 1
    The second query uses `Include` yet does not produce a left join. – Matthew Jul 23 '12 at 04:53
  • Very good question, I didn't notice that. This appears to be back to front to me. The second query should have a left join so that Include does not cause side effects and the third query should have inner join as it needs an event row anyway. – MikeKulls Jul 23 '12 at 05:56
  • 1
    A left join would make sense if it were a nullable column, or if I didn't have referential integrity on it. – Matthew Jul 23 '12 at 14:38
  • 1
    Have you seen this question: http://stackoverflow.com/questions/7484249/entity-framework-generating-left-outer-joins-when-it-should-be-doing-inner-joins?rq=1 – MikeKulls Jul 23 '12 at 22:07
  • I have now, this seems to describe it exactly. – Matthew Jul 24 '12 at 02:57
  • @Matthew and MikeKulls: Here are two different problem: (1) The redundant join on the same table is a bug, confirmed by a member of the EF team: http://stackoverflow.com/a/4571952/270591 The "current bits" that are mentioned in that answer and that fix the bug are in EF June 2011 CTP (see James Crowley's comment below the answer). Note that the June CTP haven't been merged with EF 4.x but instead integrated with EF 5. So, most likely the bug is NOT fixed in EF 4.x, but only in EF 5. – Slauma Jul 28 '12 at 11:27
  • (2) The usage of a LEFT OUTER JOIN instead of an INNER JOIN (that also occurs in other situations with multiple `Include`s for example) is probably by design, see this question and answer: http://stackoverflow.com/questions/8527886/entity-framework-4-2-hasrequired-results-in-unexpected-left-outer-join – Slauma Jul 28 '12 at 11:27
  • @Slauma I downloaded the EF5-rc and it produces the exact same results. – Matthew Jul 30 '12 at 05:21
  • @Matthew: Good to know, but too bad... Maybe someone from EF team can say something about this. – Slauma Jul 30 '12 at 08:17
  • @Matthew Why are you so worried about what sql EF produces? I do most of my stuff as sql and only do simple stuff with EF. I've never really looked too closely at what sql it produces and spend most of my efforts on avoiding lazy loading which is the real performance killer. If you don't have a performance issue then ignore these minor problems. – MikeKulls Jul 30 '12 at 21:44
  • I am in a shop that decided to do everything with stored procedures, and it has become a huge maintenance and scalability issue. I figured the query composition functionality of EF would be a good solution to prevent redundant stored procedures. If very simple and obvious queries generate odd SQL then that is a strike against using EF as an alternative. – Matthew Jul 30 '12 at 23:20
  • @Matthew I'm not sure linq is the answer. I think it could make things worse. Nothing is going to be more efficient than writing stored procs so if you are having performance issues they are only going to get worse. For complex stuff I find linq becomes combersome and difficult to optimise. Can't you just find ways to work around the issues you are having? – MikeKulls Jul 31 '12 at 02:08
  • @MikeKulls The problem we have now is that we have almost (or over) 1000 stored procs, and sometimes we have business logic in them, other times in the c-sharp, it really depends on the developer who was coding at the time. The stored procs have non-descriptive names with no real structure, stored procs that do different things depending on the parameters sent to them. The main reason for that is if someone needed different data, they just wrote a new stored proc. I was hoping to leverage the composition features of linq to hopefully increase maintainability / reusability. – Matthew Jul 31 '12 at 02:48
  • @Matthew why not just bring some order to your stored procs? They are more efficient than linq, they are more powerful (especially in sql 2008), easier to optimise and more centralised. Create some standards for naming, commenting etc, give developers some training in sql (I find it is the most misunderstood language at any places I have worked). You are not likely to switch all your procs to linq so if you go linq then you will just add further to the mess imo. – MikeKulls Jul 31 '12 at 03:15
  • @MikeKulls We might be doing a system overhaul, meaning we would be obsoleting the code that calls the stored procs right now. I am just in the stages of finding an alternative data access strategy. – Matthew Jul 31 '12 at 13:55
0

Not direct answer to your question, but an attempt to point you in the right direction after reading your comments for the other answers:

You have everything you need to defend some ORM ussage (incl. EF) - that's all you said about the quantity and quality of the SPs. Any approach will have problems, including pure sql, if that sql is not well written or hard to maintain.

So, if some ORM (EF, etc.) sometimes produces non-efficient code, and this really causes performance problems, this becomes "requirement", and needs to be solved, even using SP.

So, look at your problems from business perspective - you have bad structured and hard to maintain bunch of stored procedures. Probably most of your team is C# and not SQL developers.

Using ORM will increase the maintainability of the code base, as well as will allow for better usage of all team members expertise in C#.

Bad SQL code, produced by ORM on some specific occasions is hardly "no-go" for using the technology, unless proven that it'll create more problems than solving existing ones.

Sunny Milenov
  • 21,990
  • 6
  • 80
  • 106
  • I would look at the other way around. If developers are inexperienced in sql then they need to become experienced. They are writing database apps after all and imo a lack of sql knowledge is unacceptable these days. The one thing they *don't* need is more reasons to avoid sql. – MikeKulls Aug 01 '12 at 22:58
  • They are definitely not inexperienced at SQL, the application has gone through ~9 years of constant development (started with ASP.net 1.0), many different developers. The application as it stands right now has no separation of concerns, stored procs are being called within the code behind directly. This is why I'm looking for something that will be very easy to manage, mostly because I believe the organization of the stored procs right now are a write-off. – Matthew Aug 02 '12 at 02:13