0

I have been experimenting trying to get the following Linq working without joy. I'm convinced that it's right, but that might just be my bad Linq. I originally added this as a answer to a similar question here:

Linq-to-entities - Include() method not loading

But as it's a very old question, and mine is more specific, I figured it would do better as an explicit question.

In the linked question, Alex James gives two interesting solutions, however if you try them and check the SQL, it's horrible.

The example I was working on is:

        var theRelease = from release in context.Releases
                         where release.Name == "Hello World"
                         select release;

        var allProductionVersions = from prodVer in context.ProductionVersions
                                    where prodVer.Status == 1
                                    select prodVer;

        var combined = (from release in theRelease
                        join p in allProductionVersions on release.Id equals p.ReleaseID
                        select release).Include(release => release.ProductionVersions);              

        var allProductionsForChosenRelease = combined.ToList();

This follows the simpler of the two examples. Without the include it produces the perfectly respectable sql:

SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[Name] AS [Name]
    FROM  [dbo].[Releases] AS [Extent1]
    INNER JOIN [dbo].[ProductionVersions] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ReleaseID]
    WHERE ('Hello World' = [Extent1].[Name]) AND (1 = [Extent2].[Status])

But with, OMG:

SELECT 
[Project1].[Id1] AS [Id], 
[Project1].[Id] AS [Id1], 
[Project1].[Name] AS [Name], 
[Project1].[C1] AS [C1], 
[Project1].[Id2] AS [Id2], 
[Project1].[Status] AS [Status], 
[Project1].[ReleaseID] AS [ReleaseID]
FROM ( SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[Name] AS [Name], 
    [Extent2].[Id] AS [Id1], 
    [Extent3].[Id] AS [Id2], 
    [Extent3].[Status] AS [Status], 
    [Extent3].[ReleaseID] AS [ReleaseID],
    CASE WHEN ([Extent3].[Id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
    FROM   [dbo].[Releases] AS [Extent1]
    INNER JOIN [dbo].[ProductionVersions] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ReleaseID]
    LEFT OUTER JOIN [dbo].[ProductionVersions] AS [Extent3] ON [Extent1].[Id] = [Extent3].[ReleaseID]
    WHERE ('Hello World' = [Extent1].[Name]) AND (1 = [Extent2].[Status])
)  AS [Project1]
ORDER BY [Project1].[Id1] ASC, [Project1].[Id] ASC, [Project1].[C1] ASC

Total garbage. The key point to note here is the fact that it returns the outer joined version of the table which has not been limited by status=1.

This results in the WRONG data being returned:

Id  Id1 Name        C1  Id2 Status  ReleaseID
2   1   Hello World 1   1   2       1
2   1   Hello World 1   2   1       1

Note that the status of 2 is being returned there, despite our restriction. It simply does not work. If I have gone wrong somewhere, I would be delighted to find out, as this is making a mockery of Linq. I love the idea, but the execution doesn't seem to be usable at the moment.

Out of curiosity, I tried the LinqToSQL dbml rather than the LinqToEntities edmx that produced the mess above:

SELECT [t0].[Id], [t0].[Name], [t2].[Id] AS [Id2], [t2].[Status], [t2].[ReleaseID], (
    SELECT COUNT(*)
    FROM [dbo].[ProductionVersions] AS [t3]
    WHERE [t3].[ReleaseID] = [t0].[Id]
    ) AS [value]
FROM [dbo].[Releases] AS [t0]
INNER JOIN [dbo].[ProductionVersions] AS [t1] ON [t0].[Id] = [t1].[ReleaseID]
LEFT OUTER JOIN [dbo].[ProductionVersions] AS [t2] ON [t2].[ReleaseID] = [t0].[Id]
WHERE ([t0].[Name] = @p0) AND ([t1].[Status] = @p1)
ORDER BY [t0].[Id], [t1].[Id], [t2].[Id]

Slightly more compact - weird count clause, but overall same total FAIL.

Please tell me I've missed something obvious, as I really want to like Linq!

Community
  • 1
  • 1
Kinetic
  • 700
  • 8
  • 15
  • Don't look at the SQL result set. With `Include` it obviously returns more rows than the number of root objects that get materialized in the end. The LINQ statement resulted in 1 `release` object having 2 items in its `ProductionVersions` collection. SQL can only return a flat table, which really is more a SQL language restriction than anything else. After receiving the result set, EF figures out how to materialize object from it. – Gert Arnold Oct 03 '14 at 07:22
  • So far, I've *never* seen incorrect results coming from LINQ to entities queries. The generated SQL can be horrendous, sure. It's improving with each release, but of course it will never compete with hand-crafted SQL. – Gert Arnold Oct 03 '14 at 07:23
  • Unfortunately the materialised objects are equally as wrong (as you would expect with such crazy sql). It's simply the wrong data. As I note in my answer below, I have finally (after probably 8 hours of trying) cracked it and found another way to get the right result. However, unless I made some logic error in my linq, it'seems broken with join+include+where. – Kinetic Oct 03 '14 at 17:24

2 Answers2

0

Okay, after another evening of head scratching I cracked it.

In LinqToSQL:

        using (var context = new TestSQLModelDataContext())
        {
            context.DeferredLoadingEnabled = false;
            DataLoadOptions ds = new DataLoadOptions();                
            ds.LoadWith<ProductionVersion>(prod => prod.Release);
            context.LoadOptions = ds;

            var combined = from release in context.Releases
                             where release.Name == "Hello World"
                             select from prodVer in release.ProductionVersions
                                    where prodVer.Status == 1
                                    select prodVer;

            var allProductionsForChosenRelease = combined.ToList();
        }

This produces the much more reasonable SQL:

SELECT [t2].[Id], [t2].[Status], [t2].[ReleaseID], [t0].[Id] AS [Id2], [t0].[Name], (
    SELECT COUNT(*)
    FROM [dbo].[ProductionVersions] AS [t3]
    WHERE ([t3].[Status] = 1) AND ([t3].[ReleaseID] = [t0].[Id])
    ) AS [value]
FROM [dbo].[Releases] AS [t0]
OUTER APPLY (
    SELECT [t1].[Id], [t1].[Status], [t1].[ReleaseID]
    FROM [dbo].[ProductionVersions] AS [t1]
    WHERE ([t1].[Status] =1) AND ([t1].[ReleaseID] = [t0].[Id])
    ) AS [t2]
WHERE [t0].[Name] = 'Hello World'
ORDER BY [t0].[Id], [t2].[Id]

Which produces the correct results:

Id  Status  ReleaseID   Id2 Name        value
2   1       1           1   Hello World 1

And in LinqToEntities (I couldn't get the Include syntax to work, so I use the quirk where including the desired table in the results links it up correctly):

        using (var context = new TestEntities1())
        {
            var combined = (from release in context.Releases
                            where release.Name == "Hello World"
                            select from prodVer in release.ProductionVersions
                                   where prodVer.Status == 1
                                   select new { prodVer, Release =prodVer.Release });

            var allProductionsForChosenRelease = combined.ToList();
        }

And this produces the SQL:

SELECT 
    [Project1].[Id] AS [Id], 
    [Project1].[C1] AS [C1], 
    [Project1].[Id1] AS [Id1], 
    [Project1].[Status] AS [Status], 
    [Project1].[ReleaseID] AS [ReleaseID], 
    [Project1].[Id2] AS [Id2], 
    [Project1].[Name] AS [Name]
    FROM ( SELECT 
        [Extent1].[Id] AS [Id], 
        [Join1].[Id1] AS [Id1], 
        [Join1].[Status] AS [Status], 
        [Join1].[ReleaseID] AS [ReleaseID], 
        [Join1].[Id2] AS [Id2], 
        [Join1].[Name] AS [Name], 
        CASE WHEN ([Join1].[Id1] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
        FROM  [dbo].[Releases] AS [Extent1]
        LEFT OUTER JOIN  (SELECT [Extent2].[Id] AS [Id1], [Extent2].[Status] AS [Status], [Extent2].[ReleaseID] AS [ReleaseID], [Extent3].[Id] AS [Id2], [Extent3].[Name] AS [Name]
            FROM  [dbo].[ProductionVersions] AS [Extent2]
            INNER JOIN [dbo].[Releases] AS [Extent3] ON [Extent2].[ReleaseID] = [Extent3].[Id] ) AS [Join1] ON ([Extent1].[Id] = [Join1].[ReleaseID]) AND (1 = [Join1].[Status])
        WHERE 'Hello World' = [Extent1].[Name]
    )  AS [Project1]
    ORDER BY [Project1].[Id] ASC, [Project1].[C1] ASC

Which is fairly mental, but it does work.

Id  C1  Id1 Status  ReleaseID   Id2 Name
1   1   2   1       1           1   Hello World

All of which leads me to the conclusion that Linq is far from finished. It can be used, but with extreme caution. Use it as a strongly typed and compile time checked, but laborious/error prone, way of writing bad SQL. It's a trade-off. You get more security at the C# end, but man it's a lot harder than writing SQL!

Kinetic
  • 700
  • 8
  • 15
0

Taking a second look, I now understand the elusive effect of the Include.

Just as in plain SQL, a join in LINQ will repeat results when the right side of the join is the "n" end of a 1-n association.

Let's assume you have one Release with two ProductionVersions. Without the Include, the join will give you two identical Releases, because after all the statement selects releases. Now when you add the Include, EF will not only return two releases, but will also fully populate their ProductionVersions collections.

Looking a bit deeper, in the context's cache, it appears that EF really only materialized just 1 Release and 2ProductionVersions. It's just that the releases are returned twice in the final result set.

In a way, you got what you asked for: give me releases, multiplied by their number of versions. But that's not what you intended to ask.

What you (probably) intended reveals a weak spot in EF's toolbox: we can't Include partial collections. I think you tried to get releases populated with ProductionVersions of Status = 1 only. If possible, you'd rather have done this:

context.Releases.Include(r => r.ProductionVersions.Where(v => v.Status == 1))
       .Where(r => r.Name == "Hello World")

But that throws an exception:

The Include path expression must refer to a navigation property defined on the type. Use dotted paths for reference navigation properties and the Select operator for collection navigation properties. Parameter name: path

This "filtered include" problem has been noted before and until the EF team (or a contributor) decides to grab this issue we have to do with elaborate work-arounds. I described a common one here.

Community
  • 1
  • 1
Gert Arnold
  • 105,341
  • 31
  • 202
  • 291