5

I've been using LINQ to SQL & to entities for a while and am overall very happy with them. However i know of their limitations and one in particular is becoming a large issue for me. When you do a complex nested query in the form of

MyContext.SomeTable
.Select(item=>new{
    item.SomeProperty1,
    item.SomeProperty2,
    item.NavigationProperty1
        .Select(nav1=> new {// retrieve some properties}), // This triggers a single query as long as don't have more than one subquery
    item.NavigationProperty2
        .Select(nav2=> new {// retrieve some properties}) // This triggers one query PER ROW in the original query
});

The providers i have tested are LINQ TO SQL / LINQ TO entities (and even worse, devart LINQConnect that fares worse and generates 1 per row on the first navigation property)

What i get now that is generated(pseudocode):

select t1.a,t1.b,t2.c,t2.d from mytable as t1
join navproperty1table as t2

and 1 millions (if there is 1 million results in the first set) of queries like this: select t3.e,t3.f from navproperty2table as t3 where id = X (X changing on X query to next element returned by first query)

What i want:

select t1.a,t1.b,t2.c,t2.d,t3.e,t3.f from mytable as t1 
join navproperty1table as t2
join navproperty2table as t3

Now of course if there were 3 rows in the original table it wouldn't be an issue, but i have 10s of thousands to millions of rows in my tables "and" i need a much much much more complex query in a single select (i want to get a complex graph at once). Think 20 + tables with 3-6 levels of nesting accessing an additional 2-5 tables each.

My SQL server can perfectly cope with it, i don't care for the bandwidth either, it's on an instance linked by a gigabit connection, i can't get that data in deferred manner, i actually "use" all of it immediately so it's not just laziness. Right now for performance reasons i had to split the query in many small queries and join them manually on the LINQ to object size, which gives some really nasty code for whoever maintains it but was the only actual solution i had, so overall including all the small queries and final joining, I'm at over 600 lines of unsplitable code in a single method that is totally unmaintainable.

Are there actually "any" LINQ providers production ready today before i go and evaluated them all that work in such a mindset or am i better off coding and commercializing my own? (I'm very surprised that they don't all work that way actually, i can't see a single instance where you'd be better off with the foreach case and the ones i've tried that claim to get rid of n+1 with loadwith, don't get rid of it as they still do n+1 queries but just batch it in a single call, 1 round trip & n+1 queries isn't satisfying when 1 is 10 000 then 10 000 000 and then 10 000 000 000)

  • (note that I'm speculating on what exactly triggers this, but it isn't the question, no matter what triggers this "exactly" I'm sure to hit it in my current context)

PS: Note that I'm running .NET 4.0 full profile on a windows server 2008 or higher and on SQL server 2008 or higher, a provider that doesn't support anything else would be fine, i have zero requirements for migration, portability, lower .net versions, lower sql server support etc. Migrating to even more recent versions is an option if required. I also don't have any prerequisites for modeling or advanced features, the DB is already there, i only want to query tables, so something with no modeling / views / DML / stored procedure / functions support is fine, my one and only requirement is sensible SQL generation on complex queries and object graphs

EDIT: for clarification here is an actual example of the issue on a DB everyone can get, adventureworks

Querying employees for each contact

Contacts
.Select(cont=>new 
{
    cont.EmailAddress,
    cont.EmailPromotion,
    Employees = cont.Employees
        .Select(emp=>new
        {
            emp.Gender,
            emp.HireDate
        }).ToList()
}).ToList()

Generates

SELECT [t0].[EmailAddress], [t0].[EmailPromotion], [t1].[Gender], [t1].[HireDate], (
SELECT COUNT(*)
FROM [HumanResources].[Employee] AS [t2]
WHERE [t2].[ContactID] = [t0].[ContactID]
) AS [value]

FROM [Person].[Contact] AS [t0] LEFT OUTER JOIN [HumanResources].[Employee] AS [t1] ON [t1].[ContactID] = [t0].[ContactID] ORDER BY [t0].[ContactID], [t1].[EmployeeID]

Now querying just vendors for each Contact Contacts .Select(cont=>new { cont.EmailAddress, cont.EmailPromotion, Vendors = cont.VendorContacts.Select(vend=>new { vend.ContactTypeID, vend.ModifiedDate }).ToList() }).ToList()

still ok:

SELECT [t0].[EmailAddress], [t0].[EmailPromotion], [t1].[ContactTypeID], [t1].[ModifiedDate], (
SELECT COUNT(*)
FROM [Purchasing].[VendorContact] AS [t2]
WHERE [t2].[ContactID] = [t0].[ContactID]
) AS [value]

FROM [Person].[Contact] AS [t0] LEFT OUTER JOIN [Purchasing].[VendorContact] AS [t1] ON [t1].[ContactID] = [t0].[ContactID] ORDER BY [t0].[ContactID], [t1].[VendorID]

Now querying both at once (triggers X row query)

Contacts
.Select(cont=>new 
{
    cont.EmailAddress,
    cont.EmailPromotion,
    Employees = cont.Employees
        .Select(emp=>new
        {
            emp.Gender,
            emp.HireDate
        }).ToList(),
    Vendors = cont.VendorContacts.Select(vend=>new
    {
        vend.ContactTypeID,
        vend.ModifiedDate
    }).ToList()
}).ToList()

Generates the ugly and slow (not pasting it all for obvious reasons but you get the point):

SELECT [t0].[EmailAddress], [t0].[EmailPromotion], [t1].[Gender], [t1].[HireDate], (
SELECT COUNT(*)
FROM [HumanResources].[Employee] AS [t2]
WHERE [t2].[ContactID] = [t0].[ContactID]
) AS [value], [t0].[ContactID]
FROM [Person].[Contact] AS [t0]
LEFT OUTER JOIN [HumanResources].[Employee] AS [t1] ON [t1].[ContactID] = [t0].[ContactID]
ORDER BY [t0].[ContactID], [t1].[EmployeeID]
GO

-- Region Parameters
DECLARE @x1 Int = 1
-- EndRegion
SELECT [t0].[ContactTypeID], [t0].[ModifiedDate]
FROM [Purchasing].[VendorContact] AS [t0]
WHERE [t0].[ContactID] = @x1
GO

 -- Region Parameters
DECLARE @x1 Int = 2
-- EndRegion
SELECT [t0].[ContactTypeID], [t0].[ModifiedDate]
FROM [Purchasing].[VendorContact] AS [t0]
WHERE [t0].[ContactID] = @x1
GO

-- Region Parameters
DECLARE @x1 Int = 3
-- EndRegion
SELECT [t0].[ContactTypeID], [t0].[ModifiedDate]
FROM [Purchasing].[VendorContact] AS [t0]
WHERE [t0].[ContactID] = @x1
GO

-- Region Parameters
DECLARE @x1 Int = 4
-- EndRegion
SELECT [t0].[ContactTypeID], [t0].[ModifiedDate]
FROM [Purchasing].[VendorContact] AS [t0]
WHERE [t0].[ContactID] = @x1
GO

-- Region Parameters
DECLARE @x1 Int = 5
-- EndRegion
SELECT [t0].[ContactTypeID], [t0].[ModifiedDate]
FROM [Purchasing].[VendorContact] AS [t0]
WHERE [t0].[ContactID] = @x1
GO

-- Region Parameters
DECLARE @x1 Int = 6
-- EndRegion
SELECT [t0].[ContactTypeID], [t0].[ModifiedDate]
FROM [Purchasing].[VendorContact] AS [t0]
WHERE [t0].[ContactID] = @x1
GO

-- Region Parameters
DECLARE @x1 Int = 7
-- EndRegion
SELECT [t0].[ContactTypeID], [t0].[ModifiedDate]
FROM [Purchasing].[VendorContact] AS [t0]
WHERE [t0].[ContactID] = @x1
GO

-- Region Parameters
DECLARE @x1 Int = 8
-- EndRegion
SELECT [t0].[ContactTypeID], [t0].[ModifiedDate]
FROM [Purchasing].[VendorContact] AS [t0]
WHERE [t0].[ContactID] = @x1
GO

-- Region Parameters
DECLARE @x1 Int = 9
-- EndRegion
SELECT [t0].[ContactTypeID], [t0].[ModifiedDate]
FROM [Purchasing].[VendorContact] AS [t0]
WHERE [t0].[ContactID] = @x1
GO

-- Region Parameters
DECLARE @x1 Int = 10
-- EndRegion
SELECT [t0].[ContactTypeID], [t0].[ModifiedDate]
FROM [Purchasing].[VendorContact] AS [t0]
WHERE [t0].[ContactID] = @x1
GO

What i expect / would like to see generated:

SELECT [t0].[EmailAddress], [t0].[EmailPromotion], [t1].[Gender], [t1].[HireDate], [t2].[ContactTypeID], [t2].[ModifiedDate] ,[t0].[ContactID]
FROM [Person].[Contact] AS [t0]
LEFT OUTER JOIN [HumanResources].[Employee] AS [t1] ON [t1].[ContactID] = [t0].[ContactID]
LEFT OUTER JOIN [Purchasing].[VendorContact] AS [t2] ON [t2].[ContactID] = [t0].[ContactID]
GO
Cœur
  • 37,241
  • 25
  • 195
  • 267
Ronan Thibaudau
  • 3,413
  • 3
  • 29
  • 78
  • Does it actually trigger one query per row, or does it generate a correlated subquery that SQL Server might optimize out to some join-esque execution plan? – ta.speot.is Feb 08 '12 at 10:41
  • It generates a subquery (which is fine) when there is at most 1 nav property, after that it generates an actual query "per" row , which is insane, as in select bla from t1 where id = some actual number not a condition, this line repeated 1 million times – Ronan Thibaudau Feb 08 '12 at 10:46
  • I guess you're looking for a solution that produces a query with 1 join for the first `Select()` on a navigation property, followed by 1 query per subsequent `Select()` (with a WHERE clause on the parent id)? I figure you do not want one large query with an inevitable cartesian product. – Gert Arnold Feb 08 '12 at 11:22
  • Neither, i'm looking for one large single query preferably (but anything sensible is fine), it wouldn't be a full cartesian but simply joins because the navigation properties contain the required FK information.In the bottom example of my post i want for linq to retrieve select whatever from contacts join vendorcontacts join employees, one query, much info, and the providers manages it's own work by producing the output graph in the shape i ask for in the select, just to be clear, a where clause on the parent id is unacetable as there are tens of thouthands of parents, so 10 000 queries! – Ronan Thibaudau Feb 08 '12 at 11:38
  • You may find [this](http://ayende.com/blog/4387/what-happens-behind-the-scenes-nhibernate-linq-to-sql-entity-framework-scenario-analysis) interesting (especially the part about linq-to-sql). Another possibility could be using NHibernate with [ToFuture](http://stackoverflow.com/a/5225939/861716) (even though it is not linq). It is a feature by which multiple queries are executed in one round-trip. – Gert Arnold Feb 08 '12 at 12:25
  • Futures doesn't seem to be what i'm looking for, multiple queries in one roundtrip i already have, what i want is not less rountrips, but less queries (1 million queries in a single roundtrip is still much slower than 1 query at all!), as for the first linq it doesn't discuss my particular issue, EF behaves well (i don't want eager loading), but also fails on multiple navigation properties, his example works (like my linq to SQL example above) because there is 1 level of properties, i need something that will always , consistently and unarguable generates 1 large queries (or a few) without – Ronan Thibaudau Feb 08 '12 at 12:28
  • ever resorting to 1 query per parent mode, no matter what nesting level / query complexity i may have – Ronan Thibaudau Feb 08 '12 at 12:29
  • I understand what you want, but I think it can only be achieved by firing multiple queries. And then the question is: how to reduce the number of queries to the absolute minimum. I don't see how you can avoid cartesian products when doing multiple joins to a central entity (star query). – Gert Arnold Feb 08 '12 at 12:40
  • by doing joins and not cartesians, the expression contains all the relevant information, we know what data we refer to, and what FK to use to go to it, so it should generate multiple joins for multiple levels, i've added an example at the end of my post of what i would expect / like to be generated by a good provider – Ronan Thibaudau Feb 08 '12 at 12:55
  • Sure, that's at least (Employees of contact).Count x (VendorContacts of contact).Count rows. But apparently that's OK. I think the closest you can get then is NHibernate's [Fetch](http://mikehadlow.blogspot.com/2010/08/nhibernate-linq-eager-fetching.html) (no linq). – Gert Arnold Feb 08 '12 at 13:31
  • Aye there is some redundancy (each row contains more information than needed) but this is a non issue perf wise the difference is very small , unlike generating millions of queries. For NHibernate, would that work with deeply nested data?, if it doesn't work with linq, would that mean i need to retrieve the whole table for each sub level or is there some equivalent? I'm still hoping to find a linq provider that provides this off the bat from the dozens that are out there in the wild but from the previous answers my hopes are thinning – Ronan Thibaudau Feb 08 '12 at 13:56
  • With deeply nested data (e.g. ThenFetchMany), I wouldn't be surprised if you'd quickly hit the limits of NHibernate too. Complex querying is always very hard on O/RM tools. NHibernate has always impressed me when it comes to generated SQL (linq-to-nhibernate not yet!). But even with less complex scenarios it sometimes was a hell of a job to prevent the 1 + N problem. Maybe [hql](http://nhforge.org/doc/nh/en/index.html#queryhql-joins) gets you far enough. With linq, I think the best you can do is retrieve the required object graph in the least possible number of queries. – Gert Arnold Feb 08 '12 at 14:34
  • Well with LINQ i can stick to what i did (a very long client side join after the many bits of data get retrieved) , it's not much slower (i'm up to 20ish queries instead of 1), maybe 50% slower, but it do is plan unreadable. As my original question was, is there a nice linq provider that does what i did or should i roll my own, could you copy / paste your comment in an answer and, unless i get a better answer within a few days, i'll mark it as the correct answer assuming it means "roll your own". – Ronan Thibaudau Feb 08 '12 at 14:41
  • Not directly related to my question but since i don't see the whole web whining about a lack of support for deep & complex object graphs & queries, please let me know if you'd be interested in such a solution, i'm considering developing one because it's a very common use case for me and would like to know if it could be a commercial sucess or if i should stick to my own need – Ronan Thibaudau Feb 08 '12 at 14:44
  • I think I will always be suspicious of tools that create complex queries at runtime. The point is testability. There will always be some real-life scenario that makes it fall apart and then fixing is horrible! When things get more complex than regular querying within the capabilities of the O/RM tool we happen to use we tend to resort to views and/or stored procedures. – Gert Arnold Feb 08 '12 at 15:35
  • I'd rather use a well tested and sucessfull library that i can't test , expecting it to be well tested by dedicated people, than rely on my own code to take a huge view and remap it to objects, i'm more likely to hit a typo without noticing than the devs of a tool used by 100 000 persons. But i think i will end up doing a minimalist linq provider without complex operator support (only where & select, first overload only etc) to support my specific scenario at least. – Ronan Thibaudau Feb 08 '12 at 15:46
  • After more searching i'm beggining to regain hope, i have found llblgen that, in adapter mode, does pretty much what i want (i have an example with 12 properties at the root level, 2 being complex operations & 5 being subqueries that themselves have subqueries that end up being lists of lists of item), for all this it generates a single query. However it does choke on a few things so i'd be happy to have a list of all the providers that work that way (fully eager, min amount of queries) to compare them – Ronan Thibaudau Feb 08 '12 at 23:24

4 Answers4

1

A workaround would be to create a view

from your definition

select t1.a,t1.b,t2.c,t2.d,t3.e,t3.f from mytable as t1 join navproperty1table as t2 join navproperty2table as t3

and use linq-2-sql to query that view.

Not sure if I understand your query completely, but uou might just do

from x in MyContext.Sometable
Select new { x.a, x.b, x.t2.c, x.t2.d, x.t3.f } 

and so on.. I cannot test it right now but I am pretty sure yhis will create the select (and only one) you want.

Pleun
  • 8,856
  • 2
  • 30
  • 50
  • Creating a view isn't really an option for 2 reasons: 1) I want to avoid anything to do with SQL, the goal of using LINQ was to get rid of SQL alltogether and to manipulate object graphs and 2) it would return flat objects, while i need a graph, constructing the graph by hand would be even much harder to maintain than my current code (keep in mind we're talking about many nesting levels with many tables each, hell even if everything worked and i could type the LINQ query as it naturally should be the query would easily be 300 lines long) – Ronan Thibaudau Feb 08 '12 at 11:06
  • Split in 2 comments due to size restriction, also i'm not looking for a workaround, my current workaround performs just fine, i'm just looking at making the code maintainable by having it reflect the actual object graph (instead of tons of flat data later joined), and the query example you gave wouldn't work, as you can see from the selects on the navigation properties, T2 and T3 would be collections so no way to do T2.c because there are many items with "c" on T2, which gets us back to nested selects and the bad code generation – Ronan Thibaudau Feb 08 '12 at 11:12
0

I think the closest you can get then is NHibernate's Fetch (no linq).

With deeply nested data (e.g. ThenFetchMany), I wouldn't be surprised if you'd quickly hit the limits of NHibernate too. Complex querying is always very hard on O/RM tools. NHibernate has always impressed me when it comes to generated SQL (linq-to-nhibernate not yet!). But even with less complex scenarios it sometimes was a hell of a job to prevent the 1 + N problem.

Maybe with NHibernate's HQL you can achieve what you want.

With linq, I think the best you can do is retrieve the required object graph in the least possible number of queries.

Gert Arnold
  • 105,341
  • 31
  • 202
  • 291
0

You could also do something like this:

var venderContacts= VendorContacts.ToLookup (u =>u.ContactID);
var contracts=Contacts
    .Select(cont=>new 
    {
        cont.EmailAddress,
        cont.EmailPromotion,
        Employees = cont.Employees
            .Select(emp=>new
            {
                emp.Gender,
                emp.HireDate
            }).ToList(),
        Vendors = venderContacts[cont.ContanctID]
    }).ToList();
Arion
  • 31,011
  • 10
  • 70
  • 88
  • I'm already doing something similar, as i said i don't have any issues at all to work right now, i just want to avoid this mess and have a linq provider (that is, change the API, "not" change my code) so that code can be clean. A decent provider should be able to map a graph without workarounds – Ronan Thibaudau Feb 10 '12 at 10:42
0

I've found a provider that seems to handle my core problem (sane generation of SQL vs generating millions of statements for subqueries), not sure if it is a good fit yet as this depends on their answers.

http://www.llblgen.com/tinyforum/Messages.aspx?ThreadID=20658&StartAtMessage=0&#116494

Any other providers i should know of? If i managed to miss this one untill now there may be others and i'd be most happy to compare them. What i have now is

Totally fails my requirement for no 1 query per row issue: - linq to SQL - linq to Entities - devart linqconnect

Seems to work - llblgen

Untested / need feedback - Telerik OpenAccess - NHibernate - Mindscape lightspeed

any others i should know off?

Ronan Thibaudau
  • 3,413
  • 3
  • 29
  • 78