I'm looking to add some utility methods to help avoid a lot of n+1 issues in a legacy application.
The common pattern is this:
select a.* /* over 10 columns */
from [table-A] a
where /* something */
Retrieved into a collection of ClassA
record instances
Then the sub-instances are lazy retrieved:
select b.* /* over 10 columns */
from [sub-table-B] b
where b.ParentId = @ClassA_ID
This results in an n+1 selects issue. Mostly this isn't a major problem as only a couple of ClassA
instances are being retrieved on an infrequently hit page, but in an increasing number of places this n+1 issue causes the pages to become too slow as the application has scaled.
I'm looking to replace a part of the existing data access code of this application so that the ClassA
instances and ClassB
instances are retrieved together.
I think there are 3 ways this could be done:
1) Get the ClassA
instances as we do now, then get the ClassB
instances in one aggregated call:
select b.*
from [sub-table-B] b
where b.ParentId in ( /* list of parent IDs */ )
This is two separate DB calls, and the query plan of the dynamic SQL will not be cacheable (due to the list of IDs).
2) Get the ClassB
instances with a sub query:
select b.*
from [sub-table-B] b
inner join [table-A] a
on b.ParentId = a.[ID]
where /* something */
This is also two DB calls, and query against [table-A]
has to be evaluated twice.
3) Get all together and de-dupicate the ClassA
instances:
select a.*, b.*
from [table-A] a
left outer join [sub-table-B] b
on a.[ID] = b.ParentId
where /* something */
This is just one DB call, but now we get the contents of [table-A]
repeated - the result set will be larger and the time sending the data from the DB to the client will be more.
So really this is 3 possible compromises:
- 2 DB calls, no query caching
- 2 DB calls, complex query evaluated twice
- 1 DB call, significantly larger result set
I can test these three patterns for any one parent-child pair of tables, but I have loads of them. What I want to know is which pattern is consistently quicker? More importantly why? Is one of these compromises an obvious performance-killer?
What do existing mechanisms like Linq, EF and NHibernate use?
Is there a 4th way that's better than all 3?