EF joining 5 large tables - Include() method optimization

Question

I have the following database structure (EDMX models):

public class Company 
{
    public Guid Id { get; set; }
    public virtual ICollection<Location> Locations { get; set; }
    // ... 15 more columns
}

public class Location 
{ 
    public Guid Id { get; set; }
    public virtual Company Company { get; set; }
    public Guid CompanyId { get; set; }

    public virtual ICollection<ReportA> ReportsA { get; set; }
    public virtual ICollection<ReportB> ReportsB { get; set; }
    public virtual ICollection<ReportC> ReportsC { get; set; }
    // ... 15 more columns with information - name, description etc.
}

public class ReportA
{
    public virtual Location Location { get; set; }
    public Guid LocationId { get; set; }
    // 30 more columns of type "int?"
}

public class ReportB
{
    public virtual Location Location { get; set; }
    public Guid LocationId { get; set; }
    // 30 more columns of type "int?"
}

public class ReportC
{
    public virtual Location Location { get; set; }
    public Guid LocationId { get; set; }
    // 30 more columns of type "int?"
}

One company can have many locations. Each location have many reports from A, B and C.
The columns from ReportA, ReportB, ReportC tables are different.
Each one of the Report tables have аpproximately 40 000 rows.
The Company and Location tables have ~ 5000 rows

I need to get all of the data and make a summary report.
The code is:

 using (ComapnyEntities dataBaseContext = new ComapnyEntities())
 {
      IQueryable<Locations> query = dataBaseContext.Locations
            .AsNoTracking()
            .Where(location => companyIds.Contains(location.CompanyId))
            .Include(location => location.Company)
            .Include(location => location.ReportsA)
            .Include(location => location.ReportsB)
            .Include(location => location.ReportsC);
       // more filtation 
       return query.ToList();

       // assume that companyIds have all company Ids 
}

In most cases I can use Skip() and Take() methods which makes execution faster (i.e. .Take(10)), but in one special case I need to pull all locations along with company information, ReportsA, ReportsB, ReportsC.

based on comment in Stackoverflow, Include() makes Cartesian product by rows. https://stackoverflow.com/a/22625208/6142097

 // so if that is true:
 4 000 * 4 000 * 40 000 * 40 000 * 40 000 = ?? (1.024e+21)

The "good part" is that the page takes 15 sec to load (locally), but in production is not the same and takes longer.

So is there a way to improve the performance of the query?

I Tried with Entity-framework code is slow when using Include() many times - it's slower
I have added Non-Clustured index of Location.CompanyId, ReportA.LocationId, ReportB.LocationId, ReportC.LocationId, LocationId - no difference (I am not sure indexes are correctly added)
Cannot construct native SQL query, that joins all the tables, which can be executed for less than 10 sec.
I Tried to replace .Inlude() with .IncludeOptimized() but no real difference.

My machine is using SQL Server 2014 with windows 8.1

What are your suggestions? Is there a way to improve my code?

Interesting question, I was in the same place, I tried many different ways, Lazy Loading was slower, explicit loading as well, so I used eager loading same like you, but some times lazy loading is better, can you add your ComapnyEntities configuration, and the place you are using the result list (maybe you don't need all of these data)? maybe a snapshot of your sql profiler also would help. — Munzer, May 28 '17 at 13:43
I am using the result in different cases. `First case` I make Sum by columns where I need to show in the client side 90 values from all Reports. Example Sum of all ReportsA.Column1, then sum of all ReportsA.Column2 ... ReportsC.Column33. `Second case` I am extracting reports by Location. `Third case` by Record Month (Forgot to say that each Report has columns Years And Month). The calculation are executed on the server Not a problem. So I need all of these data. What exactly you need from `ComapnyEntities configuration`? It's Not Code First. — mihkov, May 28 '17 at 14:10
if you are looking for sum, I would go with lazy loading, and get the sum only instead of getting all the rows, you will notice a difference, so you remove all these includes, and when you want a sum you do this `Location.ReportsA.Sum(r => r.Colomn1)`, and then ef will do another sql statement returning the sum only when you are using it, you should [enable](http://www.entityframeworktutorial.net/EntityFramework4.3/lazy-loading-with-dbcontext.aspx) lazy loading first. — Munzer, May 28 '17 at 14:16
I cannot execute Sum() from code in a query , because it returns incorrect result for me. Example if there are no rows after filtration i.e. Empty collection it returns `0` but I need NULL. It was strange case, but finally decided to make calculation on the server. — mihkov, May 28 '17 at 14:32
you can always manipulate your result, ex `var decsum = Location.ReportsA.Sum(r => r.Colomn1) == 0 ? null : Location.ReportsA.Sum(r => r.Colomn1); `, dont just try to solve the issue, you should work around the best solution around your project. I hope I helped :) — Munzer, May 28 '17 at 14:34
As for the SQL Profiler. I am using it but when the SQL query is executed and I click on it to see the code, it takes about 5 mins to load the SQL code and sometimes "Program not responding ..." So I will try to provide the executed query — mihkov, May 28 '17 at 14:39

score 4 · Answer 1 · answered May 28 '17 at 13:42

4

I need to get all of the data and make a summary report.

You shouldn't "get all of the data" to "make a summary report". Instead write queries that return summaries, not every row.

answered May 28 '17 at 13:42

David Browne - Microsoft

80,331
6
39
67

I would add that you could, if you can/want too, create a Stored Procedure in SQL to gather all the data you need from the DB Tables and return them in single rows. You can then call the SP from EF. – Jason H May 28 '17 at 13:48
Maybe I need to try with Stored Procedures for each of my cases. So I have another question before start with this - will SP return updated data if one of the Reports has been modified. @David Browne by "queries that return summaries" you mean something like Stored procedure? – mihkov May 28 '17 at 14:18
"queries that return summaries" could be written in C# with LINQ, or in TSQL. – David Browne - Microsoft May 28 '17 at 14:22
Can you give me an example? – mihkov May 28 '17 at 14:34

EF joining 5 large tables - Include() method optimization

1 Answers1