4

I have a LINQ query that I am having trouble optimizing and takes about 5.5 seconds to run. I am using a view called StaffingResourceData and a table called StaffingForecasts.

Each StaffingResource has a ResourceId, a Division, and a Type. A StaffingForecast has a ResourceId, Project, Date (represents a Monday of a week), Hours. A StaffingResource can have 0-many StaffingForecasts.

For each StaffingResource, I need a list of their total forecasted hours for the next 12 weeks. Here is what I have right now:

// Get list of dates
var dates = new List<DateTime>();
var start = Utilities.GetStartOfWeek(DateTime.Today);
for (var i = 0; i < 12; i++)
{
    dates.Add(start.AddDays(i * 7));
}
var end = dates[11];

// Get resources
var resources = (from r in context.StaffingResourceDatas
                 where r.EmployeeId != null
                     && !exclusionList.Contains(r.ResourceTitleId)
                 join f in context.StaffingForecasts.Where(x => x.Date >= start && x.Date <= end) on r.ResourceId equals f.ResourceId into g1
                 from f in g1.DefaultIfEmpty()
                 group new { f.Date, f.Hours } by r into g2
                 select new ChartResourceModel
                 {
                     ResourceId = g2.Key.ResourceId,
                     Division = g2.Key.ResourceDivision,
                     Type = g2.Key.ResourceType,
                     Dates = dates.Select(d => new ChartDateModel
                     {
                         Date = d,
                         Available = (g2.Where(f => f.Date == d).Any() ? g2.Where(f => f.Date == d).Sum(f => f.Hours) : 0) < 24
                     }).ToList()
                 })
               .ToList();

Any ideas on how I could speed this up?

arazzy
  • 495
  • 1
  • 8
  • 23
  • 4
    Would be better in [CodeReview.SE]? – Soner Gönül Nov 03 '15 at 11:31
  • Well, without actually answering your question you may substitute this `Where(f => f.Date == d).Any()` by the overload for `Any` expexting a `Func`: `Any(f => f.Date == d)` – MakePeaceGreatAgain Nov 03 '15 at 11:34
  • 2
    Take a look at the SQL that is being generated, that will give you a start at least. – DavidG Nov 03 '15 at 11:34
  • The part `group new { f.Date, f.Hours } by r into g2` looks suspicious, I've never seen something like this and have no idea what it does. – Ivan Stoev Nov 03 '15 at 11:56
  • 1
    Complex reporting queries should be done in a view, not in LINQ. Joining in LINQ means there's a relation missing in the model. Add the relation, remove the join and let EF generate the proper SQL. Also remove the inner `ToList()` calls, they result in a separate query performed for each date. Your code may result in a 1000 queries instead of 1. – Panagiotis Kanavos Nov 03 '15 at 12:06

5 Answers5

9
  1. Avoid using Contains. It degrades performance heavily. See this post

  2. ToList() is a command to execute your query. Till you call ToList() method, linq query is not started as linq has a feature called deferred execution. So if you call ToList(), you start some real operations with Databaseof files.

  3. reducing columns of table reduces bandwidth required(delete unnecessary columns from your query)
  4. turn off change-tracking and identity-management (for example, ObjectTrackingEnabled in LINQ-to-SQL)

    using (YourDataContext dataContext = new YourDataContext())    
    {
        dataContext.ObjectTrackingEnabled = false;    
        //Your code
    }
    
  5. Use one of the tuning options of EF such as .AsNoTracking(). The extensive description can be seen here.
  6. use a pre-compiled query. It sometimes reduces pre-processing overheads
StepUp
  • 36,391
  • 15
  • 88
  • 148
  • I had a similar problem with performance and also a contains.....One thing was to avoid it OR at least use all possible and parts of the if before the contains itself to reduce how often contains is called. – Thomas Nov 03 '15 at 12:29
  • I know this is old, but #1 was out of date when you posted it. That problem was addressed in EF6. – Ed S. Oct 22 '19 at 17:08
1
  • Avoid .ToList(), this always creates a sql query when expanding or using related entities
  • CompiledQueries compiles the queries in advance, avoiding some overhead ( not much)
  • Debug your queries ( eg. using http://miniprofiler.com/ )
NicoJuicy
  • 3,435
  • 4
  • 40
  • 66
1

After playing around with it for awhile, I was able to get the loading time down from 5.5 seconds to 1.5 seconds. Here is what I came up with:

// Get resources
var resources = (from r in
                    (from r in context.StaffingResourceDatas
                     where r.EmployeeId != null
                         && !exclusionList.Contains(r.ResourceTitleId)
                     join f in context.StaffingForecasts on r.ResourceId equals f.ResourceId
                     group f by r into g
                     select new
                     {
                         Resource = g.Key,
                         Forecasts = g.Where(f => f.Date >= start && f.Date <= end && f.StaffingPotentialProject == null).ToList()
                     }).ToList()
                 group r.Forecasts by r.Resource into g
                 select new ChartResourceModel
                 {
                     ResourceId = g.Key.ReportsToId,
                     Division = g.Key.ResourceDivision,
                     Type = g.Key.ResourceType,
                     Dates = dates.Select(d => new ChartDateModel
                     {
                         Date = d,
                         Available = (g.SelectMany(f => f.Where(x => x.Date == d)).Sum(x => x.Hours)) < 24
                     }).ToList()
                 }).ToList();

It seems like the best way is to just get all the data you need and calling .ToList() without trying to do anything fancy and then performing any extra operations on that data.

arazzy
  • 495
  • 1
  • 8
  • 23
0

For start, try to avoid ".ToList()" unless everything is done, because when you fire ".ToList()", result is being materialized, which you don't want if there is a lot of data and you want to do more query operations on that data.

So, try with IQueryable properties, so you can at least get data much faster and then do some operation on it.

And of course, check what is with query/queries you are sending to SQL. Maybe columns you are searching are not properly indexed(!?) or you don't have any indexes in tables (?).

Amel
  • 708
  • 6
  • 17
0

I would put my bet on a simple subquery, "naturally" representing the information needed, like this

var query =
    from r in context.StaffingResourceDatas
    where r.EmployeeId != null && !exclusionList.Contains(r.ResourceTitleId)
    select new ChartResourceModel
    {
        ResourceId = r.ResourceId,
        Division = r.ResourceDivision,
        Type = r.ResourceType,
        Dates = dates.Select(d => new ChartDateModel
        {
            Date = d,
            Available = context.StaffingForecasts.Where(f => 
                f.ResourceId == r.ResourceId && f.Date == d).Sum(f => f.Hours) < 24
        }).ToList()
    };
var sqlQuery = query.ToString();
var result = query.ToList();
Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343