2

I have created a simple method using EF 6 that will query with grouping based on some input information and some possible Type and SubType values, as the following

public int GetOriginal(DateTime startDate, DateTime endDate, List<int> userIds)
{
    DateTime dt = DateTime.UtcNow;
    var ret = DbContext.ContactFeedback
           .Where(c => c.FeedbackDate >= startDate && 
            c.FeedbackDate <= endDate && userIds.Contains(c.UserId) &&
            (c.Type == FeedbackType.A || c.Type == FeedbackType.B || c.Type == FeedbackType.C))
            .GroupBy(x => new {TruncateTime = DbFunctions.TruncateTime(x.FeedbackDate), x.LeadId, x.UserId})
            .Count();
    Console.WriteLine(string.Format("{0}",DateTime.UtcNow - dt));
    return ret;
}

It works as expected, however if I try to create a new auxiliar method that receives the "query" (Func type object) as input to be run, I see a very big difference in performance which I'm not able to explain, because they should run exactly the same. Here is my rewritten methods

public int GetRewritten(DateTime startDate, DateTime endDate, List<int> userIds)
{
    DateTime dt = DateTime.UtcNow;
    var query = new Func<ContactFeedback, bool>(c => c.FeedbackDate >= startDate && c.FeedbackDate <= endDate && userIds.Contains(c.UserId) &&
                 (c.Type == FeedbackType.A || c.Type == FeedbackType.B ||
                  c.Type == FeedbackType.C));
    var ret = GetTotalLeadsByFeedback(query);
    Console.WriteLine(string.Format("{0}",DateTime.UtcNow - dt));
    return ret;
}

private int GetTotalLeadsByFeedback(Func<ContactFeedback, bool> query)
{
    return DbContext.ContactFeedback
        .Where(query)
        .GroupBy(x => new { TruncateTime = DbFunctions.TruncateTime(x.FeedbackDate), x.LeadId, x.UserId })
        .Count();
}

Here are the running times in seconds

GetOriginal with 1 userId:0.0156318 - With ~100 usersIds: 0.1455635

GetRewritten with 1 userId:0.4742711 - With ~100 usersIds: 7.2555701

As you can see the difference is huge, anyone can share a light on why this occurs?

I'm running everything on Azure with a SQL Server DB if it helps

Safin Ahmed
  • 560
  • 4
  • 12
  • 1
    This SO post explains why http://stackoverflow.com/questions/793571/why-would-you-use-expressionfunct-rather-than-funct – phil soady Dec 02 '14 at 23:51

1 Answers1

1

I see a very big difference in performance which I'm not able to explain, because they should run exactly the same.

They're considerably different in approach. The first part of your initial method's query:

DbContext.ContactFeedback
       .Where(c => c.FeedbackDate >= startDate && 
        c.FeedbackDate <= endDate && userIds.Contains(c.UserId) &&
        (c.Type == FeedbackType.A || c.Type == FeedbackType.B || c.Type == FeedbackType.C))

Is equivalent to:

DbContext.ContactFeedback
      .Where(new Expression<Func<ContactFeedback, bool>>(new Func<ContactFeedback, bool>(c => c.FeedbackDate >= startDate && c.FeedbackDate <= endDate && userIds.Contains(c.UserId) &&
             (c.Type == FeedbackType.A || c.Type == FeedbackType.B ||
              c.Type == FeedbackType.C)))

When you call .Where on an IQueryable<T> it will (barring a case where the type implementing IQueryable<T> has its own appliable .Where which would be strange) call into:

public static IQueryable<TSource> Where<TSource>(
  this IQueryable<TSource> source,
  Expression<Func<TSource, bool>> predicate
)

Bearing in mind that lambdas in source code can be turned into either a Func<…> or an Expression<Func<…>> as applicable.

Entity Framework then combines this query with the GroupBy and finally upon Count() turns the entire query into the appropriate SELECT COUNT … query, which the database performs (just how quickly depending on table contents and what indices are set, but which should be reasonably quick) and then a single value is sent back from the database for EF to obtain.

Your version though has explicitly assigned the lambda to a Func<ContactFeedback, bool>. As such using it with Where it has to call into:

public static IEnumerable<TSource> Where<TSource>(
  this IEnumerable<TSource> source,
  Func<TSource, bool> predicate
)

So to do the Where EF has to do retrieve every column of every row from the database, and then filter out those rows for which that Func returns true, then group them in memory (which requires storing the partially-constructed groups) before doing a Count by a mechanism like:

public int Count<T>(this IEnumerable<T> source)
{
  /* some attempts at optimising that don't apply to this case and so in fact just waste a tiny amount omitted */
  int tally = 0;
  using(var en = source.GetEnumerator())
    while(en.MoveNext())
      ++tally;
  return tally;
}

This is a lot more work with a lot more traffic between the EF and database, and so a lot slower.

A rewrite of the sort you attempted would be better approximated by:

public int GetRewritten(DateTime startDate, DateTime endDate, List<int> userIds)
{
    DateTime dt = DateTime.UtcNow;
    var query = new Expression<Func<ContactFeedback, bool>>(c => c.FeedbackDate >= startDate && c.FeedbackDate <= endDate && userIds.Contains(c.UserId) &&
                 (c.Type == FeedbackType.A || c.Type == FeedbackType.B ||
                  c.Type == FeedbackType.C));
    var ret = GetTotalLeadsByFeedback(query);
    Console.WriteLine(string.Format("{0}",DateTime.UtcNow - dt));
    return ret;
}

private int GetTotalLeadsByFeedback(Expression<Func<ContactFeedback, bool>> predicate)
{
    return DbContext.ContactFeedback
        .Where(predicate)
        .GroupBy(x => new { TruncateTime = DbFunctions.TruncateTime(x.FeedbackDate), x.LeadId, x.UserId })
        .Count();
}

(Note also that I changed the name of the predicate to predicate, as predicate is more commonly used for predicates, query for a source along with zero or more methods acting upon it; so DbContext.ContactFeedback, DbContext.ContactFeedback.Where(predicate) and DbContext.ContactFeedback.Where(predicate).GroupBy(x => new { TruncateTime = DbFunctions.TruncateTime(x.FeedbackDate), x.LeadId, x.UserId }) would all be queries if enumerated, and DbContext.ContactFeedback.Where(predicate).GroupBy(x => new { TruncateTime = DbFunctions.TruncateTime(x.FeedbackDate), x.LeadId, x.UserId }).Count() is a query that immediately executes and returns a single value).

Conversely, the form you ended up with could be written back into the style of GetOriginal as:

public int GetNotOriginal(DateTime startDate, DateTime endDate, List<int> userIds)
{
    DateTime dt = DateTime.UtcNow;
    var ret = DbContext.ContactFeedback
           .AsEnumerable()
           .Where(c => c.FeedbackDate >= startDate && 
            c.FeedbackDate <= endDate && userIds.Contains(c.UserId) &&
            (c.Type == FeedbackType.A || c.Type == FeedbackType.B || c.Type == FeedbackType.C))
            .GroupBy(x => new {TruncateTime = DbFunctions.TruncateTime(x.FeedbackDate), x.LeadId, x.UserId})
            .Count();
    Console.WriteLine(string.Format("{0}",DateTime.UtcNow - dt));
    return ret;
}

Note the AsEnumerable forcing the Where and everything that follows to be executed in the .NET application, rather than on the database.

Jon Hanna
  • 110,372
  • 10
  • 146
  • 251