1

I was hoping I could get some help with a performance problem I'm having in EntityFramework 4.4. I'm working on converting an application that was using EDMX files over to code first and I've run into a problem when running queries with a large number of objects in the "where" clause of the LINQ query.

Here's a short overview of how everything is laid out (Entity doesn't refer to EF, it's the name given to a generic "thing" in our code):

public class ExampleDbContext : DbContext
{
    public DbSet<EntityTag> EntityTags { get; set; }
    public DbSet<Entity> Entities { get; set; }

    public DbSet<Log> Logs { get; set; }

    protected override void OnModelCreating(DbmodelBuilder modelBuilder)
    {
        // Fluent mappings added to modelBuilder.Configurations.Add() in here
    }
}

public class EntityTag
{
    public int Id { get; set; }

    public virtual Entity Entity { get; set; }
    public int EntityId { get; set; }

    public virtual Log Deleted { get; set; }
    public int? DeletedId { get; set; }
}

public class Entity
{
    public int Id { get; set; }

    pulic byte[] CompositeId { get; set; }
}

// Used to log when an event happens
public class Log
{
    public int Id { get; set; }

    public string Username { get; set; }
    public DateTime Timestamp { get; set; }
}

The query I'm running that causes the problem is:

// Creates an IEnumerable<byte[]> with the keys to find
var computedKeys = CreateCompositeIDs(entityKeys);

// Run the query and find any EntityTag that isn't deleted and is in
// the computedKeys list
var result = from et in Context.EntityTags
             where computedKeys.Contains(et.Entity.CompositeId) &&
                   et.Deleted == null
             select et;

var entityTags = result.ToList();

When computedKeys contains only a few Ids (15 for example) the code and query runs quickly. When I have a large number of Ids (1600 is normal at this point and it could get higher) it takes minutes (at 500, I haven't even tried with 1500 yet) to run that query once it's enumerated with ToList(). I've also removed the computedKeys.Contains() (leaving et.Deleted) from the query with a large number of computedKeys and the query ends up running quickly.

Through debugging I've determined that creating the list of keys is fast, so that's not the problem. When I hook a profiler up to MSSQL to see the query that's generated it looks normal in that all of the CompositeId's are included in a WHERE CompositeId IN ( /* List of Ids, could be 1500 of them */) and when the query shows up in the profiler it executes in less than a second so I don't think it's a database optimization thing, either. The profiler will sit there without anything showing up for the entire time it's running aside from the last second or so when it quickly returns a result.

I hooked up dotTrace and it looks like a lot of the time is spent within System.Data.Query.PlanCompiler.JoinGraph.GenerateTransitiveEdge(JoinEdge, JoinEdge) (119,640 ms) and System.Collections.Generic.List+Enumerator1.MoveNext` (54,270 ms) is called within that method twice, I think, based on the total execution time for each of them.

I just can't seem to figure out why it's taking so long to generate the query. It doesn't seem to be any faster the second time it executes after compiling, either, so it doesn't look like it's being cached.

Thanks in advance for the help!

  • @MitchWheat Once the query is actually generated the query runs quickly, even with 1500 Id's in the query. If I copy the query that's generated from the SQL trace and execute it again after flushing the SQL cache it will still run in less than a second. – Erik Davidson Feb 11 '13 at 04:58
  • Here is more about the performance problem with `Contains`: http://stackoverflow.com/a/7936350/270591 and http://stackoverflow.com/a/8108643/270591 – Slauma Feb 11 '13 at 17:39

1 Answers1

0

I was able to figure it out. Once I decided not to be held to the original query and reconsidered the result, I rewrote the query to be:

var computedKeys = CreateCompositeIDs(entityKeys);

var entityTags = (from e in Context.Entities
                  where computedKeys.Contains(e.CompositeId)
                  from et in e.Tags
                  select et).Distinct();
entityTags = from et in entityTags
             where et.Deleted == null
             select et;

return entityTags;

When I started querying the entitites directly and took advantage of the relationship to EntityTag (which I forgot to include in the original question...) via Tags and then filtered only the existing EntityTag it sped up the query to the point where it's all running in under one second.