I was hoping I could get some help with a performance problem I'm having in EntityFramework 4.4. I'm working on converting an application that was using EDMX files over to code first and I've run into a problem when running queries with a large number of objects in the "where" clause of the LINQ query.
Here's a short overview of how everything is laid out (Entity doesn't refer to EF, it's the name given to a generic "thing" in our code):
public class ExampleDbContext : DbContext
{
public DbSet<EntityTag> EntityTags { get; set; }
public DbSet<Entity> Entities { get; set; }
public DbSet<Log> Logs { get; set; }
protected override void OnModelCreating(DbmodelBuilder modelBuilder)
{
// Fluent mappings added to modelBuilder.Configurations.Add() in here
}
}
public class EntityTag
{
public int Id { get; set; }
public virtual Entity Entity { get; set; }
public int EntityId { get; set; }
public virtual Log Deleted { get; set; }
public int? DeletedId { get; set; }
}
public class Entity
{
public int Id { get; set; }
pulic byte[] CompositeId { get; set; }
}
// Used to log when an event happens
public class Log
{
public int Id { get; set; }
public string Username { get; set; }
public DateTime Timestamp { get; set; }
}
The query I'm running that causes the problem is:
// Creates an IEnumerable<byte[]> with the keys to find
var computedKeys = CreateCompositeIDs(entityKeys);
// Run the query and find any EntityTag that isn't deleted and is in
// the computedKeys list
var result = from et in Context.EntityTags
where computedKeys.Contains(et.Entity.CompositeId) &&
et.Deleted == null
select et;
var entityTags = result.ToList();
When computedKeys contains only a few Ids (15 for example) the code and query runs quickly. When I have a large number of Ids (1600 is normal at this point and it could get higher) it takes minutes (at 500, I haven't even tried with 1500 yet) to run that query once it's enumerated with ToList()
. I've also removed the computedKeys.Contains()
(leaving et.Deleted) from the query with a large number of computedKeys
and the query ends up running quickly.
Through debugging I've determined that creating the list of keys is fast, so that's not the problem. When I hook a profiler up to MSSQL to see the query that's generated it looks normal in that all of the CompositeId's are included in a WHERE CompositeId IN ( /* List of Ids, could be 1500 of them */)
and when the query shows up in the profiler it executes in less than a second so I don't think it's a database optimization thing, either. The profiler will sit there without anything showing up for the entire time it's running aside from the last second or so when it quickly returns a result.
I hooked up dotTrace and it looks like a lot of the time is spent within System.Data.Query.PlanCompiler.JoinGraph.GenerateTransitiveEdge(JoinEdge, JoinEdge)
(119,640 ms) and System.Collections.Generic.List+Enumerator
1.MoveNext` (54,270 ms) is called within that method twice, I think, based on the total execution time for each of them.
I just can't seem to figure out why it's taking so long to generate the query. It doesn't seem to be any faster the second time it executes after compiling, either, so it doesn't look like it's being cached.
Thanks in advance for the help!