Just ran into the same problem. Before EF I used NHibernate as my ORM tool, and it had exactly the same problem. These frameworks keep the objects in memory as long as the context is alive, which has two consequences:
a serious performance slowdown: the framework compares the objects it holds in memory (e.g. to check whether an object already exists), so you will notice a gradual degradation of performance when processing many records (see the sketch after this list);
you will eventually run out of memory.
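To make the first point concrete, here is a minimal sketch of how you can watch the tracked-entry count climb during a long-running loop. MyEntities is the generated context from the code further down; Customers is just a placeholder for whatever entity set you are iterating:

// Needs: using System; using System.Data; using System.Linq;
// Sketch only: every entity materialized by the query stays tracked
// by the ObjectStateManager until the context is disposed, so this
// count keeps growing (and change tracking keeps getting slower).
using (var context = new MyEntities())
{
    int processed = 0;
    foreach (var customer in context.Customers) // placeholder entity set
    {
        // ... do the actual work here ...
        if (++processed % 1000 == 0)
        {
            int tracked = context.ObjectStateManager
                .GetObjectStateEntries(EntityState.Unchanged)
                .Count();
            Console.WriteLine("Tracked entries: " + tracked);
        }
    }
}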
If possible I always try to do large batch operations on the database using pure SQL (as the post above clearly states), but in this case that wasn't an option. To solve this, NHibernate offers a 'Clear' method on the session, which throws away all objects in memory that refer to database records (new ones, modified ones, and so on).
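For comparison, this is roughly how it looks on the NHibernate side (a sketch, assuming 'session' is an open NHibernate.ISession):

// Push pending inserts/updates to the database, then drop the
// first-level cache so tracked objects can be garbage collected.
session.Flush();
session.Clear();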
I tried to mimic this method in Entity Framework as follows (building on the post described above):
using System.Collections.Generic;
using System.Data;
using System.Data.Objects;
using System.Linq;

public partial class MyEntities
{
    // Returns every entry the context is currently tracking,
    // whatever its state.
    public IEnumerable<ObjectStateEntry> GetAllObjectStateEntries()
    {
        return ObjectStateManager.GetObjectStateEntries(EntityState.Added |
                                                        EntityState.Deleted |
                                                        EntityState.Modified |
                                                        EntityState.Unchanged);
    }

    // Mimics NHibernate's Session.Clear(): detaches every tracked
    // entity so it can be garbage collected. Relationship and stub
    // entries have no entity to detach, so they are skipped.
    public void ClearEntities()
    {
        foreach (var objectStateEntry in GetAllObjectStateEntries()
                     .Where(e => !e.IsRelationship && e.Entity != null))
        {
            Detach(objectStateEntry.Entity);
        }
    }
}
The GetAllObjectStateEntries() method is split out separately because it's useful for other things. This goes into a partial class with the same name (and namespace) as your entities class (the one EF generates, MyEntities in this example), so it is available on your entities instance.
I now call this clear method after every 1000 records I process, and my application that used to run for about 70 minutes (with only about 400k entities to process, not even millions) does it in 25 minutes. Memory used to peak at 300 MB; now it stays around 50 MB.
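In case it helps, the processing loop then looks roughly like this (a sketch; the batch size of 1000 is just what worked for me, and Customers/Id are placeholders for your own query):

// Load the work list up front so we never enumerate a live query
// while detaching entities from the same context.
List<int> ids;
using (var context = new MyEntities())
{
    ids = context.Customers.Select(c => c.Id).ToList(); // placeholder query
}

using (var context = new MyEntities())
{
    int processed = 0;
    foreach (var id in ids)
    {
        // ... load/modify/add the entities for this record ...
        if (++processed % 1000 == 0)
        {
            context.SaveChanges();   // persist the current batch first
            context.ClearEntities(); // then detach everything tracked
        }
    }
    context.SaveChanges(); // flush the final partial batch
}

Note the order: call SaveChanges before ClearEntities, because detaching entities with pending changes silently discards those changes.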