
I am working with a very large data set, roughly 2 million records. I have the code below but get an out-of-memory exception after it has processed around three batches, about 600,000 records. I understand that as it loops through each batch Entity Framework lazy loads, which then tries to build up the full 2 million records in memory. Is there any way to unload a batch once I've processed it?

ModelContext dbContext = new ModelContext();
IEnumerable<IEnumerable<Town>> towns = dbContext.Towns.OrderBy(t => t.TownID).Batch(200000);
foreach (var batch in towns)
{
    SearchClient.Instance.IndexMany(batch, SearchClient.Instance.Settings.DefaultIndex, "Town", new SimpleBulkParameters() { Refresh = false });
}

Note: The Batch method comes from this project: https://code.google.com/p/morelinq/
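For context, a Batch-style extension method can be sketched roughly like this (a simplified illustration of lazy chunking, not morelinq's actual implementation):

```csharp
using System.Collections.Generic;

public static class BatchExtensions
{
    // Lazily groups a sequence into buckets of at most `size` items.
    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        List<T> bucket = null;
        foreach (var item in source)
        {
            if (bucket == null)
                bucket = new List<T>(size);
            bucket.Add(item);
            if (bucket.Count == size)
            {
                yield return bucket;
                bucket = null; // start a fresh bucket for the next batch
            }
        }
        // Yield any trailing partial batch.
        if (bucket != null && bucket.Count > 0)
            yield return bucket;
    }
}
```

Note that batching alone doesn't free memory here: the enumeration still runs over a single EF query on one context, which is why the context keeps accumulating tracked entities.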

The search client is this: https://github.com/Mpdreamz/NEST

Russ Cam
Mike Norgate
  • Huge amounts of data is a scenario in which I'm not sure an ORM is an appropriate tool... – Vadim Aug 11 '13 at 08:11
  • @Vadim, ORM is an appropriate tool to process business logic without worrying about underlying data store, however there are easier ways to write batch processing with ORM. – Akash Kava Aug 11 '13 at 08:18
  • 2
    @AkashKava, of course it is. The thing is, when you combine ORMs with large volumes of data, you'll always find yourself "cancelling" all kinds of ORM aspects which you would consider a feature in other scenarios. In other cases, you write code in a way which specifically addresses issues of the ORM you are using. All I'm saying is - given large volumes of data ORMs become problematic. – Vadim Aug 11 '13 at 08:23
  • @Vadim, It becomes Problematic just because it is done wrongly, doesn't mean it should not be used. Large amount of data should never be processed all at once in memory, instead the correct way of batching should be done. See my answer, we are processing millions of records every day with ORM. Large Operation has to be broken down into collection of smaller steps. – Akash Kava Aug 11 '13 at 08:26
  • 2
    How is Batch any different than `Skip(batchNo*batchSize).Take(batchSize)`? – B2K Jun 04 '15 at 15:05

2 Answers

The issue is that when you get data from EF there are actually two copies of the data created: one which is returned to the user, and a second which EF holds onto and uses for change detection (so that it can persist changes to the database). EF holds this second set for the lifetime of the context, and it's this set that's running you out of memory.

You have two options to deal with this:

  1. Renew your context each batch
  2. Use .AsNoTracking() in your query, e.g.:

    IEnumerable<IEnumerable<Town>> towns = dbContext.Towns.AsNoTracking().OrderBy(t => t.TownID).Batch(200000);
    

This tells EF not to keep a copy for change detection. You can read a little more about what AsNoTracking does and its performance impact on my blog: http://blog.staticvoid.co.nz/2012/4/2/entity_framework_and_asnotracking
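For completeness, option 1 can be sketched like this, paging with Skip/Take so each batch is loaded by a short-lived context that is disposed before the next batch (the IndexMany call mirrors the question's code):

```csharp
// Sketch of option 1: a fresh context per batch, so tracked entities are
// released when each context is disposed. Types like ModelContext, Town and
// SearchClient come from the question's code.
const int batchSize = 200000;
int skip = 0;
while (true)
{
    List<Town> batch;
    using (var dbContext = new ModelContext())
    {
        batch = dbContext.Towns
            .OrderBy(t => t.TownID)   // a stable ordering is required for Skip/Take
            .Skip(skip)
            .Take(batchSize)
            .ToList();
    } // context disposed here, freeing its change-tracking copies

    if (batch.Count == 0)
        break;

    SearchClient.Instance.IndexMany(batch,
        SearchClient.Instance.Settings.DefaultIndex, "Town",
        new SimpleBulkParameters() { Refresh = false });
    skip += batchSize;
}
```

One caveat: Skip on large offsets can get slow on some databases; keyset paging on TownID (a `Where(t => t.TownID > lastId)` filter) is a common alternative.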

Luke McGregor
  • Does NoTracking preserve navigation properties? For example if I want to set/add related objects, will it save? – Akash Kava Aug 11 '13 at 08:34
  • 2
@AkashKava no it won't; if you want to save anything you will need to first attach it back to the context, and if the entity has navigation properties these will also need to be attached. If you actually want to modify the entities you load, I would suggest using the first method. – Luke McGregor Aug 11 '13 at 08:38
  • 2
    Thanks @LukeMcGregor, I found that renewing the context on every batch worked really well – Mike Norgate Aug 11 '13 at 08:41
  • 1
    How do you renew your context? Just `dbContext = new DbContext()`? – muttley91 Sep 26 '14 at 14:38
  • What is `.Batch(200000)`? – mrmashal Mar 11 '16 at 14:46
  • 1
@mrmashal an extension method on IEnumerable<T> which returns an IEnumerable<IEnumerable<T>> where the inner sets have 200,000 items. I wrote this myself, but it's super simple if you want something similar – Luke McGregor Mar 13 '16 at 01:35
  • Implementation of renewal of context for each batch described in my answer to [SqlException from Entity Framework - New transaction is not allowed because there are other threads running in the session](http://stackoverflow.com/a/37762186) – Michael Freidgeim Aug 16 '16 at 02:54
  • Love you, Luke! :P – pawel Mar 21 '18 at 01:26
  • I am receiving the results of a stored proc so AsNoTracking is not available to me. I am returning results from different methods, each using a new'ed context in a using statement so this has me stumped. I need the results for different operations in a large data environment. Still seeing this OutOfMemoryException issue. – hal9000 Apr 26 '18 at 18:28
@hal9000 It would be best if you post this as a new question. Asking supplementary questions in the comments is discouraged on SO. Normally, if this question seems close, you would link it in your new question saying you tried this but it didn't work (and why). – Luke McGregor Apr 26 '18 at 20:37

I wrote a migration routine that reads from one DB and writes (with minor changes in layout) into another DB (of a different type) and in this case, renewing the connection for each batch and using AsNoTracking() did not cut it for me.

Note that this problem occurs using a '97 version of JET. It may work flawlessly with other DBs.

However, the following algorithm did solve the Out-of-memory issue:

  • Use one connection for reading and one for writing/updating
  • Read with AsNoTracking()
  • Every 50 rows or so written/updated, check the memory usage and recover memory + reset the output DB context (and connected tables) as needed:

    // If the process has grown past ~800 MB, flush pending changes, dispose
    // the output context so the GC can reclaim its tracked entities, then
    // recreate the context and re-bind the output table.
    var before = System.Diagnostics.Process.GetCurrentProcess().VirtualMemorySize64;
    if (before > 800000000)
    {
        dbcontextOut.SaveChanges();
        dbcontextOut.Dispose();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        dbcontextOut = dbcontextOutFunc();
        tableOut = Dynamic.InvokeGet(dbcontextOut, outputTableName);
    }
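Put together, the loop described by the bullets above looks roughly like this. Names such as ModelContextIn, ModelContextOut, SourceRows, TargetRows and MapRow are hypothetical placeholders for illustration; only the memory check and context reset follow the snippet above.

```csharp
// Rough shape of the read-with-one-context, write-with-another migration loop.
// All entity/context names here are illustrative assumptions.
Func<ModelContextOut> dbcontextOutFunc = () => new ModelContextOut();
var dbcontextOut = dbcontextOutFunc();

using (var dbcontextIn = new ModelContextIn())
{
    int written = 0;
    // AsNoTracking keeps the read side from accumulating tracked entities.
    foreach (var row in dbcontextIn.SourceRows.AsNoTracking())
    {
        dbcontextOut.TargetRows.Add(MapRow(row)); // map source row to output layout

        if (++written % 50 == 0)
        {
            dbcontextOut.SaveChanges();
            var before = System.Diagnostics.Process.GetCurrentProcess().VirtualMemorySize64;
            if (before > 800000000)
            {
                // Reset the write side: dispose, collect, recreate.
                dbcontextOut.Dispose();
                GC.Collect();
                GC.WaitForPendingFinalizers();
                dbcontextOut = dbcontextOutFunc();
            }
        }
    }
    dbcontextOut.SaveChanges(); // flush the final partial batch
    dbcontextOut.Dispose();
}
```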
    
Wolfgang Grinfeld