
I am downloading a lot of data in a loop, and after some operations I remove it, but memory allocation grows really fast: within a few seconds it reaches 1 GB. How can I clean up after each iteration?

    using (var contex = new DB())
    {
        var inputs = contex.AIMRInputs.Where(x => x.Input_Type == 1);

        foreach (var input in inputs)
        {
            var data = contex.Values.Where(x => x.InputID == input.InputID).OrderBy(x => x.TimeStamp).ToList();

            if (data.Count == 0) continue;
            foreach (var value in data)
            {
                Console.WriteLine(value.property);
            }
            data.Clear();
        }
    }
Matt Tester
kosnkov
    Is the ToList() call needed? You could get away without using it and probably reduce memory allocation that way. – KingCronus Mar 23 '12 at 17:06
  • For that matter, why sort it if you're just going to check to see if the count is zero? instead you can just add an `Any()` after the where clause. If there are no items it returns false, otherwise true. It doesn't even need to iterate past the first item in the enumerable (that passes through the where). – Servy Mar 23 '12 at 17:09
  • No way, I removed ToList() but it's the same; I need to do something with that data, so I need to download it all, not only information about how much of it there is. – kosnkov Mar 23 '12 at 17:09
  • I would remove the call to GC.Collect. Is there any reason you're doing that intentionally? See http://stackoverflow.com/questions/118633/whats-so-wrong-about-using-gc-collect – Mike Hofer Mar 23 '12 at 17:10
  • But you discard that information... and if you don't discard it but plan to use it, then you can't free up the memory, because you're using it. – Servy Mar 23 '12 at 17:11
  • I just edited; in reality, instead of writing to the console, I write to a file. – kosnkov Mar 23 '12 at 17:13
  • 99 times out of 100, you should never call `GC.Collect()` directly. The 100th time, you should walk around the block for some fresh air, then come back and figure out how not to call it directly. – GalacticCowboy Mar 23 '12 at 17:13
  • Anyway, it seems like after downloading data for each input, memory only grows, as if the data downloaded before is never disposed. – kosnkov Mar 23 '12 at 17:15
  • Besides which, the objects still exist in memory in `inputs` and `contex`. Are you sure it's not those collections that are eating the memory? – JohnL Mar 23 '12 at 17:17
  • If you're disposing it correctly, the garbage collector should manage this for you. Don't worry about the specific number - unless you start running into `OutOfMemoryException`, the app will work fine regardless of how much memory it's got in its working set. – GalacticCowboy Mar 23 '12 at 17:18

2 Answers


The first thing you can do is disable change tracking, because you are not changing any data in your code. This prevents the loaded objects from getting attached to the context:

For DbContext (EF >= 4.1):

var inputs = contex.AIMRInputs.AsNoTracking()
    .Where(x => x.Input_Type == 1);

And:

var data = contex.Values.AsNoTracking()
    .Where(x => x.InputID == input.InputID)
    .OrderBy(x => x.TimeStamp)
    .ToList();

Edit

For EF 4.0 you can leave your queries as they are but add the following as the first two lines in the using block:

contex.AIMRInputs.MergeOption = MergeOption.NoTracking;
contex.Values.MergeOption = MergeOption.NoTracking;

This disables change tracking for ObjectContext.
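Applied to the loop from the question, a minimal EF 4.0 sketch could look like this (assuming `DB` derives from `ObjectContext` and `MergeOption` from the `System.Data.Objects` namespace is available; entity and property names are taken from the question):

```csharp
using (var contex = new DB())
{
    // Disable change tracking: loaded entities are not attached to the
    // context, so the state manager does not grow with every query.
    contex.AIMRInputs.MergeOption = MergeOption.NoTracking;
    contex.Values.MergeOption = MergeOption.NoTracking;

    var inputs = contex.AIMRInputs.Where(x => x.Input_Type == 1);

    foreach (var input in inputs)
    {
        var data = contex.Values
            .Where(x => x.InputID == input.InputID)
            .OrderBy(x => x.TimeStamp)
            .ToList();

        foreach (var value in data)
        {
            Console.WriteLine(value.property);
        }
    }
}
```

Since nothing is tracked, there is no need to call `GC.Collect()` or clear the lists manually; the materialized objects become collectible as soon as each iteration ends.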

Edit 2

...especially referring to @James Reategui's comment below that AsNoTracking reduces memory footprint:

This is often true (like in the model/query of this question) but not always! Actually using AsNoTracking can be counterproductive regarding memory usage.

What does AsNoTracking do when objects get materialized in memory?

  • First: It doesn't attach the entity to the context and therefore doesn't create entries in the context's state manager. Those entries consume memory. When using POCOs, the entries contain a snapshot of the entity's property values when it was first loaded/attached to the context - basically a copy of all (scalar) properties in addition to the object itself. So the consumed memory is (roughly) twice the object's size when AsNoTracking is not applied.

  • Second: On the other hand, when entities don't get attached to the context, EF cannot leverage the advantage of identity mapping between key values and object reference identities. This means that objects with the same key will be materialized multiple times, which consumes additional memory, while without AsNoTracking EF will ensure that an entity is only materialized once per key value.

The second point becomes especially important when related entities are loaded. Simple example:

Say we have an Order and a Customer entity, and an order has one customer, Order.Customer. Say the Order object has a size of 10 bytes and the Customer object 20 bytes. Now we run this query:

var orderList = context.Orders
    .Include(o => o.Customer).Take(3).ToList();

And suppose all 3 loaded orders have the same customer assigned. Because we didn't disable tracking, EF will materialize:

  • 3 order objects = 3x10 = 30 bytes
  • 1 customer object = 1x20 = 20 bytes (because the context recognizes that the customer is the same for all 3 orders, it materializes only one customer object)
  • 3 order snapshot entries with original values = 3x10 = 30 bytes
  • 1 customer snapshot entry with original values = 1x20 = 20 bytes

Sum: 100 bytes

(For simplicity I assume that the context entries with the copied property values have the same size as the entities themselves.)

Now we run the query with disabled change tracking:

var orderList = context.Orders.AsNoTracking()
    .Include(o => o.Customer).Take(3).ToList();

The materialized data are:

  • 3 order objects = 3x10 = 30 bytes
  • 3 (!) customer objects = 3x20 = 60 bytes (no identity mapping = multiple objects per key; all three customer objects have the same property values, but they are still three objects in memory)
  • No snapshot entries

Sum: 90 bytes

So, using AsNoTracking, the query consumed 10 bytes less memory in this case.

Now, the same calculation with 5 orders (Take(5)), again all orders have the same customer:

Without AsNoTracking:

  • 5 order objects = 5x10 = 50 bytes
  • 1 customer object = 1x20 = 20 bytes
  • 5 order snapshot entries with original values = 5x10 = 50 bytes
  • 1 customer snapshot entry with original values = 1x20 = 20 bytes

Sum: 140 bytes

With AsNoTracking:

  • 5 order objects = 5x10 = 50 bytes
  • 5 (!) customer objects = 5x20 = 100 bytes
  • No snapshot entries

Sum: 150 bytes

This time using AsNoTracking was 10 bytes more expensive.

The numbers above are very rough, but somewhere there is a break-even point beyond which AsNoTracking needs more memory.

The difference in memory consumption between using AsNoTracking or not depends strongly on the query, the relationships in the model and the concrete data loaded by the query. For example, AsNoTracking would always be better in memory consumption when the orders in the example above all (or mostly) have different customers.
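The arithmetic above can be captured in a small model. The sizes are the illustrative assumptions from the example (order = 10 bytes, customer = 20 bytes, a snapshot entry costing as much as the entity it copies), not real EF numbers:

```python
ORDER_SIZE = 10
CUSTOMER_SIZE = 20

def tracked_bytes(n_orders, n_distinct_customers):
    """With change tracking: identity mapping dedupes customers,
    but every materialized entity also gets a snapshot copy."""
    entities = n_orders * ORDER_SIZE + n_distinct_customers * CUSTOMER_SIZE
    snapshots = entities  # snapshot assumed to be the same size as the entity
    return entities + snapshots

def no_tracking_bytes(n_orders, n_distinct_customers):
    """With AsNoTracking: no snapshots, but no identity mapping either,
    so a shared customer is materialized once per order."""
    return n_orders * (ORDER_SIZE + CUSTOMER_SIZE)

# 3 orders sharing 1 customer: 100 vs 90 -> AsNoTracking uses less memory
print(tracked_bytes(3, 1), no_tracking_bytes(3, 1))
# 5 orders sharing 1 customer: 140 vs 150 -> tracking uses less memory
print(tracked_bytes(5, 1), no_tracking_bytes(5, 1))
```

With these numbers, the break-even point for a single shared customer sits at 4 orders; with mostly distinct customers, `n_distinct_customers` approaches `n_orders` and AsNoTracking always comes out ahead.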

Conclusion: AsNoTracking is primarily meant as a tool to improve query performance, not memory usage. In many cases it will also consume less memory. But don't be surprised if a specific query needs more memory with AsNoTracking. In the end you must measure the memory footprint to make a solid decision for or against AsNoTracking.

Slauma
  • EF >= 4.1 first heard of it, where can I get it? – kosnkov Mar 23 '12 at 17:22
  • @kosnkov: EF 4.3.1 is current release: http://blogs.msdn.com/b/adonet/archive/2012/02/29/ef4-3-1-and-ef5-beta-1-available-on-nuget.aspx You can download it into a project in VS2010 with the Nuget package manager console: http://nuget.org/packages/EntityFramework/4.3.1 – Slauma Mar 23 '12 at 17:26
  • @kosnkov: EF 4.0 and `ObjectContext` also have the option to disable change tracking, see my Edit. – Slauma Mar 23 '12 at 17:30
  • .AsNoTracking() really reduces memory usage for big queries. Awesome. – James Reategui May 21 '12 at 14:31
  • @JamesReategui: Often, but not always! See the Edit2 section in my answer above. – Slauma May 21 '12 at 16:55
  • Thanks for the rundown, good to know the details of how AsNoTracking works. For my application, using it reduced memory usage from 900 megs to 130 when running the task on normalized data. (millions of rows) – James Reategui May 22 '12 at 17:45

Part of the issue here could be with respect to the DataContext. Many of them cache or store additional information as you perform queries, so their memory footprint grows over time. I would check with a profiler first, but if this is your problem you may need to re-create the data context after every X requests (experiment with different values of X to see what works best).
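One way to sketch that idea, assuming the same hypothetical `DB` context and entity names from the question, and an arbitrary batch size to experiment with:

```csharp
const int BatchSize = 100; // experiment with different values of X

List<int> inputIds;
using (var contex = new DB())
{
    // Materialize just the IDs so the first context can be disposed early.
    inputIds = contex.AIMRInputs
        .Where(x => x.Input_Type == 1)
        .Select(x => x.InputID)
        .ToList();
}

for (int i = 0; i < inputIds.Count; i += BatchSize)
{
    // A fresh context per batch discards whatever the previous one cached.
    using (var contex = new DB())
    {
        foreach (var id in inputIds.Skip(i).Take(BatchSize))
        {
            var data = contex.Values
                .Where(x => x.InputID == id)
                .OrderBy(x => x.TimeStamp)
                .ToList();
            // ... process data ...
        }
    }
}
```

The trade-off is extra connection/context setup cost per batch, which is why the batch size is worth tuning rather than re-creating the context on every single iteration.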

I'd also like to note that most machines tend to have a lot of memory. You should be really sure that you're using more memory than is truly acceptable before you start making these types of optimizations. The GC will also start clearing memory more aggressively as you have less free memory to work with. It doesn't bother optimizing prematurely (and neither should you).

Servy
  • OK, basically you're right, and Slauma is also a bit right; when I create this contex in the loop, memory jumps from 50 to 200 MB but then comes down again to 50, so it seems like Entity Framework tracks everything I do. Isn't there any way in EF 4.0 to disable it, if I only want to download the data, not change it? – kosnkov Mar 23 '12 at 17:28