
I would like to be able to iterate through every row in an entity table without holding every row in memory. This is a read only operation and every row can be discarded after being processed.

If there is a way to discard each row after processing, that would be fine. I know that this can be achieved using a DataReader (which is outside the scope of EF), but can it be achieved within EF?

Or is there a way to obtain a DataReader from within EF without directly using SQL?

More detailed example:

Using EF I can code:

foreach (Quote quote in context.Quotes)
   sw.WriteLine(quote.QuoteId.ToString() + "," + quote.Quotation);

but to achieve the same result with a DataReader I need to code:

// get the underlying ADO.NET connection used by the context
SqlConnection connection = context.Database.Connection as SqlConnection;

// open the connection to the database
connection.Open();

// get a DataReader for our table
SqlCommand command = new SqlCommand(context.Quotes.ToString(), connection);
SqlDataReader dr = command.ExecuteReader();

// get a recipient for our database fields
object[] L = new object[dr.FieldCount];

while (dr.Read())
{
    dr.GetValues(L);
    sw.WriteLine(((int)L[0]).ToString() + "," + (string)L[1]);
}

The difference is that the former runs out of memory (because it pulls the entire table into client memory) while the latter runs to completion (and is much faster), because it retains only a single row in memory at any one time.

But, equally importantly, the latter example loses the strong typing of EF, so if the database changes, errors can be introduced.

Hence, my question: can we get a similar result with strongly typed rows coming back in EF?

Mythlandia
  • You want to delete an object from an Entity Framework model without first loading it? – Yuliam Chandra Jul 17 '14 at 05:30
  • No. I want to process rows and discard them as soon as they have been processed. – Mythlandia Jul 17 '14 at 05:48
  • Both using a DataReader and iterating the rows of an EF entity table will do a database round trip and hold the data in memory; do you want to process rows without pulling the data from the database into memory? – Yuliam Chandra Jul 17 '14 at 05:53
  • No. I want to process each row and discard it after processing. DataReader will do the trick but it gives me back an untyped row; I would like to receive back the data as a strongly typed row. I need every row from the database, but only one row at a time. Each row is read, processed, and then of no further interest. Using EF, it holds every row (even after being processed) and gives an out-of-memory exception. – Mythlandia Jul 17 '14 at 06:26
  • Assuming that the table has a PK, why can't you process the data in 5k and 10k chunks? – Mark Kram Oct 15 '18 at 02:19

4 Answers


Based on your last comment, I'm still confused. Take a look at both of the code samples below.

EF

using (var ctx = new AppContext())
{
    foreach (var order in ctx.Orders)
    {
        Console.WriteLine(order.Date);
    }
}

(EF profiler screenshot)

Data Reader

var constr = ConfigurationManager.ConnectionStrings["AppContext"].ConnectionString;
using (var con = new SqlConnection(constr))
{
    con.Open();    
    var cmd = new SqlCommand("select * from dbo.Orders", con);
    var reader = cmd.ExecuteReader();
    while (reader.Read())
    {
        Console.WriteLine(reader["Date"]);
    }
}

(Data Reader profiler screenshot)

Even though EF issues a few initial queries, both of them execute a similar query, as can be seen in the profiler.

Yuliam Chandra
  • OK, I understand the confusion. I will amend the question to give more detail. Bottom line: I am not concerned about pulling the rows from the database. As you say, in both cases the same number of rows will be taken with a similar DB cost. *BUT* when the table is 5,000,000 rows (in my simple example) the database can be traversed in seconds, but my application runs out of memory trying to hold some 5-10GB of data in memory. I only need to hold ONE row at a time locally, not the entire table. – Mythlandia Jul 17 '14 at 06:58
  • Take a look at [this post](http://stackoverflow.com/questions/19311491/how-entity-framework-works-for-large-number-of-records). Somebody already mentioned that EF fetches only necessary records, but I can't find the reference yet. – Yuliam Chandra Jul 17 '14 at 07:12
  • The problem is not with how many records EF fetches: I want it to eventually fetch EVERY record in the table. The problem is that my client program only needs ONE record at a time from the table; it does not need to preserve the other 5,000,000 records in the client program. See the code from the revised question. There is no problem when the table is 5,000 records (the memory is just released immediately after the list has been traversed); the problem is when the table is too big to fit in the client memory at once. Worse - this is only TEST data; the database will grow massively in size. – Mythlandia Jul 17 '14 at 07:21
  • According to docs for EF Core, foreach will stream the results just as you are trying. Not sure about EF6 implementation. https://learn.microsoft.com/en-us/ef/core/performance/efficient-querying#buffering-and-streaming – Chris Danna May 02 '23 at 16:45
  • I'm having a hard time finding anything documenting when EF switched from buffering to streaming results by default, but it appears that `AsStreaming()` is deprecated in v6.4.4, which suggests that version does stream by default. – Shelby Oldfield Aug 21 '23 at 15:17

I haven't tested it, but try foreach (Quote quote in context.Quotes.AsNoTracking()) {...}. .AsNoTracking() should not put entities in the cache, so I assume they will be collected by the GC once they go out of scope.
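A minimal sketch of this first option, borrowing the `Quote` entity and `AppContext` DbContext names from the question (the output file name is illustrative):

using System.Data.Entity;   // EF6: brings the AsNoTracking() extension into scope
using System.IO;

using (var context = new AppContext())
using (var sw = new StreamWriter("quotes.csv"))
{
    // AsNoTracking stops EF from registering each materialized entity
    // in the change tracker, so every Quote becomes eligible for
    // garbage collection as soon as the loop moves on
    foreach (Quote quote in context.Quotes.AsNoTracking())
    {
        sw.WriteLine(quote.QuoteId + "," + quote.Quotation);
    }
}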

Another option is to set context.Entry(quote).State = EntityState.Detached; inside the foreach loop. This should have a similar effect to option 1.
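A sketch of that second option, again using the question's names (`Process` is a hypothetical per-row handler):

using System.Data.Entity;   // EF6: EntityState lives here

using (var context = new AppContext())
{
    foreach (Quote quote in context.Quotes)
    {
        Process(quote);   // hypothetical per-row handler

        // detach the entity so the context releases its reference
        // and the row becomes eligible for garbage collection
        context.Entry(quote).State = EntityState.Detached;
    }
}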

A third option (which should definitely work, but requires more coding) would be to implement batch processing: select the top N entities, process them, then select the next top N. In this case, make sure that you dispose of the context and create a new one on every iteration (so the GC can eat it :)) and use a proper OrderBy() in the query. See the sketch below.
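A rough sketch of the batching idea, assuming `QuoteId` is the table's primary key and `sw` is the question's StreamWriter (the batch size is illustrative):

using System.Linq;

const int batchSize = 1000;   // illustrative
int lastId = 0;
bool more = true;

while (more)
{
    // a fresh, short-lived context per batch keeps the change tracker small
    using (var context = new AppContext())
    {
        var batch = context.Quotes
            .Where(q => q.QuoteId > lastId)    // keyset paging on the PK
            .OrderBy(q => q.QuoteId)           // deterministic order is essential
            .Take(batchSize)
            .ToList();

        foreach (var quote in batch)
        {
            sw.WriteLine(quote.QuoteId + "," + quote.Quotation);
            lastId = quote.QuoteId;
        }

        more = batch.Count == batchSize;       // a short batch means we're done
    }
}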

Vadim K.
  • This is the one that worked for me: a table with 3,634,943 rows, with a varchar(max) column whose text I had to process. Setting AsNoTracking did the trick. Thanks. – Tom Padilla Dec 03 '14 at 14:15

You need to use an EntityDataReader, which behaves in a way similar to a traditional ADO.NET DataReader.

The problem is that, to do so, you need to use ObjectContext instead of DbContext, which makes things harder.

See this SO answer (not the accepted one): How can I return a datareader when using Entity Framework 4?

Even though this refers to EF4, things work the same way in EF6. Usually an ORM is not intended for streaming data, which is why this functionality is so hidden.
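A hedged sketch of that approach in EF6. The Entity SQL container name `AppContext` and the entity set `Quotes` are assumptions borrowed from the question; the default container name usually matches the context class:

using System;
using System.Data;
using System.Data.Entity.Core.EntityClient;
using System.Data.Entity.Infrastructure;

using (var context = new AppContext())
{
    // drop down from the DbContext to the underlying ObjectContext
    var objectContext = ((IObjectContextAdapter)context).ObjectContext;
    var connection = (EntityConnection)objectContext.Connection;
    connection.Open();

    // an Entity SQL query against the conceptual model, not the store
    using (var command = new EntityCommand(
        "SELECT VALUE q FROM AppContext.Quotes AS q", connection))
    // EntityDataReader must be opened with SequentialAccess
    using (var reader = command.ExecuteReader(CommandBehavior.SequentialAccess))
    {
        while (reader.Read())
        {
            // fields come back by name; rows are never materialized
            // as tracked entities, so memory stays flat
            Console.WriteLine(reader["QuoteId"] + "," + reader["Quotation"]);
        }
    }
}

Note that this keeps the one-row-at-a-time behavior but, like the plain DataReader, gives up compile-time typing of the row itself.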

You can also look at this project: Entity Framework (Linq to Entities) to IDataReader Adapter

JotaBe

I have done this in pages, cleaning the context after each page load:

  • Load the first 50 rows
  • Iterate over them
  • Clean the context, or create a new one
  • Load the next 50 rows
  • ...

Cleaning the context means setting all of its entries to Detached. A sketch of the pattern follows.
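A minimal sketch of that pattern, assuming the question's `Quote` entity and `AppContext` (the page size and `Process` handler are illustrative):

using System.Data.Entity;
using System.Linq;

const int pageSize = 50;   // illustrative

using (var context = new AppContext())
{
    for (int page = 0; ; page++)
    {
        // Skip/Take paging needs a stable ordering to be correct
        var rows = context.Quotes
            .OrderBy(q => q.QuoteId)
            .Skip(page * pageSize)
            .Take(pageSize)
            .ToList();

        if (rows.Count == 0) break;

        foreach (var quote in rows)
        {
            Process(quote);   // hypothetical per-row handler
        }

        // "clean the context": detach every tracked entry so the
        // change tracker does not accumulate rows across pages
        foreach (var entry in context.ChangeTracker.Entries().ToList())
        {
            entry.State = EntityState.Detached;
        }
    }
}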