5

What is the explanation for EF downloading all result rows when AsEnumerable() is used?

What I mean is that this code:

context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();

will download all the rows from the table before passing any row to the Where() method and there could be millions of rows in the table.

What I would like it to do, is to download only enough to gather 100 rows that would satisfy the Id % 2 == 0 condition (most likely just around 200 rows).

Couldn't EF do on demand loading of rows like you can with plain ADO.NET using Read() method of SqlDataReader and save time and bandwidth?

I suppose that it does not work like that for a reason and I'd like to hear a good argument supporting that design decision.

NOTE: This is a completely contrived example and I know normally you should not use EF this way, but I found this in some existing code and was just surprised my assumptions turned out to be incorrect.

thomasb
  • 5,816
  • 10
  • 57
  • 92
Piotr Owsiak
  • 6,081
  • 8
  • 39
  • 42
  • Why are you using `AsEnumerable()` at this point anyway ? – thomasb Aug 05 '15 at 15:46
  • As I stated in the question, it is a contrived example, I know it is not what you should normally do and what I want to know is an argument behind such design decision rather than lazy fetching or rows from the server like you can do with SqlDataReader. – Piotr Owsiak Aug 05 '15 at 16:01
  • @cosmo0: why did you remove the information about EF version? EF7 is already in beta now and they might change the behavior in the future, right? – Piotr Owsiak Aug 05 '15 at 16:08
  • It's not EF-related, it's just logic. See my updated answer for that. It has even nothing to do with EF, I think. – thomasb Aug 05 '15 at 16:10

7 Answers7

18

The short answer: The reason for the different behaviors is that, when you use IQueryable directly, a single SQL query can be formed for your entire LINQ query; but when you use IEnumerable, the entire table of data must be loaded.

The long answer: Consider the following code.

context.Logs.Where(x => x.Id % 2 == 0)

context.Logs is of type IQueryable<Log>. IQueryable<Log>.Where is taking an Expression<Func<Log, bool>> as the predicate. The Expression represents an abstract syntax tree; that is, it's more than just code you can run. Think of it as being represented in memory, at runtime, like this:

Lambda (=>)
  Parameters
    Variable: x
  Body
    Equals (==)
      Modulo (%)
        PropertyAccess (.)
          Variable: x
          Property: Id
        Constant: 2
      Constant: 0

The LINQ-to-Entities engine can take context.Logs.Where(x => x.Id % 2 == 0) and mechanically convert it into a SQL query that looks something like this:

SELECT *
FROM "Logs"
WHERE "Logs"."Id" % 2 = 0;

If you change your code to context.Logs.Where(x => x.Id % 2 == 0).Take(100), the SQL query becomes something like this:

SELECT *
FROM "Logs"
WHERE "Logs"."Id" % 2 = 0
LIMIT 100;

This is entirely because the LINQ extension methods on IQueryable use Expression instead of just Func.

Now consider context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0). The IEnumerable<Log>.Where extension method is taking a Func<Log, bool> as a predicate. That is only runnable code. It cannot be analyzed to determine its structure; it cannot be used to form a SQL query.

Timothy Shields
  • 75,459
  • 18
  • 120
  • 173
6

Entity Framework and Linq use lazy loading. It means (among other things) that they will not run the query until they need to enumerate the results: for instance using ToList() or AsEnumerable(), or if the result is used as an enumerator (in a foreach for instance).

Instead, it builds a query using predicates, and returns IQueryable objects to further "pre-filter" the results before actually returning them. You can find more infos here for instance. Entity framework will actually build a SQL query depending on the predicates you have passed it.

In your example:

context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();

From the Logs table in the context, it fetches all, returns a IEnumerable with the results, then filters the result, takes the first 100, then lists the results as a List.

On the other hand, just removing the AsEnumerable solves your problem:

context.Logs.Where(x => x.Id % 2 == 0).Take(100).ToList();

Here it will build a query/filter on the result, then only once the ToList() is executed, query the database.

It also means that you can dynamically build a complex query without actually running it on the DB it until the end, for instance:

var logs = context.Logs.Where(a); // first filter
if (something) {
    logs = logs.Where(b); // second filter
}
var results = logs.Take(100).ToList(); // only here is the query actually executed

Update

As mentionned in your comment, you seem to already know what I just wrote, and are just asking for a reason.

It's even simpler: since AsEnumerable casts the results to another type (a IQueryable<T> to IEnumerable<T> in this case), it has to convert all the results rows first, so it has to fetch the data first. It's basically a ToList in this case.

thomasb
  • 5,816
  • 10
  • 57
  • 92
4

Clearly, you understand why it's better to avoid using AsEnumerable() the way you do in your question.

Also, some of the other answers have made it very clear why calling AsEnumerable() changes the way the query is performed and read. In short, it's because you are then invoking IEnumrable<T> extension methods rather than the IQueryable<T> extension methods, the latter allowing you to combine predicates before executing the query in the database.

However, I still feel that this doesn't answer your actual question, which is a legitimate question. You said (emphasis mine):

What I mean is that this code:

context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();

will download all the rows from the table before passing any row to the Where() method and there could be millions of rows in the table.

My question to you is: what made you conclude that this is true?

I would argue that, because you are using IEnumrable<T> instead of IQueryable<T>, it's true that the query being performed in the database will be a simple:

select * from logs

... without any predicates, unlike what would have happened if you had used IQueryable<T> to invoke Where and Take.

However, the AsEnumerable() method call does not fetch all the rows at that moment, as other answers have implied. In fact, this is the implementation of the AsEnumerable() call:

public static IEnumerable<TSource> AsEnumerable<TSource>(this IEnumerable<TSource> source)
{
    return source;
}

There is no fetching going on there. In fact, even the calls to IEnumerable<T>.Where() and IEnumerable<T>.Take() don't actually start fetching any rows at that moment. They simply setup wrapping IEnumerables that will filter results as they are iterated on. The fetching and iterating of the results really only begins when ToList() is called.

So when you say:

Couldn't EF do on demand loading of rows like you can with plain ADO.NET using Read() method of SqlDataReader and save time and bandwidth?

... again, my question to you would be: doesn't it do that already?

If your table had 1,000,000 rows, I would still expect your code snippet to only fetch up to 100 rows that satisfy your Where condition, and then stop fetching rows.

To prove the point, try running the following little program:

static void Main(string[] args)
{
    var list = PretendImAOneMillionRecordTable().Where(i => i < 500).Take(10).ToList();
}

private static IEnumerable<int> PretendImAOneMillionRecordTable()
{
    for (int i = 0; i < 1000000; i++)
    {
        Console.WriteLine("fetching {0}", i);
        yield return i;
    }
}

... when I run it, I only get the following 10 lines of output:

fetching 0
fetching 1
fetching 2
fetching 3
fetching 4
fetching 5
fetching 6
fetching 7
fetching 8
fetching 9

It doesn't iterate through the whole set of 1,000,000 "rows" even though I am chaining Where() and Take() calls on IEnumerable<T>.

Now, you do have to keep in mind that, for your little EF code snippet, if you test it using a very small table, it may actually fetch all the rows at once, if all the rows fit within the value for SqlConnection.PacketSize. This is normal. Every time SqlDataReader.Read() is called, it never only fetches a single row at a time. To reduce the amount of network call roundtrips, it will always try to fetch a batch of rows at a time. I wonder if this is what you observed, and this mislead you into thinking that AsEnumerable() was causing all rows to be fetched from the table.

Even though you will find that your example doesn't perform nearly as bad as you thought, this would not be a reason not to use IQueryable. Using IQueryable to construct more complex database queries will almost always provide better performance, because you can then benefit from database indexes, etc to fetch results more efficiently.

Community
  • 1
  • 1
sstan
  • 35,425
  • 6
  • 48
  • 66
  • I'm genuinely confused, some people like this https://stackoverflow.com/questions/2876616/returning-ienumerablet-vs-iqueryablet/2876655#2876655 say that AsEnumerable will load all the data into memory and then filters them etc... (in fact not only that answer, many other answers on stackoverflow say the same thing) but then your answer says that it doesn't load all the data into memory and THEN filter or whatever... what's the truth? does it load all the data into memory or not. – bzmind Mar 13 '22 at 18:30
  • Btw, is there any Microsoft docs to read about the real differences between the AsEnumerable and AsQueryable? I searched but I didn't find anything useful. they were not related to whether it loads them into memory or not... I don't know what to search for tbh. – bzmind Mar 13 '22 at 18:32
  • @bzmind: The wording in the post you linked is a little confusing, but I assure you that he meant the same thing, which is why he says that "both will give you deferred execution". In short, IQueryable enables filtering at the database level BEFORE data is read in memory, where as IEnumerable can only filter data that is read in memory. However, this does not mean that the COMPLETE SET of data will be read in memory before any filtering is applied. IEnumerable is able to apply the filtering in memory progressively as the data is read in memory little by little. – sstan Apr 01 '22 at 00:13
0

AsEnumerable() eagerly loads the DbSet<T> Logs

You probably want something like

context.Logs.Where(x => x.Id % 2 == 0).AsEnumerable();

The idea here is that you're applying a predicate filter to the collection before actually loading it from the database.

An impressive subset of the world of LINQ is supported by EF. It will translate your beautiful LINQ queries into SQL expressions behind the scenes.

Nick Strupat
  • 4,928
  • 4
  • 44
  • 56
  • The question is why the DbSet is eagerly loaded in this case? Does it have to be? – Piotr Owsiak Aug 05 '15 at 16:03
  • `DbSet` is not loaded eagerly on its own. It offers an `IQueryable`-ish interface. It implements `IEnumerable`, which may be the nugget of knowledge you're missing here. – Nick Strupat Aug 05 '15 at 16:09
  • Anothe way to put it: `AsEnumerable()` is not needed to make the collection "enumerable". `DbSet` is already "enumerable". – Nick Strupat Aug 05 '15 at 16:11
0

I have come across this before. The context command is not executed until a linq function is called, because you have done

context.Logs.AsEnumerable() 

it has assumed you have finished with the query and therefore compiled it and returns all rows. If you changed this to:

context.Logs.Where(x => x.Id % 2 == 0).AsEnumerable() 

It would compile a SQL statement that would get only the rows where the id is modular 2. Similarly if you did

context.Logs.Where(x => x.Id % 2 == 0).Take(100).ToList();

that would create a statement that would get the top 100...

I hope that helps.

r3plica
  • 13,017
  • 23
  • 128
  • 290
  • I do understand how it works but I'd like to know why it works this way rather the other way I described in the question. – Piotr Owsiak Aug 05 '15 at 16:02
  • Well, your assumption of "lazily" loading the data means that after you have written your linq query it then goes to the database and pulls back only the records you are interested in. The thing is, that is exactly what is happening. The only problem you have, is to remember to use the linq functions at the end of your expression because that function is what tells EF to compile the SQL script. – r3plica Aug 05 '15 at 16:08
0

LinQ to Entities has a store expression formed by all the Linq methods before It goes to an enumeration.

When you use AsEnumerable() and then Where() like this:

context.Logs.Where(...).AsEnumerable()

The Where() knows that the previous chain call has a store expression so he appends his predicate to It for lazy loading.

The overload of Where that is being called is different if you call this:

context.Logs.AsEnumerable().Where(...)

Here the Where() only knows that his previous method is an enumeration (it could be any kind of "enumerable" collection) and the only way that he can apply his condition is iterating over the collection with the IEnumerable implementation of the DbSet class, which must to retrieve the records from the database first.

MMZurita
  • 125
  • 8
0

I don't think you should ever use this:

context.Logs.AsEnumerable().Where(x => x.Id % 2 == 0).Take(100).ToList();

The correct way of doing things would be:

context.Logs.AsQueryable().Where(x => x.Id % 2 == 0).Take(100).ToList();

Answer with explanations here:

Community
  • 1
  • 1
radu florescu
  • 4,315
  • 10
  • 60
  • 92
  • 2
    to be honest, why even use AsQueryable? I mean the context.Logs entity set is already cast as IQueryable – r3plica Aug 05 '15 at 16:10
  • I know what the correct way is, the point of the question is why it cannot work the other way I described. – Piotr Owsiak Aug 05 '15 at 16:11
  • Because I don't know whether that is a DbSet to assume otherwise. – radu florescu Aug 05 '15 at 16:11
  • @PiotrOwsiak Because is not optimum to work like that. It will take you time and memory to get that data back. – radu florescu Aug 05 '15 at 16:12
  • @Floradu88: I believe fetching whole table in this case is less optimum than fetching enough rows to satisfy the condition (which would most likely be around 200 rows). Thus I'm really surprised my assumption was wrong. – Piotr Owsiak Aug 05 '15 at 17:07