Why don't lazy LINQ Enumerables have a different contract than materialized ones?

Question

It feels strange to me that some LINQ predicates that will materialize a collection (running SQL in the Entity framework case), and that materialization changes what is valid code, without changing the compile-time programming contract.

For example, given bool IsGoodBook(string):

entities.Books.ToList().Where(b => IsGoodBook(b.title)).Count() works, but takes forever because it loads the entire Books table in to memory.

But entities.Books.Where(b => IsGoodBook(b.title)).Count() fails at runtime, because IsGoodBook is a custom function not understood by my database.

I would have expected b.title to be of some sort of ProxyString type in the first case (maybe dependent on the EF-library I'm using - SQL Server, SQLite, PostgreSQL etc.), and an ordinary C# string in the second.

I further would have expected something like .Materialize<T>(Func<T>) and friends so that you could call .Materialize(m => m.Count), .Materialize(m => m.ToDictionary(...)). I concede that's a bit more code but it makes it clear where the SQL execution happens.

Using the same types here seems to reintroduce the same class of bugs that were finally put to rest by making string? a different type than string.

Was there a rationale for using the same types on both sides of materialization?

See also https://learn.microsoft.com/en-us/ef/core/providers/sql-server/functions

I'm confused what contract you see as different - both are lambdas that take in a `Book` and output a `Boolean`. What is different between the two contracts? The _content_ is certainly different but the _contract_ is the same with the subtle distinction that the first takes an `Expression` and the second takes a `Func`, but that is not the distinction you're talking about here - you're talking about the _content_, not the _contract_. — D Stanley, Mar 02 '23 at 23:22
And note that you can use `AsEnumerable()` to change from SQL context to in-memory context, but all of the records will still be loaded into memory; possibly more efficiently than creating a list, though. For example: `entities.Books.AsEnumerable().Where(b => IsGoodBook(b.title)).Count()` — D Stanley, Mar 02 '23 at 23:23

score 0 · Answer 1 · answered Mar 02 '23 at 23:27

If you just want to separate the SQL context from the in-memory context, you can use AsEnumerable() to convert the IQueryable query to an IEnumerable query:

entities.Books.AsEnumerable().Where(b => IsGoodBook(b.title)).Count()

It will still have to load every book into memory, but might do it more efficiently that using ToList() since it's still just wrapping a query.

Put another way, IQueryable<T> is the interface used for a query against a data source, and IEnumerable<T> is used for an in-memory query. Both are lazy-loaded, and not materialized until you call a method that materializes them, like ToList, Count, Any, etc.

Guru Stron · Answer 2 · 2023-03-02T23:43:10.893

and that materialization changes what is valid code, without changing the compile-time programming contract

AsEnumerable "changes what is valid code" without any materialization:

entities.Books.AsEnumerable().Where(b => IsGoodBook(b.title)).Count()

Works too while materialization still will happen when Count is called. It seems that you have a bit mixed up two concepts - laziness/deferred execution and IEnumerable vs IQueryable (see What is the difference between IQueryable and IEnumerable?). Both IEnumerable and IQueryable are lazy, but they work quite different internally.

I would have expected b.title to be of some sort of ProxyString type in the first case

Making b.title to be of type ProxyString will not make IsGoodBook accepting it translatable into SQL (though potentially it will reduce/make easier to find such things). And it will complicate reuse of the existing functions which are translatable (i.e. you will need to have extra overloads with corresponding types and/or some compiler/intellisense support here). And I would argue that it will make compiler implementation considerably harder.

that were finally put to rest by making string? a different type than string.

Which actually not fully correct. string? is not actually a different type from string, it is just some metadata (you can consider approach taken here some kind of type-erasure) which can be analyzed by compiler but not fully prevents NREs (for example json deserializaton allows you to assign null to string property).

I further would have expected something like .Materialize<T>(Func<T>) and friends so that you could call .Materialize(m => m.Count), .Materialize(m => m.ToDictionary(...))

That's a wrong signature to be used with such passed parameters.

Why don't lazy LINQ Enumerables have a different contract than materialized ones?

2 Answers2