2

Given a property foo of an entity bar for which the entity set contains a large number of values, how can we check for the absence (from bar) of any value in an arbitrary list of values, using LINQ-to-Entities?

For example:

  • bar contains ALPHA and BETA.
  • We have a list of three items: [ALPHA, BETA, GAMMA] which we want to send to the EF backend (DBMS). We expect the backend to reply with a single scalar result (1 or 0 / true or false). In this case, it would return 0, since at least one value is absent (GAMMA).
  • If all values from the list are in the entity set, the query would return 1. For example, when the list is: [ALPHA, BETA] or simply [ALPHA] or [BETA].
  • We definitely don't want to transfer the whole entity set to the EF client.
  • We definitely don't want to issue multiple queries for each value in the list.

Example dynamic T-SQL:

select case when not exists (
    select * from (
        select 'ALPHA' foo
        union select 'BETA' foo
        union select 'GAMMA' foo) a
    left join (select distinct foo from bar) b
    on a.foo = b.foo
    where b.foo is null)
    then 1
    else 0
end;

How could we formulate a LINQ-to-Entity query that would generate this SQL query (or equivalent)?

Note: I wrote this SQL query in the most natural way I could think of (it literally reads like "check that there is no value in the list for which there is no corresponding value in the bar table"). It's very likely there's a better way to do this.

tne
  • 7,071
  • 2
  • 45
  • 68
  • Reminds me of http://stackoverflow.com/questions/30984564/contains-at-least-all-in-entity-framework – jjj Jul 08 '15 at 00:35
  • @jjj Indeed, it's almost a dupe if not for the "return a boolean" requirement. It brings an interesting problem regarding the context of LINQ operations (L2E vs L2O) though. – tne Jul 08 '15 at 14:59

3 Answers3

2

You are looking for the All method

This will give you the correct answer, but it will result in a query for each item in list.

var list = new[] { "ALPHA", "BETA", "GAMMA" };

bool result = list.All(f => ctx.Bars.Any(b => b.Foo == f));

However, if you write it as a subselect, it will generate a single query.

bool result = 
    ctx.Bars.Select(x => 

              list.All(f => ctx.Bars.Any(b => b.Foo == f)) //Same as above but encapsulated 

            ).FirstOrDefault();

Notice that ctx.Bars.Select(x => is never actually used and it's only purpose is to encapsulate your All logic.


Another option is to use SelectMany on the context and then selecting only the list. The query sytnax looks like this:

bool result = 
    (from b in ctx.Bars
     from l in list
     select l).All(f => ctx.Bars.Any(b => b.Foo == f));

So like the solution above, we are only using from b in ctx.Bars to force the query to be an IQueryable expression rather than an IEnumerable. Which will generate 1 query, rather than N queries

Aducci
  • 26,101
  • 8
  • 63
  • 67
  • That's not quite right. That returns true if and only if every `bar` in the table has `foo` set to one of the listed values. The OP wants to determine whether all of the listed values have at least one corresponding `bar` in the table. – Richard Deeming Jul 07 '15 at 19:50
  • Thank you for your answer. I *believe* my question is precise, but I definitely should have added the following example: list [A, B], bar set [A, B, D]. Unless I'm reading this wrong your answer would fail even though all items of the *list* are in the bar set. Now that I'm thinking about it in terms of `All`, inverting things would seem to procure the right semantics: `list.All(f => ctx.Bars.Any(b => b.Foo == f))`. Yet, I fear it wouldn't generate the right SQL: I expect it'll generate multiple queries. I'll try and report tomorrow. – tne Jul 07 '15 at 19:53
  • -- Oh, you beat me to my comment. – tne Jul 07 '15 at 19:55
  • @tne sorry I misread the question. I updated the answer, let me know if you have any questions. – Aducci Jul 07 '15 at 20:21
  • That's certainly creative. I'm not sure I like it *too* much yet, but I'll come around; type-safety benefits outweigh the POLA violation. A close-ish variant with Except+Any inspired by @jjj's post compiles a little cleaner: `ctx.Bars.Select(_ => list.Except(ctx.Bars.Select(b => b.Foo)).Any())` and intuitively *seems* like it could be faster. Yet, the query planner still *slightly* favors your version by 2%. It'd be really great if we could avoid the Select+First trick somehow, unless you know of an existing enhancement request, I might write one. – tne Jul 08 '15 at 11:20
  • I commend your trickery skills. Your last edit, while still looking odd, actually generates a seemingly optimal request (almost identical to the one suggested by @RichardDeeming). We'll continue to research a few options for a bit, but it's very likely we'll end up using that. Thanks again! – tne Jul 08 '15 at 16:26
  • @tne - It all boils down to using an `IQueryable` first and then the list (`IEnumerable`). So I guess the choice depends on what you prefer - readability or optimized sql generation. – Aducci Jul 08 '15 at 16:49
1

Something like this?

bool result = !context.Bars.Any(b => list.Contains(b.Foo));

EDIT: The only option I can see is to use a raw SQL query. This will limit which back-end DBMS you can use, since different database systems support different syntax.

You'll also need to hard-code the table and column names - they won't get picked up if you refactor your code.

Assuming Microsoft SQL Server, something like this should work:

var list = new List<string> { "A", "B", "C" };

// Generate a parameter placeholder for each item in the list:
var listParameters = list.Select((item, index) => "@p" + index);

// Generate a "UNION" query for the parameters:
string listValues = string.Join(" UNION SELECT ", listParameters);

// The final query to execute:
string query = @"SELECT CASE WHEN NOT EXISTS
(
    SELECT 1
    FROM (SELECT " + listValues + @") As L (foo)
    WHERE NOT EXISTS
    (
        SELECT 1
        FROM bar As b
        WHERE b.foo = L.foo
    )
) THEN CAST(1 As bit) ELSE CAST(0 As bit) END";

bool result = context.Database.SqlQuery<bool>(query, list).First();
Richard Deeming
  • 29,830
  • 10
  • 79
  • 151
  • Thank you for your attempt. For some reason this is the first thing I tried too (maybe it *sounds* right with `Any` in there) but quickly you realize it doesn't do what's intended: considering the [A, B, C] list with the [A, B] bar set, `Any` will stop at A since it's contained in the list, then invert the result without even considering C. In this case it yields the right answer by chance, but now consider: [A, B] list and [A, B] bar set. This should yield true, but doesn't. Unless I'm really tired moving the negation to the predicate isn't right either. – tne Jul 07 '15 at 18:31
  • @tne: OK, I can see the problem. I can't see any way to fix it with EF directly - the only options I can come up with involve either making one query per item in the local list, or pulling back all of the `foo` values into memory. You might have to drop to raw SQL for this one. – Richard Deeming Jul 07 '15 at 19:01
  • On the one hand, I feared that was the case, on the other, I thought I was losing my mojo (it *sounds* so simple) so that's reassuring. I'll leave this open for a bit just in case we're both having a bad day, but feel free to add another "no can do" answer if you want. – tne Jul 07 '15 at 19:10
  • @tne: OK, it's not clean, but the updated answer should do what you need. – Richard Deeming Jul 07 '15 at 19:47
  • For now your answer is technically the most exact one, despite its obvious disadvantages. Aducci's requires a tricky wrapping construct that will be reflected in the compiled SQL, and jjj's requires the backend to return a count instead of a boolean. I also like the double `EXISTS` better than a outer join. – tne Jul 08 '15 at 11:34
1

I'm borrowing an answer from here but maybe you could do something like:

var itemList = new List<string> { ... };

// select just foo
var result = context.Bars.Select(b => b.Foo)
   // get the distinct values of foo
   .Distinct()
   // count how many values of foo are in itemList and check if the counts are equal
   // because the query is distinct, the count would match iff all items were matched
   .Count(f => itemList.Contains(f)) == itemList.Count;

The question for the answer I linked does also include other options that could be adapted for your case, like using Except with Any.

Community
  • 1
  • 1
jjj
  • 4,822
  • 1
  • 16
  • 39
  • Thanks for that link, this is the entry point I missed while researching. While your solution doesn't *technically* answer the exact question (returns a count instead of a boolean value), for all intents and purposes it will work great with the appropriate performance signature. For kicks I tried the Except+Any approach coupled with Aducci's trick to have it run in the backend, it seems to work fine. I've read your discussions with @GertArnold regarding Except where he mentions issues with nesting levels; yet the generated query *looks* good to me. Are UNIONs considered to nest queries maybe? – tne Jul 08 '15 at 10:28
  • Additional info: The distinct is actually compiled to a distinct-sort with a T-SQL backend, which has a bit of a cost compared to other approaches (e.g. @Aducci's). Unless there's something I missed I'm tempted to go with Except+Any or All(Any), even though they supposedly require that weird trick to be handled in the context of L2E. – tne Jul 08 '15 at 11:29
  • @tne: I think it's also possible that the SQL generation for `Except` has been optimized since he had last looked at it – jjj Jul 08 '15 at 17:14