3

Before anyone jumps on a mark as duplicate, I have looked and everyone is doing something slightly more complicated than I am trying.

So I'm working on a database where there's a lot of data to check and LINQ's Any() extension translated to SQL isn't as fast as SQL's Count(1) > 0, so everywhere I'm writing:

var someBool = Ctx.SomeEntities.Count(x => x.RelatedEntity.Count(y => y.SomeProperty == SomeValue) > 0) > 0;

In Pseudo: Does any of my entities have a relationship with some other entity that has a property with a value of SomeValue.

This works fine and it works fast. However, it's not exactly readable (and I have lots of them, more embedded than that in cases) so what I'd like to do is replace it with:

var someBool = Ctx.SomeEntities.AnyX(x => x.RelatedEntity.AnyX(y => y.SomeProperty == SomeValue));

with:

public static bool AnyX<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) => source.Count(predicate) > 0;

So you see I'm not doing anything that LINQ can't translate to SQL, I'm not doing anything that LINQ doesn't already translate to SQL, but just by creating an additional extension I get:

LINQ to Entities does not recognize the method Boolean AnyX etc...

There must be some way of writing my extension or some way of telling LINQ just to take a look at the code and you'll see you can do it.

Tod
  • 2,070
  • 21
  • 27
  • 1
    Why are you using AnyX instead of just Any? – jdweng Mar 25 '19 at 12:06
  • 1
    I see 2 big problems, first EF deals with `Expressions`, not delegates; pass an `Expression>` instead of a `Func`, and it needs to extend `IQueryable<>`, not `IEnumerable`. – Bradley Uffner Mar 25 '19 at 12:33
  • @BradleyUffner Yeah I tried all that, still didn't work. What you see there is a copy of the Any() extension definition. – Tod Mar 25 '19 at 14:01
  • @jdweng As I said in the question Any() isn't as fast as Count() > 1 and yes it makes a difference. I can't use Any for my own extension as it'll cause ambiguity. – Tod Mar 25 '19 at 14:03
  • 1
    @Tod You copied the code for the IEnumerable version of `Any`. The IQueryable version is different. [IEnumerable](https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,6a1af7c3d17845e3) Version, [IQueryable](https://referencesource.microsoft.com/#System.Core/System/Linq/IQueryable.cs,0d55b633eacb158e) version – Bradley Uffner Mar 25 '19 at 16:13
  • 1
    It is critically important that you understand the difference between extending `IEnumerable` and `IQueryable` when trying to "enhance" entity framework in this way. If you are operating on an `IEnumerable`, it will run on the local computer, it will NOT be translated in to SQL. – Bradley Uffner Mar 25 '19 at 16:16
  • 1
    See https://stackoverflow.com/a/36736907/1625737 – haim770 Mar 25 '19 at 16:25
  • 1
    If it was so simple, everyone won't be *"doing something slightly more complicated than I am trying."*. LINQ to Entities query translation is based on a set of known methods and is not extendable (at least not in a simple way - you can find a lot of 3rd party libraries trying to address it with custom queryable implementations and expression tree transforming visitors). LINQ is not C# compiler, it can't "see" your custom method implementation. – Ivan Stoev Mar 25 '19 at 18:04
  • Look into [LINQKit](https://github.com/scottksmith95/LINQKit). – NetMage Mar 25 '19 at 19:08
  • Note that `COUNT() > 0` and `EXISTS` (LINQ translation for `Any`) should be the same in MS SQL 2005 or greater - perhaps you should ask why the SQL for `Any` is slow? – NetMage Mar 25 '19 at 19:12
  • PS Your `someBool` statement has an extra `)` and should be slower than `Any`. – NetMage Mar 25 '19 at 19:18
  • @NetMage Checkout this previous conversation: https://stackoverflow.com/questions/648795/what-is-the-fastest-way-to-determine-if-a-row-exists-using-linq-to-sql/48869497#48869497 – Tod Mar 26 '19 at 09:05
  • @BradleyUffner I tried the ICollection, IQueyiable, in fact those extensions are still sat there. I still have the same problem. – Tod Mar 26 '19 at 09:10
  • @Tod That is interesting, but I wonder if the `LIKE` is influencing the result - my tests of `Count` versus `Any` for `==` tests on a related table or a straight table, `Any` is always faster. Consider [this article](http://sqlblog.com/blogs/andrew_kelly/archive/2007/12/15/exists-vs-count-the-battle-never-ends.aspx). – NetMage Mar 26 '19 at 17:15
  • Why do you think `Count` is faster than `Any` for the queries you are doing? – NetMage Mar 26 '19 at 17:26

1 Answers1

0

Not an answer to your specific question, but I suggest you rethink how you're approaching the query.

Let's use some descriptive names that make it easier to understand: do any households have a resident with the first name of "Bobby"?

// your way
Ctx.Households.Count( hh => hh.Residents.Count( r => r.FirstName == "Bobby" ) > 0 ) > 0

Yuck, it's backwards. Start with residents:

Ctx.Residents.Count( r => 
    r.FirstName == "Bobby"
    && r.Household != null ) // if needed
    > 0;

Now, will that generate SQL significantly different than the below?

Ctx.Residents.Any( r => r.FirstName == "Bobby" && r.Household != null)

edit:

Here's a true MCVE that results in the opposite of your conclusion:

/*
create table TestDatum
(
    TestValue nchar(36) not null
)
*/

/*
set nocount on
declare @count int
declare @start datetime
declare @end datetime
set @count = 0

set @start = GETDATE()

while @count < 14000000
begin
    insert TestDatum values( CONVERT(nchar(36), NEWID()) )

    set @count = @count + 1

    if (@count % 100000) = 0
    begin
        print convert(nvarchar, @count)
    end
end

set @end = GETDATE()

select CONVERT(nvarchar, DATEDIFF(ms, @start, @end))
*/

/*
-- "Any" test
declare @startdt datetime, @enddt datetime
set @startdt = GETDATE()
DECLARE @p0 NVarChar(1000) = '%abcdef%'

SELECT 
    (CASE 
        WHEN EXISTS(
            SELECT NULL AS [EMPTY]
            FROM TestDatum AS [t0]
            WHERE [t0].TestValue LIKE @p0
            ) THEN 1
        ELSE 0
     END) AS [value]
set @enddt = GETDATE()
select DATEDIFF(ms, @startdt, @enddt) -- ~7000ms
*/
/*
-- "Count" test
declare @startdt datetime, @enddt datetime
set @startdt = GETDATE()
-- Region Parameters
DECLARE @p0 NVarChar(1000) = '%abcdef%'
-- EndRegion
SELECT COUNT(*) AS [value]
FROM TestDatum AS [t0]
WHERE [t0].TestValue LIKE @p0
set @enddt = GETDATE()
select DATEDIFF(ms, @startdt, @enddt) -- > 48000ms
*/
Moho
  • 15,457
  • 1
  • 30
  • 31
  • Yes SQL Translates any to: CASE WHEN EXISTS( SELECT NULL AS [EMPTY] FROM. Count is just COUNT – Tod Apr 04 '19 at 12:46
  • @Tod - ok, and you're saying counting all records is somehow faster or more optimal than short circuiting upon finding the first record? – Moho Apr 04 '19 at 14:05
  • Yes, that's exactly what I'm saying. I don't know why, I've posted on SO before, check the comments from the question for the link. Some data can be comparable in terms of the speed of the two methods. Other times it can be much much quicker to Count. – Tod Apr 04 '19 at 15:05
  • Can you provide a [MCVE](https://stackoverflow.com/help/mcve) reproducing this behavior? – Moho Apr 04 '19 at 15:14
  • Like I said, link in the comments for the question. https://stackoverflow.com/questions/648795/what-is-the-fastest-way-to-determine-if-a-row-exists-using-linq-to-sql/48869497#48869497 – Tod Apr 04 '19 at 15:17
  • My results from the above edit demonstrated the complete opposite conclusion (as described in my comment on your post) - ~7000ms for the `any` method and over 48000ms for the `count` method. – Moho Apr 04 '19 at 18:14