
I am using EF6 and I would like to get the records in a table whose IDs are in a given group of IDs.

In my test, for example, I am using 4 IDs.

I tried two options; the first is with `Any`.

dbContext.MyTable
    .Where(x => myIDS.Any(y => y == x.MyID));

And the T-SQL that this LINQ expression generates is:

SELECT 
    *
    FROM [dbo].[MiTabla] AS [Extent1]
    WHERE  EXISTS (SELECT 
        1 AS [C1]
        FROM  (SELECT 
            [UnionAll2].[C1] AS [C1]
            FROM  (SELECT 
                [UnionAll1].[C1] AS [C1]
                FROM  (SELECT 
                    cast(130 as bigint) AS [C1]
                    FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]
                UNION ALL
                    SELECT 
                    cast(139 as bigint) AS [C1]
                    FROM  ( SELECT 1 AS X ) AS [SingleRowTable2]) AS [UnionAll1]
            UNION ALL
                SELECT 
                cast(140 as bigint) AS [C1]
                FROM  ( SELECT 1 AS X ) AS [SingleRowTable3]) AS [UnionAll2]
        UNION ALL
            SELECT 
            cast(141 as bigint) AS [C1]
            FROM  ( SELECT 1 AS X ) AS [SingleRowTable4]) AS [UnionAll3]
        WHERE [UnionAll3].[C1] = [Extent1].[MiID]
    )

As can be seen, the T-SQL is a `WHERE EXISTS` that uses many subqueries and unions.

The second option is with `Contains`.

dbContext.MyTable
    .Where(x => myIDS.Contains(x.MiID));

And the T-SQL:

SELECT 
    *
    FROM [dbo].[MiTabla] AS [Extent1]
    WHERE [Extent1].[MiID] IN (cast(130 as bigint), cast(139 as bigint), cast(140 as bigint), cast(141 as bigint))

`Contains` is translated into a `WHERE IN`, and the query is much less complex.

I have read that `Any` tends to be faster, so I am in doubt whether `Any`, although its SQL looks more complex at first glance, is faster or not.

Thanks so much.

EDIT: I have run some tests (I don't know if this is the best way to test this).

System.Diagnostics.Stopwatch miswContains = new System.Diagnostics.Stopwatch();
miswContains.Start();
for (int i = 0; i < 100; i++)
{
    IQueryable<MyTable> iq = dbContext.MyTable
        .Where(x => myIDS.Contains(x.MyID));

    iq.ToArrayAsync();
}
miswContains.Stop();

System.Diagnostics.Stopwatch miswAny = new System.Diagnostics.Stopwatch();
miswAny.Start();
for (int i = 0; i < 20; i++)
{
    IQueryable<MyTable> iq = dbContext.MyTable
        .Where(x => myIDS.Any(y => y == x.MyID));

    iq.ToArrayAsync();
}
miswAny.Stop();

The results are that miswAny takes about 850 ms and miswContains about 4251 ms.

So the second option, with `Contains`, is slower.

Álvaro García
  • `Entity Framework` generated the code with `select *`? Are the IDs of type long? How do you know the results are not cached? Did you try swapping the order of the queries? – Ofiris Feb 08 '15 at 20:10

3 Answers


Your second option is the fastest solution I can think of (at least for not very large arrays of IDs), provided your `MiTabla.MiID` is indexed.

If you want to read more about in clause performance: Is SQL IN bad for performance?.

Joanvo
  • I have edited the original post to add a test, but I don't know if it is the best way to test these cases. – Álvaro García Feb 08 '15 at 20:01
    No, it isn't, because you are not taking into account some external factors, like the SQL cache or network latency, although it may give you a rough estimate. Anyway, you are iterating 20 times for the second test and 100 times for the first one (and the results show it's 5 times faster, probably due to that). Also, you should probably await the instruction `iq.ToArrayAsync()` – Joanvo Feb 09 '15 at 13:51
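Following that comment, a fairer version of the benchmark would use equal iteration counts, await the materialization, and add a warm-up run. This is only a sketch, assuming the same `dbContext` and `myIDS` from the question; the `MeasureAsync` helper is hypothetical, not part of any API:

```csharp
// Hypothetical helper: times a query delegate, with an untimed warm-up run
// so neither query benefits from first-call compilation or caching.
async Task<TimeSpan> MeasureAsync(Func<Task> query, int iterations)
{
    await query(); // warm-up, not timed
    var sw = System.Diagnostics.Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
        await query(); // awaited, so the query actually executes inside the timed loop
    sw.Stop();
    return sw.Elapsed;
}

// Same iteration count for both variants:
var containsTime = await MeasureAsync(
    () => dbContext.MyTable.Where(x => myIDS.Contains(x.MyID)).ToArrayAsync(), 100);
var anyTime = await MeasureAsync(
    () => dbContext.MyTable.Where(x => myIDS.Any(y => y == x.MyID)).ToArrayAsync(), 100);
```

Even this only gives a rough comparison; database-side caching and network latency still affect the numbers.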

If you know the IDs, then using the LINQ-to-SQL `Count()` method would create much cleaner and faster SQL code (than both `Any` and `Contains`):

dbContext.MyTable
    .Where(x => myIDS.Count(y => y == x.MyID) > 0);

The generated SQL for the count should look something like this:

DECLARE @p0 Decimal(9,0) = 12345
SELECT COUNT(*) AS [value]
FROM [ids] AS [t0]
WHERE [t0].[id] = @p0
keenthinker

You can tell by the shape of the queries that `Any` is not scalable at all. It doesn't take many elements in `myIDS` (probably around 50) to get a SQL exception saying the maximum nesting level has been exceeded.

`Contains` is much better in this respect. It can handle a couple of thousand elements before its performance gets severely affected.

So I would go for the scalable solution, even though `Any` may be faster with small numbers. It is also possible to make `Contains` scale even better.
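One common way to make `Contains` scale further is to split the ID list into batches, so each generated `IN` clause stays small, and then concatenate the results. A sketch, assuming `myIDS` is a `List<long>` and reusing the question's `dbContext` (the batch size of 1000 is an arbitrary choice, not a prescribed limit):

```csharp
// Chunked Contains: one small IN-list query per batch instead of one huge IN list.
const int batchSize = 1000;
var results = new List<MyTable>();

for (int i = 0; i < myIDS.Count; i += batchSize)
{
    var batch = myIDS.Skip(i).Take(batchSize).ToList();
    results.AddRange(
        dbContext.MyTable.Where(x => batch.Contains(x.MyID)).ToList());
}
```

This trades one round trip for several, so it only pays off once the ID list is large enough that a single `IN` clause becomes a problem.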

I have read that `Any` tends to be faster,

In LINQ-to-Objects that's generally true, because the enumeration stops at the first hit. But with LINQ against a SQL backend, it's the generated SQL that counts.

Gert Arnold