6

Suppose, I have a list of string which is sent from client, in this case regions. And I have a table named Compliance Regions. I want to search for the rows which has a name property and region item should be exist on the row.

In the LINQ, I can do it in two different ways. Like below, and they produce two different SQL Queries. My question is which should I choose? Which query has the better performance?

 List<string> regions = new List<string>() { "Canada", "EN 50530" };

           var cregions = from c in complianceRegions
                           from r in regions
                           where c.Name.Equals(r)
                           select c;

            var cregions2 = from c in complianceRegions
                            where regions.Any(x => x == c.Name)
                            select c;

The generated sql is showed below.

      -- cregions
   SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[Name] AS [Name], 
    [Extent1].[Description] AS [Description]
    FROM  [Administration].[ComplianceRegions] AS [Extent1]
    INNER JOIN  (SELECT 
        N'Canada' AS [C1]
        FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]
    UNION ALL
        SELECT 
        N'EN 50530' AS [C1]
        FROM  ( SELECT 1 AS X ) AS [SingleRowTable2]) AS [UnionAll1] ON [Extent1].[Name] = [UnionAll1].[C1]  

cregions

And

--cregions2
    SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[Name] AS [Name], 
    [Extent1].[Description] AS [Description]
    FROM [Administration].[ComplianceRegions] AS [Extent1]
    WHERE  EXISTS (SELECT 
        1 AS [C1]
        FROM  (SELECT 
            N'Canada' AS [C1]
            FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]
        UNION ALL
            SELECT 
            N'EN 50530' AS [C1]
            FROM  ( SELECT 1 AS X ) AS [SingleRowTable2]) AS [UnionAll1]
        WHERE [UnionAll1].[C1] = [Extent1].[Name]
    )

cregions2 Added Execution plan as requested.

Foyzul Karim
  • 4,252
  • 5
  • 47
  • 70
  • 2
    *Which query has the better performance?* What did your own tests reveal ? – High Performance Mark Mar 15 '17 at 11:39
  • the interesting part is, when I run the same query several times, SQL server caches the query and give me wrong performance. – Foyzul Karim Mar 15 '17 at 11:56
  • Why the first query uses `Contains` and the second `==`? Which is the intended comparison operator? Currently you are comparing apples to oranges. – Ivan Stoev Mar 15 '17 at 11:59
  • sorry. updating the question with appropriate query. @IvanStoev – Foyzul Karim Mar 15 '17 at 12:10
  • @IvanStoev please check now. thanks. – Foyzul Karim Mar 15 '17 at 12:16
  • 1
    Ok, then your have a 3rd optioin - `from c in complianceRegions where regions.Contains(c.Name) select c`, which should generate the SQL mentioned by @Gordon, and IMO should invalidate your current question because there will be neither `CROSS JOIN` nor `EXISTS` :) – Ivan Stoev Mar 15 '17 at 12:26
  • Possible duplicate of [EXISTS vs JOIN and use of EXISTS clause](http://stackoverflow.com/questions/7082449/exists-vs-join-and-use-of-exists-clause) – Lostblue Mar 15 '17 at 13:34

3 Answers3

2

Given the two alternatives, the second is probably better, because there is no function call on the column name.

However, the two are not the same. The first does a partial match and the second does an exact match. You should really do the one that does what you intend. Performance is secondary to accuracy.

Both are painful look at. Casting the output of CHARINDEX() to an index? Can we say "redundant"?

They beg the question why the query isn't simply:

select . . .
from [Administration].[ComplianceRegions] AS [Extent1]
where Extent1.Name in (N'Canada', N'EN 50530');

This is the simplest and best performing version of this logic.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Quotes of @JNK answer in this post.

EXISTS is used to return a boolean value, JOIN returns a whole other table

EXISTS is only used to test if a subquery returns results, and short circuits as soon as it does. JOIN is used to extend a result set by combining it with additional fields from another table to which there is a relation.

In your example, the queries are symantically equivalent.

In general, use EXISTS when:

You don't need to return data from the related table You have dupes in the related table (JOIN can cause duplicate rows if values are repeated) You want to check existence (use instead of LEFT OUTER JOIN...NULL condition) If you have proper indexes, most of the time the EXISTS will perform identically to the JOIN. The exception is on very complicated subqueries, where it is normally quicker to use EXISTS.

If your JOIN key is not indexed, it may be quicker to use EXISTS but you will need to test for your specific circumstance.

JOIN syntax is easier to read and clearer normally as well.

Community
  • 1
  • 1
Lostblue
  • 419
  • 2
  • 10
0

Those queries are equal. And both are awful, I would not choose any of them. They have hard coded values, so they can not be reused. I am not sure if sql server autoparametrization can deal with that, so execution plan cache would probably suffer. The right solution is using table valued parameters, unfortunately they are not supported in linq providers as far as I know. So You would have to make the query on Your own, You can only use linq to materialize the result.

You can try the solution provided by Ivan Stoev, how good it is depends on what Your provider produces. Linq2sql does not scale well for longer lists, because it produces as many parameters as many items in the list You have. At least can reuse execution plan when number of items in the list is the same.

Antonín Lejsek
  • 6,003
  • 2
  • 16
  • 18