0

The database this is being queried against contains a few million records - just wondering if the following query can be optimized at all

SELECT Count(DISTINCT [t0].[gUserId]) AS [UserId]
FROM [Plan] AS [t0]
WHERE EXISTS(
    SELECT NULL AS [EMPTY]
    FROM [Entry] AS [t1]
    WHERE ([t1].[PlanId] = ([t0].[id])) 
    AND ([t1].[EntryMobile] = 1) AND (NOT ([t0].[Deleted] = 1)))
amun1000
  • 453
  • 1
  • 8
  • 18
  • 1
    Why are you asking, is it running poorly? Are the columns you have listed all indexed correctly? Have you checked the execution plan for bottlenecks? Need more info. – Dave C Jun 05 '15 at 13:53
  • WHERE part can be optimized using indices. If you create covering index (all fields included - planid, entrymobile) on t1, then next worst part will be COUNT(DISTINCT). Sure you need covering index (id, deleted, guserid) on t0 too. – Arvo Jun 05 '15 at 13:54
  • 3
    @HellBaby I disagree with the above statement about joins being 4K times faster than an exists statement... I'd love to see some proof of that https://stackoverflow.com/questions/7082449/exists-vs-join-and-use-of-exists-clause – Dave C Jun 05 '15 at 13:55
  • NOT EXISTS is the probably the best SQL for this. Otherwise speed should be mainly based on indexes. There's just one thing I would try: move the `AND (NOT ([t0].[Deleted] = 1))` condition out of the EXISTS, the optimizer should be smart enough to do this, when you check the plan it should be the same, but one never knows.... – dnoeth Jun 05 '15 at 14:59
  • Going through the execution plan did reveal a join is faster in this case – amun1000 Jun 05 '15 at 15:18
  • @HellBaby I don't wish to get into a debate here, but you should use a `JOIN` to compare to a `WHERE EXISTS`, not a `LEFT JOIN`. And for comparison, I did a test between two tables with 30M+ rows each, and the run times were identical in my case, as were the execution plans. As stated in the link I posted, this is expected and `EXISTS` shines on more complex queries. You must also consider plan caching, server load etc, when doing comparisons. There are exceptions to every rule, but blanket statements that `JOINS` are "4k times faster" than `EXISTS` are bad seeds to sew. – Dave C Jun 05 '15 at 16:12

1 Answers1

0

Here are another two queries that output the same as yours:

1.

SELECT Count(DISTINCT [t0].[gUserId]) AS [UserId]
    FROM [Plan] AS [t0]
        CROSS APPLY (SELECT TOP 1 1 X 
             FROM [Entry] AS [t1]
             WHERE [t1].[PlanId] = [t0].[id]
               AND [t1].[EntryMobile] = 1) t1
    WHERE [t0].[Deleted] <> 1

2.

SELECT Count(DISTINCT [t0].[gUserId]) AS [UserId]
    FROM [Plan] AS [t0]
    WHERE [t0].[Deleted] <> 1
        AND [t0].[id] IN (SELECT [PlanId]
             FROM [Entry] 
             WHERE [EntryMobile] = 1)

I suspect #2 is the fastest of them all, but not by much.

To test, run them all together in a single SQL batch from SQL Server Management Studio, with the "Include Actual Execution Plan" option set, then compare execution plans to see which one is optimal. Sometimes it will tell you if an index is missing.

Diego
  • 18,035
  • 5
  • 62
  • 66