13

I have to select some rows based on a not exists condition on a table. If I use a union all as below, it gets executed in less than 1 second.

SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(

SELECT 1 FROM TABLE t
WHERE Data1 = t.Col1 AND Data2=t.Col2

UNION ALL

SELECT 1 FROM TABLE t
WHERE Data1 = t.Col2 AND Data2=t.Col1

)

but if I use an OR condition, it takes close to a minute as SQL server is doing a table lazy pool. Can someone explain it?

SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(

SELECT 1 FROM TABLE t
WHERE ( (Data1 = t.Col1 AND Data2=t.Col2) OR (Data1 = t.Col2 AND Data2=t.Col1))
)
Chandu
  • 81,493
  • 19
  • 133
  • 134
coder net
  • 3,447
  • 5
  • 31
  • 40

3 Answers3

6

The issue is that you are specifying two conditions with OR that apply to separate tables in your query. Because of this, the nonclustered index seek has to return most or all of the rows in your big table because OR logic means they might also match the condition clause in the second table.

Look at the SQL execution plan in all three examples above, and notice the number of rows that come out of the nonclustered index seek from the big table. The ultimate result may only return 1,000 or fewer of the 800,000 rows in the table but the OR clause means that the contents of that table have to be cross-referenced with the conditional in the second table since OR means they may be needed for the final query output.

Depending on your execution plan, the index seek may pull out all 800,000 rows in big table because they may also match the conditions of the OR clause in the second table. The UNION ALL is two separate query against one table each, so the index seek only has to output the smaller result set that might match the condition for that query.

I hope this makes sense. I've run across the same situation while refactoring slow-running SQL statements.

Cheers,

Andre Ranieri

user1472721
  • 191
  • 1
  • 2
3

The query plan is also affected by the number of rows in your tables. How many rows are there in table t ?

You could also try:

SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(
  SELECT 1 FROM TABLE t
  WHERE Data1 = t.Col1 AND Data2=t.Col2
)
AND NOT EXISTS 
(    
  SELECT 1 FROM TABLE t
  WHERE Data1 = t.Col2 AND Data2=t.Col1    
)

or (corrected for SQL-Server) this that will use the index:

WITH tt AS                               <---- a temp table with 2 rows
( SELECT Data1 AS Col1, Data2 AS Col2
  UNION
  SELECT Data2 AS Col1, Data1 AS Col2
)
SELECT 1 FROM dummyTable
WHERE NOT EXISTS
(
  SELECT 1
  FROM TABLE t
    JOIN tt                      
      ON tt.Col1 = t.Col1 AND tt.Col2=t.Col2
)
ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • The table has like 800k rows in it. I like the options you have provided. Currently i'm using the union all, but I could use the AND NOT EXISTS option as well. I was mainly looking to see why "OR" would have a problem. I guess it is not using any indexes and has to store the values in a temp table for comparison. – coder net Apr 13 '11 at 15:28
  • I don't expect my `AND NOT EXISTS` solution to be any better that the `UNION`. (by the way, why do you use `UNION ALL` and not `UNION` ? – ypercubeᵀᴹ Apr 13 '11 at 15:40
  • I think union all is faster as it does not do any grouping/filtering. In my case, it is going to be mutually exclusive condition so I can do a union all. And yes, I don't think AND not exists will perform better than Union All, just reads better. – coder net Apr 13 '11 at 19:14
  • I have not. CTE is usually not that great performance wise anyway. Also, union all gives me what I want, was just confused why the OR was having issues. – coder net Apr 13 '11 at 19:25
  • From the above explanation of the answer I cannot figure out the "Why Union is working faster than OR" which is the one of the main point of concern of the question. It tells, okay, you can use index to improve execution time/optimize sql query but from my view it does not answer the question. There are some other relevant questions and their answers which could help : https://stackoverflow.com/questions/13750475/sql-performance-union-vs-or https://stackoverflow.com/questions/25520758/why-using-or-condition-instead-of-union-caused-a-performance-issue – Deep Jul 24 '19 at 12:00
1

The usage of the OR is probably causing the query optimizer to no longer use an index in the second query. Look at the explain for each query and that will tell you the answer.

Wes
  • 6,455
  • 3
  • 22
  • 26
  • You could be right. It is a recursive kind of query and the execution plan is complicated. It is not easy to tell anything from it. Most of the plan looks identical for both queries. I suppose I'll just go ahead and use the UNION ALL. – coder net Apr 12 '11 at 18:31