Compare (estimated) query/solution perfomance. Return differences between two tables (heaps)

Question

Two queries provide the same result.

How to define the query with better performance:

a. before execution

b. after execution

?

Example 1: following queries return difference between two heaps

(   SELECT * FROM TABLE1    EXCEPT
    SELECT * FROM TABLE2
)   
UNION ALL
(   SELECT * FROM TABLE2    EXCEPT
    SELECT * FROM TABLE1
) 
;

2nd approach (and 3rd with UNION ALL):

(   SELECT * FROM TABLE1    UNION --ALL --?!
    SELECT * FROM TABLE2
)   
EXCEPT
(   SELECT * FROM TABLE2    INTERSECT
    SELECT * FROM TABLE1
) 
;

Which approach is better?

Or it depends (Advantages/Disadvantages)?

Example 2: Almost the same, but in addition returns source table column

1st approach:

SELECT 'TABLE1-ONLY' AS SRC, T1.*
FROM (
      SELECT * from TABLE1
      EXCEPT
      SELECT * from TABLE2
      ) AS T1
UNION ALL
SELECT 'TABLE2-ONLY' AS SRC, T2.*
FROM (
      SELECT * from TABLE2
      EXCEPT
      SELECT * from TABLE1
      ) AS T2
;

2nd approach (and 3rd with UNION ALL):

(   SELECT SRC='TABLE1-ONLY', * FROM TABLE1    UNION --ALL --?!
    SELECT SRC='TABLE2-ONLY', * FROM TABLE2
)   
EXCEPT
(   SELECT * FROM (select SRC='TABLE1-ONLY' UNION ALL 
                   select SRC='TABLE2-ONLY'
                  ) s 
                 ,(SELECT * FROM TABLE2     INTERSECT
                   SELECT * FROM TABLE1
                  ) i
);

Which approach is better?

Or it depends (Advantages/Disadvantages)?

NB! Following questions are out of this topic:

How to define Primary Key for the table
How to identify duplicates in the table

P.S.

The PRIMARY question is about performance comparison.
Solution improvement is secondary.
set statistics io on; are almost the same.
set statistics time on; could provide different results after each run.
Don't see a big difference in execution plans (but not professional in this field).
I have a feeling of queries performance difference, but not knowledge. If someone has knowledge please share. ;)
Could you please specify scenarios when one solution is definitely better than the other one and vice versa?
Regarding 2nd vs 3rd approach: where duplicate removal works faster - within UNION or within EXCEPT/INTERSECT? And if duplicates are removed during UNION, will EXCEPT/INTERSECT search for it again?

What is "best" is entirely dependant on your instance, resources, DDL of your tables and indexes, size of the objects, etc, etc. What you have here is appears to also be far too broad for Stack Overflow. — Thom A, Sep 03 '19 at 10:24
:) Imagine, you have to provide some query to a group of accountants with very basic SQL knowledge for two temporary tables comparision - no indexes, no PK. This query should substitute tonns of clicks in Excel. — Denis, Sep 03 '19 at 14:30

score 1 · Accepted Answer · answered Sep 03 '19 at 10:48

1

You should test on your data.

But absent other information, I would expect the first to be better. Each is doing three set operations on the entire table. In the first, the "intermediate" tables are smaller, so I would expect better performance.

That is, UNION/UNION ALL is going to create a bigger table (in the second solution) which then needs to be processed.

Note that in SQL Server, EXCEPT and INTERSECT also remove duplicates, which might add additional overhead.

answered Sep 03 '19 at 10:48

Gordon Linoff

1,242,037
58
646
786

My feeling is the same - "1st" is better. Regarding "You should test on your data": My experience: 1st approach in my particular test case was a little bit faster, but with higher CPU time (I can talk only about averages). Regarding "UNION/UNION ALL": The question is where duplicate removal works faster - within UNION or within EXCEPT/INTERSECT? Or no difference? – Denis Sep 03 '19 at 11:09
@Denis . . . You have no choice in SQL Server with duplicate removal using `EXCEPT`/`INTERSECT`. SQL Server does not support non-unique versions of these operators (i.e. no `EXCEPT ALL`). – Gordon Linoff Sep 03 '19 at 12:06
2nd vs 3rd approach - From result point of view there is no difference. On which step is better to remove duplicates - within `union` or within `EXCEPT/INTERSECT`? – Denis Sep 03 '19 at 14:19

Compare (estimated) query/solution perfomance. Return differences between two tables (heaps)

1 Answers1