0

Using SQL Server 2012, what is the difference between using UNION and UNION of two queries that use DISTINCT? ie

SELECT Columns FROM A
UNION
SELECT Columns FROM B

versus

SELECT DISTINCT Columns FROM A
UNION
SELECT DISTINCT Columns FROM B

What is the difference between above queries?

Bohemian
  • 412,405
  • 93
  • 575
  • 722
Vikrant More
  • 5,182
  • 23
  • 58
  • 90
  • Have you noticed a difference? The latter removes duplicates twice (if it's not optimized). – Tim Schmelter Oct 20 '14 at 06:45
  • Distinct removes the duplicates from the single result set however when you use UNION to add two different result sets, It will remove duplicates from the final result set. There may be some performance issues. – Simranjeet Singh Oct 20 '14 at 06:48
  • @TimSchmelter actually i didn't see any difference practically, But i have function in which for select using DISTINCT and having join with functions.I am looking for in terms of Performance and tuning point of view. – Vikrant More Oct 20 '14 at 06:48
  • @SimranjeetSingh you are right what i am worrying is Distinct uses Order by clause which uses tempdb for sorting data that could cause performance degradation. – Vikrant More Oct 20 '14 at 06:50
  • 1
    @VikrantMore the only way to find this is to execute the queries and then study their execution plans. – Simranjeet Singh Oct 20 '14 at 06:54

2 Answers2

1

There is no difference in the result, because UNION eliminates duplicates.

FYI UNION ALL preserves all rows in the order selected (like a simple append).

Adding DISTINCT to each (or either) side of the UNION would probably be slower, because it introduces a pointless step (although any optimizer worth its salt would recognise this and eliminate the duplicated work).

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • so can i conclude using DISTINCT WITH UNION can cause for performance degradation.Do we have any link where i can deep dive in this topic. – Vikrant More Oct 20 '14 at 06:52
  • 1
    I would just try it and see. That's the fastest way to get a solid answer on performance. Using distinct may be faster depending on what you're doing, for example if an index-only scan is used you may find distinct can eliminate lots of duplicates more quickly than via the union. It depends on table size, result rowset size, where clause and indexes present. Like I said - try it on your real tables with real data and a real where clause. – Bohemian Oct 20 '14 at 06:58
  • `UNION ALL preserves all rows in the order selected (like a simple append)` – is this documented or just your observation? – Andriy M Oct 20 '14 at 07:18
  • @AndriyM it's the SQL standard AFAIK, but yes this is my observation on every database I have ever worked on, and that's a lot – Bohemian Oct 20 '14 at 07:24
  • http://stackoverflow.com/questions/15766359/does-union-all-guarantee-the-order-of-the-result-set – Andriy M Oct 21 '14 at 13:53
0

I completely agree to what Bohemian say!!. There is no difference between them.But if u want to select the duplicate records also you have to change the query by replacing union with union all.

Here is the proof :- "What is the difference between UNION and UNION ALL?"

Community
  • 1
  • 1
Sudhir kumar
  • 549
  • 2
  • 8
  • 31