3

In general, I have a SELECT query combined with three independent SELECT queries. I am using EXCEPT and UNION operators in the query. When executing the queries independently, I'll receive the results in 1-2 seconds, but when having EXCEPT operator, the query will take hours.

The query structure (as simplified) is as follows:

SELECT DISTINCT FIELD_1, FIELD_2, FIELD_3 FROM MYTABLE

EXCEPT

(
SELECT DISTINCT FIELD_1, FIELD_2, FIELD_3 FROM MYTABLE WHERE XXX

UNION

SELECT DISTINCT FIELD_1, FIELD_2, FIELD_3 FROM MYTABLE WHERE YYY
)

Are there any ways to speed up the whole query or is the EXCEPT operator in general so slow that it should be avoided?

  • 2
    `WHERE NOT EXISTS`? – jarlh Feb 08 '19 at 14:33
  • Tables have _columns_, not _fields_. – jarlh Feb 08 '19 at 14:33
  • 5
    I'd start [here](https://stackoverflow.com/questions/1662902/when-to-use-except-as-opposed-to-not-exists-in-transact-sql), and [here](https://blog.sqlauthority.com/2016/10/30/interview-question-week-095/), and [here](https://sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join). You haven't showed the execution plan, and that's needed... but since you are using the *same* table in all three queries, I'm certain there is a much more elegant solution, like `NOT EXISTS` as @jarlh offered. I can't see why you'd need `UNION` and `EXCEPT` here. Also, the `DISTINCT` isn't needed – S3S Feb 08 '19 at 14:36
  • How about `SELECT DISTINCT FIELD_1, FIELD_2, FIELD_3 FROM MYTABLE WHERE NOT XXX AND NOT YYY`? – Peter B Feb 08 '19 at 14:38
  • Are these three queries really all against the same table, or is that just your obfuscation? – Tab Alleman Feb 08 '19 at 14:39
  • The last article you will see Aaron's conclusion that *This was just a really long-winded way of telling you that, for the pattern of finding all rows in table A where some condition does not exist in table B, NOT EXISTS is typically going to be your best choice* – S3S Feb 08 '19 at 14:39
  • Tab: it is the same table. The first query basically gives almost the whole table and second + third query are the subsets which I need to take away from the first query results. – user11034064 Feb 08 '19 at 14:42
  • Oversimplified effectively means "imaginary". No one but you knows precisely what "XXX" and "YYY" mean. Martin has made an important observation to one suggestion. Most likely union (not union all) and distinct usage in the 2nd part of the query do nothing useful and it is best to NOT rely on the optimizer to ignore attempts to remove duplicates to no advantage. The second part of the except can probably be simplified similar to what Peter B suggests. But all of this is just guessing. – SMor Feb 08 '19 at 15:20

4 Answers4

3

You can do this with GROUP BY

SELECT FIELD_1, FIELD_2, FIELD_3 
FROM MYTABLE
GROUP BY FIELD_1, FIELD_2, FIELD_3 
HAVING MAX(CASE WHEN (XXX) OR (YYY) THEN 1 ELSE 0 END) = 0
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • good answer. Important: anyone who uses this (or most of the other answers to this question) should be sure to enclose XXX in its own set of parenthesis if it represents a composite condition. And, of course, the same with YYY. – Richard II Feb 08 '19 at 15:31
  • Indeed, good thinking. It can be that this one also will provide a better performance than my simplification (v3), depends on logic of XXX or YYY – Alexander Volok Feb 08 '19 at 15:42
  • 1
    @RichardII - thanks. I've added them anyway to be on the safe side. – Martin Smith Feb 08 '19 at 15:44
  • @RichardII. . . This will be horrible if filter goes complex logic such (`OR` + `AND`). – Yogesh Sharma Feb 08 '19 at 17:25
  • @Yogesh, what does "This" refer to in your comment? the query itself? my suggested change? And what do you think will be horrible about it: performance? legibility? – Richard II Feb 08 '19 at 20:23
0

Sometimes it is beneficial to make task for Query Optimizer easier by splitting the workload into multiple steps. Especially if execution can take multiple hours:

-- Step 1
SELECT FIELD_1, FIELD_2, FIELD_3 INTO #Step1 FROM
(
SELECT FIELD_1, FIELD_2, FIELD_3 FROM MYTABLE WHERE XXX    
UNION  
SELECT FIELD_1, FIELD_2, FIELD_3 FROM MYTABLE WHERE YYY
) d

-- Step 2:
SELECT DISTINCT FIELD_1, FIELD_2, FIELD_3 FROM MYTABLE    
EXCEPT    
SELECT FIELD_1, FIELD_2, FIELD_3  FROM #Step1

Please note that some DISTINCT clauses are removed


Update, version 3. Based on the last update of OP:

Tab: it is the same table. The first query basically gives almost the whole table and second + third query are the subsets which I need to take away from the first query results

I believe the entire query can be rewritten to:

SELECT DISTINCT FIELD_1, FIELD_2, FIELD_3 FROM MYTABLE ext
WHERE NOT EXISTS (

SELECT * FROM (
    SELECT FIELD_1, FIELD_2, FIELD_3 FROM MYTABLE  
    WHERE  ( (XXX) OR (YYY))  -- original filter
)   list 
WHERE
    list.FIELD_1 = ext.FIELD_1
AND list.FIELD_2 = ext.FIELD_2
AND list.FIELD_3 = ext.FIELD_3
) 
Alexander Volok
  • 5,630
  • 3
  • 17
  • 33
  • 1
    You're missing a `FROM` in *Step 1* – S3S Feb 08 '19 at 14:51
  • In this case use a CTE as Step 1 – benjamin moskovits Feb 08 '19 at 14:55
  • @benjaminmoskovits, CTE will not isolate that part into own execution. Therefore in the end entire SQL command will be compiled into one general query plan. The idea with temp table was to split the workload and make task easier to optimizer to produce efficient query plan. However, OP gave a crucial comment that all operations involved on one single table, so I added update to answer – Alexander Volok Feb 08 '19 at 15:00
  • I agree, if it is the same table, just use WHERE NOT ( XXX ) AND NOT (YYY) – kpollock Feb 08 '19 at 15:02
  • @Alexander, I composed my answer after you wrote your initial post. After posting mine, I then saw you've updated yours to include the same concept. Timing. I promise I didn't copy you :-) – Richard II Feb 08 '19 at 15:08
  • 1
    the simplification is not valid. If a `FIELD_1, FIELD_2, FIELD_3` group has a mix of rows matching and not matching the `xxx/yyy` conditions this will return them. The query in the OP will exclude a group if a single row exists matching these conditions – Martin Smith Feb 08 '19 at 15:10
  • @RichardII, yeps it was obvious thing, when OP updated the question, but Martin made a right comment, anyway I delivered another version of the simplification – Alexander Volok Feb 08 '19 at 15:22
  • yes, I'm deleting my answer, precisely because of @MartinSmith's observation. – Richard II Feb 08 '19 at 15:25
0

I would use NOT EXISTS instead with CTE :

WITH CTE AS (
     <your union query>
)
SELECT mt.*
FROM MYTABLE mt
WHERE NOT EXISTS (SELECT 1 FROM CTE c WHERE c.FIELD_1 = mt.FIELD_1 AND . . . );   
Yogesh Sharma
  • 49,870
  • 5
  • 26
  • 52
0

First of i would not use field to with except, its better to use an Id becouse its indixable

The quarry below will have much faster performance.

This is how i would do it.

SELECT DISTINCT FIELD_1, FIELD_2, FIELD_3 FROM MYTABLE
where Id not in (
SELECT Id FROM MYTABLE WHERE XXX and YYY
)
Alen.Toma
  • 4,684
  • 2
  • 14
  • 31
  • OP may not have rights needed to add a new column to the table. it's also not necessary. see the update on this answer: https://stackoverflow.com/a/54594763/1633949 for a simpler approach. – Richard II Feb 08 '19 at 15:11
  • I see i assumed that you hade a primary key in the table. it always good to have a primary key in a table. – Alen.Toma Feb 08 '19 at 15:15