0

I have a table which includes 30 records, and a smaller table has 10 records, both tables have the same schema. All I want to do is to return a table, whose records are in the big table, but not in the small table. The solution I found is to use Except operator. However, when I run the query, it took me about 30 mins. so I am just wondering that if Except is computational expensive and it took a lot of resources?

Is there any functions can replace Except? Thanks for any help !

abatishchev
  • 98,240
  • 88
  • 296
  • 433
Akira
  • 273
  • 5
  • 15
  • 30 mins on a 30 record table? Something is wildly wrong or the table is locked. – dfundako May 09 '18 at 20:04
  • At a very minimum you should include the query **and** [include the actual Execution Plan](https://stackoverflow.com/a/7359705/1260204), you could use [Paste the Plan](https://www.brentozar.com/pastetheplan/) and share the link in your question. Also [try to read it yourself](https://stackoverflow.com/a/759097/1260204), maybe you can figure out the performance issue(s) with your query. Finally include the [schema DDL](https://en.wikipedia.org/wiki/Data_definition_language) along with the query you are executing. – Igor May 09 '18 at 20:06
  • Relevant: https://stackoverflow.com/questions/1662902/when-to-use-except-as-opposed-to-not-exists-in-transact-sql#1664153 – dfundako May 09 '18 at 20:10

3 Answers3

3

EXCEPT is a set operator and it should be reasonably optimized. It does remove duplicate values, so there is a bit more overhead than one might expect.

It is not so unoptimized that it would take 30 seconds on such small tables, unless you have columns whose size measures in many megabytes. Something else might be going on -- such as network or server contention.

EXCEPT is a very reasonable approach. NOT IN has a problem with NULL values and only works with one column. NOT EXISTS is going to work best when you have an appropriate index. Under some circumstances, EXCEPT is faster than NOT EXISTS.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
2

In this case, you should be using EXISTS. It is one of the most performant operations in SQL Server

SELECT * 
FROM big_table b
WHERE NOT EXISTS (
SELECT 1 
FROM small_table s
WHERE s.id = b.id)
dfundako
  • 8,022
  • 3
  • 18
  • 34
  • I have tried that but it is not compatible with Miscrosoft SQL, right? – Akira May 09 '18 at 20:07
  • @AkiraKaneshiro It is absolutely available in SQL Server. https://learn.microsoft.com/en-us/sql/t-sql/language-elements/exists-transact-sql?view=sql-server-2017 – dfundako May 09 '18 at 20:07
-1

There is no need to make things complicated for something so simple.

SELECT * FROM Table1 WHERE ID NOT IN (SELECT ID FROM Table2)
SunKnight0
  • 3,331
  • 1
  • 10
  • 8
  • 1. How do you know it is simple? 2. `NOT IN`/`IN` can be detrimental to performance and based on the question performance is already an issue. A better solution would be to use `NOT EXISTS` (or create an outer join and filter on that in the where clause). – Igor May 09 '18 at 20:19
  • `NOT IN` is easily the best performer if there is a proper Primary Key field. `NOT EXISTS` tests the entire record while `NOT IN` a single field. – SunKnight0 May 09 '18 at 20:21
  • That post talks about the possibility of NULLs, which is a non issue with a Primary key field. You do not have a sub-table situation like discussed there, where the comparison is Primary Key to Foreign Key. The comparison is Primary Key to Primary Key. – SunKnight0 May 09 '18 at 20:23
  • 1
    @SunKnight0 "NOT EXISTS tests the entire record" I don't think that is true, but I could be wrong. – dfundako May 09 '18 at 20:27
  • Which brings me back to point 1, you (we actually) do not know if there are even primary keys specified in the schema or what the relation is between the two tables. The question is very poorly written and this answer of using `not in` makes a lot of assumptions. – Igor May 09 '18 at 20:28
  • @dfundako - it does not, it tests on the condition(s) specified in the `where` clause. – Igor May 09 '18 at 20:29
  • Yeah, I was thinking of a different example. If the `WHERE` on `NOT EXISTS` only links by `Primary Key` the two are similar in execution, but the `NOT IN` much simpler to write. – SunKnight0 May 09 '18 at 20:31
  • Here is another relevant question with answer: [SQL Server IN vs. EXISTS Performance](https://stackoverflow.com/a/2065403/1260204) – Igor May 09 '18 at 20:32
  • @Igor: I agree that the existence of a PK is an assumption, but with lack of any other information I see no way to provide anything more useful. The `NOT EXISTS` solution makes the exact same assumption. – SunKnight0 May 09 '18 at 20:33
  • As for the performance based on the new link, I would think that the optimizer would perform the "SELECT ID FROM Table2" once for the entire query whereas it would have to perform `WHERE s.id = b.id` for every record. Am I missing something? – SunKnight0 May 09 '18 at 20:36