Efficient SQL for selecting for values and NOT values in a join table

Question

I have a one to many relationship between tables A (columns ID and DESC) and B (columns FK_ID, CODE). ID-> FK_ID.

I'm curious to find the most efficient SQL to select all rows in A where there is a code in B named 'C1' but there is NOT a code in B named 'N1'.

Any help appreciated.

What RDBMS? SQL Server? MySQL? Oracle? – JNK Jan 16 '12 at 18:22 — JNK, Jan 16 '12 at 18:22

score 3 · Accepted Answer · answered Jan 16 '12 at 18:29

3

Using EXISTS and NOT EXISTS would give the best performance

SELECT tableA.*
FROM tableA
WHERE
  EXISTS(
    SELECT NULL
    FROM TableB 
    WHERE tableA.ID = TableB.FK_ID AND TableB.Code = 'C1')
  AND NOT EXISTS(
    SELECT NULL
    FROM TableB 
    WHERE tableA.ID = TableB.FK_ID AND TableB.Code = 'N1')

answered Jan 16 '12 at 18:29

Magnus

45,362
8
80
118

+1 - I think this is the fastest method (assuming SQL Server) – JNK Jan 16 '12 at 18:31
+1, although it might not give the best performance - in some databases (including MySQL, I think) a left outer join with where NULL on the outer joined table can give better performance. – Jan 16 '12 at 18:33
@MarkBannister for MySQL Left join with null check would be faster if the columns on both sides are not nullable, else `exists` is better. For SQL server `exists` is always better (if I remember correctly). – Magnus Jan 16 '12 at 18:44

score 1 · Answer 2 · edited May 23 '17 at 12:03

select all rows in A where there is a code in B named 'C1' but there is NOT a code in B named 'N1'.

I suspect your spec doesn't accurately convey what you actually require!

A literal interpretation would be

if there exists a row in B where CODE = 'C1'
   and there does not exist a row in B where CODE = 'B1'
then return all rows from A

(that is, there is no correlation between the values in A and B respectively) e.g.

SELECT * 
  FROM A
 WHERE EXISTS (
               SELECT * 
                 FROM B
                WHERE CODE = 'B1'
              )
       AND NOT EXISTS (
                       SELECT * 
                         FROM B
                        WHERE CODE = 'C1'
                      );

However, experience (alone) tells us that you probably want to correlate the tables using A.ID = B.FK_ID. See Magnus's answer for one approach.

Note the existential and non-existential operators alluded to here are known as semi join and semi difference respectively and there are different ways they can be written in SQL. Which will be most efficient from a performance point of view will depend on many variables, including SQL engine, data, indexes, statistics, etc. You will need to test using typical data. Also consider that readability and ease of maintenance are important factors too.

This candidate query (or something similar) is could be worth considering if your SQL product of choice supports the minus operator EXCEPT (actually called MINUS in Oracle):

SELECT * 
  FROM A
 WHERE ID IN (
              SELECT FK_ID
                FROM B
               WHERE CODE = 'B1'
              EXCEPT
              SELECT FK_ID 
                FROM B
               WHERE CODE = 'C1'
             );

score 0 · Answer 3 · answered Jan 16 '12 at 18:29

My understanding (please correct me if I'm wrong) is you have two tables like so:

create table A(ID int not null primary key, "DESC" varchar(30));
create table B(FK_ID int not null references A (ID), CODE varchar(30));

then something like this would work:

select
    ta.* 
from
    A as ta
inner join
    B as tb
    on tb.FK_ID = ta.ID
where 
    tb.CODE = 'C1'
    and not exists (select null from B as tb2 where tb2.FK_ID = ta.ID and tb2.CODE = 'N1')

Efficient SQL for selecting for values and NOT values in a join table

3 Answers3