SQL Query Duplicating records

Question

I've got two tables. Let's call them table_A and table_B.

Table_B contains the ForeignKey of table_A.

Table_A

ID         Name
1          A
2          B
3          C

Table_B



ID         table_a_fk
1          2 
2          3

Now I want to get all the names out of table_a IF table_b does not contain the ID of the record in table_a.

I've tried it with this query:

SELECT a.name
FROM table_a a, table_b b
WHERE a.id != b.table_a_fk

With this Query I'm getting the right result I just get this result like 5times and I don't know why.

Hope someone can explain me that.

Charles Bretana · Answer 1 · 2015-11-09T16:02:13.133

1

use distinct

SELECT distinct  a.name
FROM table_a a, table_b b
WHERE a.id != b.table_a_fk

or better is...

Select distinct name
from tableA a
Where not exists (Select * from tableB 
                  Where table_a_fk = a.id)

edited Nov 09 '15 at 16:02

answered Nov 06 '15 at 13:42

Charles Bretana

143,358
22
150
216

I'm a bit surprised by the suggestion of using `DISTINCT` to fix a query that is semantically wrong... Also, the second query is wrong too. – Lukas Eder Nov 08 '15 at 11:13
@Lukas, Although it might appear that it ought to be unique, there is nothing in the question to support the idea that the `name` field is unique within the dataset. Therefore, to avoid duplicate names in the output, you have to use distinct. You're correct in that the first query is ugly, but it's hard to argue that anything that works is *wrong*. It is in my answer because that's the form the op presented in his question, and so a solution that builds on what he {arguably] is already familiar with might be more likely to get the point across. – Charles Bretana Nov 09 '15 at 12:47
Let `table_a` contain `(1, A)`, and `table_b` be empty. Your first query will return nothing instead of `A`. It's wrong. Your second query is wrong for the data set suggested by the OP. – Lukas Eder Nov 09 '15 at 14:43
@Lukas, The op's question clearly states: "With this Query I'm getting the right result I just get this result like 5times and I don't know why". If this is true, (and I have no reason to doubt the statement) then adding distinct will eliminate the dupes. And you have no way of knowing what data is in the OPs database. – Charles Bretana Nov 09 '15 at 15:11
Charles :) Yes, your query 1 works for OP's data. Yet: check out the OP's words *"Now I want to get all the names out of table_a IF table_b does not contain the ID of the record in table_a."* The OP wants an anti-join. None of your suggestions implement an anti-join. Take a different data set and your query 1 won't work. In your query 2, replace `<>` by `=` and it would be correct – Lukas Eder Nov 09 '15 at 15:20
@Lukas, ON my second query, you are right, the `<>` in subquery should be an equal sign [`=`] thanks, I will edit my answer... double negative strikes again... – Charles Bretana Nov 09 '15 at 16:02

score 1 · Accepted Answer · edited May 23 '17 at 12:03

Your query creates a cartesian product between your two tables A and B. It is the cartesian product that generates those duplicate values. Instead, you want to use an anti-join, which is most commonly written in SQL using NOT EXISTS

SELECT a.name
FROM table_a a
WHERE NOT EXISTS (
  SELECT *
  FROM table_b b
  WHERE a.id = b.table_a_fk
)

Another way to express an anti-join with NOT IN (only if table_b.table_a_fk is NOT NULL):

SELECT a.name
FROM table_a a
WHERE a.id NOT IN (
  SELECT b.table_a_fk
  FROM table_b b
)

Another, less common way to express an anti-join:

SELECT a.name
FROM table_a a
LEFT OUTER JOIN table_b b ON a.id = b.table_a_fk
WHERE b.id IS NULL

SQL Query Duplicating records

2 Answers2