1

I'm having trouble optimizing a query. Here are two example tables I am working with:

Table 1:
UID
A
B

Table 2:
UID   Parent
A       2
B       2
C       3
D       2
E       3
F       2

Here is what I am doing now:

Select Table1.UID
FROM Table1 R

INNER JOIN Table2 T ON
R.UID = T.UID

INNER JOIN Table2 E ON
T.PARENT = E.PARENT
AND E.UID NOT IN (SELECT UID FROM Table1)

I'm trying to avoid using the NOT IN clause because of obvious hindrances in performance for large numbers of records.

I know the typical ways to avoid NOT IN clauses like the LEFT JOIN where the other table is null, but can't seem to get what I want with all of the other Joins going on.

I will continue working and post if I find a solution.

EDIT: Here is what I am trying to end up with

After the first Inner Join I would have

A
B

AFter the second Inner join I would have:

A    D 
A    F
B    D
B    F

The second column above is just to represent that it is matching to the other UIDs with the same parent, but I still need the As and Bs as the UID.

EDIT: RDBMS is SQL server 2005, 2008r2, 2012

Table1 is declared in the query with no index

DECLARE @Table1 TABLE ( [UNIQUE_ID] INT PRIMARY KEY )

Table2 has a clustered index on Unique ID

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
ReddShepherd
  • 467
  • 1
  • 11
  • 24
  • What you want to achieve? – Hamlet Hakobyan Jul 01 '13 at 21:04
  • 2
    NOT IN *can* be rewritten as a LEFT OUTER with NULL check. Whether or not it makes a difference depends upon how the Query Planner treats the two queries - so show a plan. (I almost always take the joins, out of preference for how it exposes the RA.) – user2246674 Jul 01 '13 at 21:05
  • 3
    Which RDBMS? `NOT IN` behaves quite well in most DBMS, with proper indexes. – ypercubeᵀᴹ Jul 01 '13 at 21:06
  • 1
    Down voted the question because I believe its founded on a misunderstanding -- as others comment, NOT IN is not necessarily a performance problem. NOT IN is amenable to rewrite to an anti-join, making it potentially more efficient than an outer join especially if the optimiser doesn't recognise that the outer join should be transformed to an anti-join. – David Aldridge Jul 01 '13 at 21:29
  • The NOT IN was causing my query to take over an hour for 2 million records. I changed it to a left outer join (granted it caused my query to be a little off) and it reduced it to 5 minutes. – ReddShepherd Jul 01 '13 at 21:52
  • `AND NOT EXISTS (SELECT UID FROM Table1 WHERE UID = E.UID)`, with an index on `UID` on `Table1`? – MatBailie Jul 01 '13 at 21:53
  • How sure are you that it was `NOT IN` to blame and not lack of indexes that could be used? – ypercubeᵀᴹ Jul 01 '13 at 22:02
  • 1
    If you want the query optimized (as your tag implies), include (at least) the DBMS you are using, the tables' definitions (CREATE statements), the indexes on them and the execution plan the query gives. – ypercubeᵀᴹ Jul 01 '13 at 22:09
  • NOT IN (SELECT X FROM Table) performs bad only when X is nullable. – Vlad G. Jul 01 '13 at 22:56
  • Are you trying to do a recursion through a temp table here? – Vlad G. Jul 01 '13 at 22:58
  • For all of the items in Table2 with the same Parent (but not included in Table1), I want a row with each of the UIDs from Table1 – ReddShepherd Jul 01 '13 at 23:04
  • I would try adding an index on `(parent, UID)`. Also (again): post the execution plans (before and after adding this index.) – ypercubeᵀᴹ Jul 02 '13 at 05:31
  • I added a nonclustered index on Table1 and the speed seems to be about where I want it. Does it make sense for that to allow the NOT IN to run efficiently? – ReddShepherd Jul 02 '13 at 13:48

2 Answers2

1

The general approach to this is to use a LEFT JOIN with a where clause that only selects the non-matching rows:

Select Table1.UID
FROM Table1 R    
JOIN Table2 T ON R.UID = T.UID
JOIN Table2 E ON T.PARENT = E.PARENT
LEFT JOIN Table3 E2 ON E.UID = R.UID
WHERE E2.UID IS NULL 
Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • This is close. I end up with an extra row for each UID from table1, but it is closer than my other implementations. I'll work with it a little and post the final query when I'm done. Thanks for the help. – ReddShepherd Jul 01 '13 at 21:48
  • Changing them to JOINS didn't change the result set – ReddShepherd Jul 01 '13 at 22:47
-1
SELECT Table2.*
FROM Table2
INNER JOIN (
  SELECT id FROM Table2
  EXCEPT
  SELECT id FROM Table1
) AS Filter ON (Table2.id = Filter.id)
Anon
  • 10,660
  • 1
  • 29
  • 31