3

Let's say I have two tables:

Table A
ProdID | PartNumber | Data...
1      | ABC-a      | "Data A"
2      | (null)     | "Data B"
3      | ABC-c      | "Data C"
...

and

Table B
ProdID | PartNumber | DataB
(null) | ABC-a      | "Data D"
2      | (null)     | "Data E"
3      | (null)     | "Data F"
(null) | ABC-z      | "Data G"
...

Not ideal, but anyway. I want

ProdID | PartNumber | Data     | DataB...
1      | ABC-a      | "Data A" |  "Data D"
2      | (null)     | "Data B" |  "Data E"
3      | ABC-c      | "Data C" |  "Data F"
(null) | ABC-z      | (null)   |  "Data G"

So I use

SELECT * 
FROM Table1 T1
     RIGHT JOIN Table2 T2 ON
          T1.ProdID = T2.ProdID OR T1.PartNumber = T2.PartNumber

Which does exactly what I want, but is seems to take about 100 times as long as either side of the or individually. As part of a more complex query it takes 2 minutes for the OR compared to <1 second for just the int and 1 second for just the nvarchar(50). Table "A" has ~13k rows, table "b" has ~35k and the whole query returns ~40k.

Query Plans OR query int nvarchar

I think this "Table Spool" may be the problem. enter image description here

SQL Server 2008 R2 Express. Thoughts?

Fowl
  • 4,940
  • 2
  • 26
  • 43
  • your result doesn't match your source data, as TableA and TableB both have a ProdID 4, but the results for that record show a null ProdID. Should one of the source tables have a null, or should the result have a prodID 4? – Joel Coehoorn Nov 29 '11 at 04:10
  • I've updated the example to be hopefully more clear. – Fowl Nov 29 '11 at 04:18
  • Gee making good examples is hard! – Fowl Nov 29 '11 at 04:26
  • See also [Is having an 'OR' in an INNER JOIN condition a bad idea?](http://stackoverflow.com/questions/5901791/is-having-an-or-in-an-inner-join-condition-a-bad-idea) – Martin Smith Nov 29 '11 at 09:13

5 Answers5

4

Join each way separately, then combine the results:

SELECT T1.ProdID, T1.PartNumber, T1.Data, ISNULL(tprodid.DataB, tpartno.DataB) as DataB
FROM Table1 T1
LEFT JOIN Table2 tprodid ON T1.ProdID = tprodid.ProdID
LEFT JOIN Table2 tpartno ON T1.PartNumber = tpartno.PartNumber;

This will use both indexes and will perform well. You may want to tweak the ISNULL logic to your liking.

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • Instant, thank you. I really dislike have to "do it's job for it" by using COALESCE though. – Fowl Nov 29 '11 at 04:44
  • In case anyone is interested, here's the final query. SELECT COALESCE (a.ProdID, b1.ProdID, b2.ProdID) AS ProdID, COALESCE (a.PartNum, b1.PartNum, b2.PartNum) AS PartNum, a.Data COALESCE (b1.DataB, b2.DataB) AS DataB FROM Table1 AS a RIGHT OUTER JOIN Table2 AS b2 ON a.ProdID = b2.ProdID RIGHT OUTER JOIN Table2 AS b1 ON a.PartNum= b1.PartNum – Fowl Nov 29 '11 at 04:52
  • 1
    Your code does not run (hint: `T2` is not a valid correlation name), the resultset is not the same as the OP's (yours returns three rows, the OP's has four, plus you have missing columns) and you've changed `RIGHT` to `LEFT` for no apparent reason. Still you get three upvotes and the "correct answer" award! What's your secret? ;) – onedaywhen Nov 29 '11 at 08:50
  • oh wow, it's just enough for my (the OP's) brain to fill in the gaps. The right/left thing was my fault in the original though. – Fowl Nov 29 '11 at 09:41
  • @onedaywhen sorry - I don't have sqlserver installed so I couldn't test it. I corrected the alias issue. BTW, the `LEFT` joins are the right way to go IMHO. The "way I do it" is providing the *sufficiently* correct answer - enough to show the correct approach even if there might be minor syntactic issues * – Bohemian Nov 29 '11 at 11:11
0

Change the query to a union and you should get much better performance:

Select * from Table1 Left Join Table2 On Table1.ProdID = Table2.ProdID 
where Table1.PartNumber is null

union

Select * from Table1 Left Join Table2 On Table1.PartNumber =  Table2.PartNumber
where Table1.ProdId is null

The union operator will eliminate duplicate rows. That is, rows returned by both queries will only be returned once. So this should return the same data as your main query.

Jeff Siver
  • 7,434
  • 30
  • 32
0

You still need the OR, but you might do a little better with a FULL JOIN:

SELECT COALESCE(t1.ProdID,t2.ProdID) ProdID, 
    COALESCE(t1.PartNumber,t2.PartNumber) PartNumber, 
    t1.Data, t2.DataB
FROM TableA t1
FULL JOIN TableB t2 ON t1.ProdID = t2.ProdID OR t1.PartNumber = t2.PartNumber

The reason you have slow performance is because the OR forces it to not match up well with index, forcing a manual compare of one entire table with the other entire table. If you still have performance issues with the FULL JOIN, you can fix it either by adding an index for part number or by using an index hint to tell the optimizer your ProdID index will still be helpful.

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
0

While not knowing MSSQL very well I'll attempt to at least provide a solution to your problem!

You should be getting better results with using a LEFT JOIN for each column you potentially want to join on and then merge the results something like the following:

SELECT
COALESCE(TA.ProdID, TB2.ProdID) AS ProdID,
COALESCE(TA.PartNumber, TB.PartNumber) AS PartNumber,
TA.Data,
COALESCE(TB.Data2, TB2.Data2) AS Data2
FROM TableA TA
LEFT JOIN TableB TB On TA.ProdID = TB.ProdID
LEFT JOIN TableB TB2 On TA.PartNumber = TB2.PartNumber
GROUP BY ProdId

While completely guessing, I'd say that it may be limited to using only one index per join though, forcing it to do one of the columns with a full table scan instead. You could try putting both columns into one index and using that index as an index hint on the join and see how it performs.

mikn
  • 484
  • 3
  • 7
0

I like Jeff Siver's suggestion of using UNION, though his suggested query is wrong. Here's a possible fix:

SELECT *
  FROM Table1 T1
       JOIN Table2 T2 
          ON T1.ProdID = T2.ProdID
UNION
SELECT *
  FROM Table1 T1
       JOIN Table2 T2 
          ON T1.PartNumber = T2.PartNumber
UNION
SELECT NULL, NULL, NULL, *
  FROM Table2 T2
 WHERE NOT EXISTS (
                   SELECT * 
                     FROM Table1 T1
                    WHERE T1.ProdID = T2.ProdID
                  )
       AND NOT EXISTS (
                       SELECT * 
                         FROM Table1 T1
                        WHERE T1.PartNumber = T2.PartNumber
                      );
onedaywhen
  • 55,269
  • 12
  • 100
  • 138