-1

Can someone explain why this:

SELECT DISTINCT e.empid
FROM HR.Employees e
join (
    SELECT empid,orderdate 
    from sales.orders 
    WHERE orderdate = '20080212') as a
on e.empid != a.empid;

Doesn't provide the same result set as:

WITH A AS(SELECT distinct empid,orderdate FROM Sales.Orders WHERE orderdate = '20080212')
SELECT distinct e.empid,a.orderdate
FROM HR.Employees e
left JOIN A AS A
on e.empid = a.empid
WHERE a.orderdate is null;
APH
  • 4,109
  • 1
  • 25
  • 36
d4n743m4n
  • 13
  • 2
  • 1
    They just aren't even the same thing logically speaking. – Sean Lange Jun 29 '15 at 20:58
  • The 1st query to me says give me all empid's from HR.EMployees that aren't in the derived table. However it provides every distinct empid from HR.Employee table even the ones that exist in the derived table. – d4n743m4n Jun 29 '15 at 21:07
  • 2
    Well lets start with join verus left join, e.empid != a.empid versus e.empid = a.empid. – paparazzo Jun 29 '15 at 21:17
  • When you want to exclude rows that are IN or NOT IN a derived table you should use EXISTS and NOT EXISTS. And anytime you see an inequality in a join or where predicate you are breaking into nonSARGable territory which means table/index scans. Say goodbye to indexing. – Sean Lange Jun 29 '15 at 21:23
  • You're not looking at the entire statement.... The CTE produces a result set of empids and orderdates placed on 20080212. The outer query joins that set with empids that are in both tables AND rows that are NOT in the CTE. Then the WHERE filters out rows that do not have a NULL or unknown value in the orderdate column essentially producing all empids that did not have an order on 20080212...the idea is for the 1st query to produce the same empids. – d4n743m4n Jun 29 '15 at 21:26
  • @Sean. Not really worried about performance here but I know what you're saying. – d4n743m4n Jun 29 '15 at 21:30
  • You should always be concerned about performance. It is the second most important part of a query. It is second only to getting the accurate. – Sean Lange Jun 29 '15 at 21:40
  • The first query does a cartesian join then returns those rows where `e.empid != a.empid` and applies `distinct`. It's totally different from the second one. – Martin Smith Jun 29 '15 at 21:42

2 Answers2

4

I imagine the confusion must be because of the first query.

It does a cartesian join then returns those rows where e.empid != a.empid and then finally applies distinct. These are almost certainly not the semantics you want and they are very different from the second query.

To simplify things lets imagine you have a simple table

enter image description here

And the query

SELECT DISTINCT e.color
FROM   YourTable e
       JOIN YourTable AS a
         ON e.color != a.color; 

First (logical) step is the cartesian join

enter image description here

Then preserve all those rows where e.color != a.color (leaves six rows )

enter image description here

And finally take the DISTINCT of e.color

enter image description here

This is effectively what your first query is doing.

A small amount of thought shows that it will always end up returning all colours from the first table except if the second table is empty and the cross join returns zero rows or the second table just contains a single colour and that is also one of the colours in the first table.

Your second query performs an anti semi join and returns all rows from the left not matched on the right (assuming orderdate is not nullable), other ways of doing it might be using EXCEPT, NOT IN, OUTER APPLY or NOT EXISTS

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • Excellent way of explaining it. – d4n743m4n Jun 29 '15 at 21:55
  • 1
    This was a question in my material studying for exam 70-461 for an MCSA. It was requesting doing it as a join, set operator, and table expression. Otherwise the easiest way is the set operator EXCEPT and I wouldn't have put myself through the punishment :). – d4n743m4n Jun 29 '15 at 22:05
-1

Think I figured it out. Basically, when saying

SELECT DISTINCT e.empid FROM HR.Employees e join ( SELECT empid,orderdate from sales.orders WHERE orderdate = '20080212') as a on e.empid != a.empid;

The reason they don't produce the same result is that there is a scenario when every empid in HR.Employees will not equal at least one the ones in derived table.

d4n743m4n
  • 13
  • 2