4

I am getting different result set for these two queries and second result set seems to be correct. What is the difference in these queries.

What type of inner join query second is?

1)

FROM TABLE1 t1
 INNER JOIN TABLE2 t2 ON t1.Id = t2.Id
WHERE
 t1.StatusId = 12

2)

FROM TABLE1 t1
 INNER JOIN TABLE2 t2 ON t1.Id = t2.Id
 AND t1.StatusId = 12
Christian Specht
  • 35,843
  • 15
  • 128
  • 182
Dave
  • 197
  • 2
  • 3
  • 7
  • 1
    plsql is for oracle. I think you meant tsql. – Joel Coehoorn Oct 13 '10 at 19:24
  • 1
    Everyone wants to see your results. Can you append them to your question? – egrunin Oct 13 '10 at 19:43
  • 1
    Is this a simplification of the query you are actually running? Sometimes it isn't what you think is causing the issue that is causing the issue. Please post the real query if you have simplified. Those two queries should be equivalent. – HLGEM Oct 13 '10 at 20:56
  • @HLGEM I think the code did make an appearance if you look at the edit history. @dave - Did you spot the issue with it after posting it? – Martin Smith Oct 13 '10 at 21:12

6 Answers6

11

It really makes no difference when you do this in the INNER JOIN.

However, when you use LEFT or RIGHT JOIN, it does make a difference whether you put the additional filter into the JOIN or into the WHERE clause.

When you put the filter into the WHERE clause, SQL Server does the join first, and then completely filters out the rows where the filter does not fit.
--> this will reduce the number of rows which are returned

When you put the filter into the JOIN, SQL Server does the filtering during the join, but only on the table where you put the filter.
You still get all the rows from the other tables, but only those have the data from the filtered table where the filter fits.
--> this will not reduce the number of rows, but the columns with data from the filteres table will be empty in more rows

It's difficult to explain...to make it more clear, here's an example:

Take the sample data from RedFilter's answer:

CREATE TABLE [dbo].[t1](
    [ID] [int] NULL,
    [StatusID] [int] NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[t2](
    [ID] [int] NULL
) ON [PRIMARY]
INSERT INTO t1 (ID, StatusID) VALUES (1, 10)
INSERT INTO t1 (ID, StatusID) VALUES (2, 11)
INSERT INTO t1 (ID, StatusID) VALUES (3, 12)
INSERT INTO t1 (ID, StatusID) VALUES (4, 12)
INSERT INTO t2 (ID) VALUES (1)
INSERT INTO t2 (ID) VALUES (3)
INSERT INTO t2 (ID) VALUES (5)

...and run the following queries on it:

/* this returns four rows, but only two will have data 
from the second table in the second column */
SELECT t1.ID, t2.ID
FROM t1 
LEFT JOIN t2 ON t1.Id = t2.Id 

/* this returns only one row: the one where t2.ID = 1 */
SELECT t1.ID, t2.ID
FROM t1 
LEFT JOIN t2 ON t1.Id = t2.Id 
WHERE t2.ID = 1 

/* this returns four rows as in the first query, but only one 
row will have data in the second column: the one where t2.ID = 1 */
SELECT t1.ID, t2.ID
FROM t1 
LEFT JOIN t2 ON t1.Id = t2.Id 
AND t2.ID = 1 

Note the different results as indicated in the comments.

Community
  • 1
  • 1
Christian Specht
  • 35,843
  • 15
  • 128
  • 182
7

These queries should be returning identical results. Please post the complete queries.

Below is a sample demonstrating the same output from the two variations:

Sample:

CREATE TABLE [dbo].[t1](
    [ID] [int] NULL,
    [StatusID] [int] NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[t2](
    [ID] [int] NULL
) ON [PRIMARY]
INSERT INTO t1 (ID, StatusID) VALUES (1, 10)
INSERT INTO t1 (ID, StatusID) VALUES (2, 11)
INSERT INTO t1 (ID, StatusID) VALUES (3, 12)
INSERT INTO t1 (ID, StatusID) VALUES (4, 12)
INSERT INTO t2 (ID) VALUES (1)
INSERT INTO t2 (ID) VALUES (3)
INSERT INTO t2 (ID) VALUES (5)

SELECT t1.ID, t1.StatusID
FROM t1 
INNER JOIN t2 ON t1.Id = t2.Id 
WHERE t1.StatusId = 12 

SELECT t1.ID, t1.StatusID
FROM t1 
INNER JOIN t2 ON t1.Id = t2.Id 
    AND t1.StatusId = 12 

Output:

ID          StatusID
----------- -----------
3           12

(1 row(s) affected)

ID          StatusID
----------- -----------
3           12

(1 row(s) affected)
D'Arcy Rittich
  • 167,292
  • 40
  • 290
  • 283
3

Looking at the edit history of the question it appears that your queries are along the following lines.

Query One

  INNER JOIN dbo.T2
        ON ...
    LEFT OUTER  JOIN dbo.T3
        ON  ...
    WHERE 
     T3.col = somevalue AND ...

Query Two

  INNER JOIN dbo.T2
        ON ...
    LEFT OUTER  JOIN dbo.T3
        ON  ... AND T3.col = somevalue
   WHERE 
        ... 

The difference between them is that Query One effectively converts the LEFT Join to an INNER Join.

For a left outer join conceptually the following happens.

  1. T2 is joined onto T3 and the predicate in the join clause is evaluated.
  2. Any non matching rows from T2 are added back in with NULL values for the T3 column values.
  3. The WHERE clause is applied.

None of these rows added back in in step 2 will meet the T3.col = somevalue predicate in step 3 as we know that the value of this column for all these rows is NULL.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845
1

SQL is a declarative language, and it's up to the DBMS to implement your query. The SQL Server query optimiser does not care whether you inner-join two tables using a JOIN expression, a WHERE expression, an IN expression or an EXISTS expression, internally they are all treated the same. SQL Server has 3 different internal join algorithms, which are selected only on the basis of row count and data distribution, not the exact syntax used to declare them. If you compare the execution plans of the two queries in your example you should find that they are identical, and will therefore produce the same resultset.

Christian Hayter
  • 30,581
  • 6
  • 72
  • 99
0

The difference under the hood is that the WHERE clause is accessed after all the joins.

This means that maybe in this case there is no difference but that would not always be true when there are multiple joins. They could be factored out of the virtual resultsets (depending on the circumstances) before the WHERE is processed.

Kindly see this link: http://blog.sqlauthority.com/2009/04/06/sql-server-logical-query-processing-phases-order-of-statement-execution/

Mike Cheel
  • 12,626
  • 10
  • 72
  • 101
-1

I believe that it is like this:

The first query gets all the results from the joined tables and then filters on the WHERE clause.

The second query gets the results from table 1 where the StatusID = 12 and then performs the join on the result set.

It's a subtle difference.

Edit: Here's a link to an article verifying the difference: http://www.sqlteam.com/article/additional-criteria-in-the-join-clause

Leslie
  • 3,604
  • 7
  • 38
  • 53
  • how does this subtle difference cause a difference in output? – Jimmy Oct 13 '10 at 19:15
  • Yes, but the question is why don't they return the same results. They appear to the naked eye to request the same information and I would guess from looking at them that they would generate the same execution plan. – Larry Lustig Oct 13 '10 at 19:18
  • 2
    -1: Believing is not enough in this case - please provide a sample use case demonstrating the difference in output. – D'Arcy Rittich Oct 13 '10 at 19:36
  • -1 as well. I've tested multiple tables, and the execution plans are identical regardless of using `WHERE t1.StatusId = 12 ` or adding `AND t1.StatusId = 12` as part of the join. SQL should be smart enough to figure out the execution plans based on the query. – LittleBobbyTables - Au Revoir Oct 13 '10 at 19:48
  • 2
    From the article you linked to: "Note that there is no difference in the result sets if we use an inner join instead of an outer join, because the inner join removes the non-matching records regardless." – 9b5b Oct 15 '10 at 15:57