3

Why does SQL server behave this way. I am running this on SQL 2005.

The IN clause does not validate the column names in the sub query but validates it against the table name in the outer query. Here is an example of getting

Create table #table1(col1 int, col2 char(10), col3 char(15));

Create table #table2(col10 int, col11 char(10), col2 char(15));


insert into #table1(col1, col2, col3)
select 1, 'one', 'three'

insert into #table1(col1, col2, col3)
select 2, 'two', 'three'

insert into #table1(col1, col2, col3)
select 3, 'three', 'four'


insert into #table2(col10, col11, col2)
select 1, 'one', 'three'

insert into #table2(col10, col11, col2)
select 2, 'two', 'three'

insert into #table2(col10, col11, col2)
select 3, 'three', 'four'


select * from #table1
where col1 IN
(select col1 from #table2)

Where as if I just select the "select col1 from #table2" and run it spits an error

Msg 207, Level 16, State 1, Line 1
Invalid column name 'col1'.
Jeremy Holovacs
  • 22,480
  • 33
  • 117
  • 254
  • 2
    This allows you to filter data in subquery by outer query and return expressions using this data. It is too much to expect Sql Server to deem `col1` invalid but `col1*col10` valid. – Nikola Markovinović Sep 07 '12 at 13:37
  • Based on the definition of `#table2` there is no `col1` in that table - that would be why there is an error. – Mike Perrenoud Sep 07 '12 at 13:40
  • 1
    You could have just looked at the query plan to work out that it's not considering col1 to be in the second table - see [this screenshot](http://i49.tinypic.com/2rfvl37.png) - look at the Predicate section. – Bridge Sep 07 '12 at 13:43
  • 2
    Btw, good that you've provided sample data +1. Next time you may additionally want to post a sql-fiddle: http://sqlfiddle.com/#!3/bb89e/6/0 – Tim Schmelter Sep 07 '12 at 13:43
  • 2
    The First Rule of Programming: [`SELECT` isn't broken](http://www.codinghorror.com/blog/2008/03/the-first-rule-of-programming-its-always-your-fault.html) – Remus Rusanu Sep 07 '12 at 13:44
  • Create another test where the table data don't match. – JeffO Sep 07 '12 at 14:49

2 Answers2

13

Why? Because it's frequently useful to be able to reference columns from the outer query in subqueries. There's no setting you can use to turn off this behaviour, but if you get into the habit of using aliases, you should avoid most problems with it:

select * from #table1 t1
where t1.col1 IN
(select t2.col1 from #table2 t2)

Will produce an error.

Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
  • 1
    Beat me to it; was going to show the explicit implementation as you did as well. – Brad Christie Sep 07 '12 at 13:39
  • Damien, thanks for the explanation. Most of the times I would use aliases when working with multiple tables but this particular SQL behavior threw me off, I'll have to remember to use aliases at all times. – user1141441 Sep 12 '12 at 14:00
6

It's not the IN clauses that's the problem.

This:

SELECT * 
  FROM #table1
 WHERE col1 IN (SELECT col1 
                  FROM #table2)

...works, because the optimizer is assumes col1 to be from #table1. If you use table aliases so there's no ambiguity:

SELECT t1.* 
  FROM #table1 t1
 WHERE t1.col1 IN (SELECT t2.col1 
                     FROM #table2 t2)

...you'll get the Msg 207: Invalid column error.

This is the same principle as when working with DELETE and UPDATE statements, because the typical syntax doesn't allow you to alias the table being deleted or updated.

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
  • true, but most of the times we often not use aliases when working with a single table in particular when using IN clause. When dealing with Migration scripts converting data from different sources, this SQL behavior would hurt when aliases are not used. But thank you for pointing it out, I would make it a point to use aliases. – user1141441 Sep 12 '12 at 14:04
  • @user1141441: Yes, situations like this are exactly why I am so adamant about table alias usage. – OMG Ponies Sep 13 '12 at 00:48