3

Please see the table structure below:

CREATE TABLE Person (id int not null, PID INT NOT NULL, Name VARCHAR(50))
CREATE TABLE [Order] (OID INT NOT NULL, PID INT NOT NULL)

INSERT INTO Person VALUES (1,1,'Ian')
INSERT INTO Person VALUES (2,2,'Maria')
INSERT INTO [Order] values (1,1)

Why does the following query return two results:

select * from Person WHERE id IN (SELECT ID FROM [Order])

ID does not exist in Order. Why does the query above produce results? I would expect it to error because I'd does not exist in order.

Roman Pekar
  • 107,110
  • 28
  • 195
  • 197
w0051977
  • 15,099
  • 32
  • 152
  • 329

3 Answers3

11

This behavior, while unintuitive, is very well defined in Microsoft's Knowledge Base:

KB #298674 : PRB: Subquery Resolves Names of Column to Outer Tables

From that article:

To illustrate the behavior, use the following two table structures and query:

CREATE TABLE X1 (ColA INT, ColB INT)
CREATE TABLE X2 (ColC INT, ColD INT)
SELECT ColA FROM X1 WHERE ColA IN (Select ColB FROM X2)

The query returns a result where the column ColB is considered from table X1.

By qualifying the column name, the error message occurs as illustrated by the following query:

SELECT ColA FROM X1 WHERE ColA in (Select X2.ColB FROM X2)

Server: Msg 207, Level 16, State 3, Line 1
Invalid column name 'ColB'.

Folks have been complaining about this issue for years, but Microsoft isn't going to fix it. It is, after all, complying with the standard, which essentially states:

If you don't find column x in the current scope, traverse to the next outer scope, and so on, until you find a reference.

More information in the following Connect "bugs" along with multiple official confirmations that this behavior is by design and is not going to change (so you'll have to change yours - i.e. always use aliases):

Connect #338468 : CTE Column Name resolution in Sub Query is not validated
Connect #735178 : T-SQL subquery not working in some cases when IN operator used
Connect #302281 : Non-existent column causes subquery to be ignored
Connect #772612 : Alias error not being reported when within an IN operator
Connect #265772 : Bug using sub select

In your case, this "error" will probably be much less likely to occur if you use more meaningful names than ID, OID and PID. Does Order.PID point to Person.id or Person.PID? Design your tables so that people can figure out the relationships without having to ask you. A PersonID should always be a PersonID, no matter where in the schema it is; same with an OrderID. Saving a few characters of typing is not a good price to pay for a completely ambiguous schema.

You could write an EXISTS clause instead:

... FROM dbo.Person AS p WHERE EXISTS 
(
  SELECT 1 FROM dbo.[Order] AS o
  WHERE o.PID = p.id -- or is it PID? See why it pays to be explicit?
);
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
9

The problem here is that you're not using Table.Column notation in your subquery, table Order doesn't have column ID and ID in subquery really means Person.ID, not [Order].ID. That's why I always insist on using aliases for tables in production code. Compare these two queries:

select * from Person WHERE id IN (SELECT ID FROM [Order]);

select * from Person as p WHERE p.id IN (SELECT o.ID FROM [Order] as o)

The first one will execute but will return incorrect results, and the second one will raise an error. It's because the outer query's columns may be referenced in a subquery, so in this case you can use Person columns inside the subquery. Perhaps you wanted to use the query like this:

select * from Person WHERE pid IN (SELECT PID FROM [Order])

But you never know when the schema of the [Order] table changes, and if somebody drops the column PID from [Order] then your query will return all rows from the table Person. Therefore, use aliases:

select * from Person as P WHERE P.pid IN (SELECT O.PID FROM [Order] as O)

Just quick note - this is not SQL Server specific behaviour, it's standard SQL:

Roman Pekar
  • 107,110
  • 28
  • 195
  • 197
  • there's none, my answer is about it. May be I was not clear. My answer is about using aliases helps to avoid such a mistakes – Roman Pekar Sep 02 '13 at 18:48
  • Thanks+1. Is this a correlated sub query? – w0051977 Sep 02 '13 at 19:29
  • @w0051977 it's an interesting question. I think when it's used without alias, it's correlated, because it's using value from outer query, but when it's aliased, it's just stand-alone query, like cte. Here http://en.wikipedia.org/wiki/Correlated_subquery correlated subquery is defined like 'subquery that uses values from the outer query' – Roman Pekar Sep 02 '13 at 19:38
  • I can't see when a correlated su query would be useful. Could you provide an axample? – w0051977 Sep 02 '13 at 19:40
  • 1
    Actually I work with SQL Server, but never used term correlated subquery at work :) but, you can rewrite your example like `select * from Person as P WHERE exits (SELECT * FROM [Order] as O where O.PID = P.pid)` - now your subquery looks like correlated – Roman Pekar Sep 02 '13 at 19:47
  • 4
    Since you insist on the *"ALWAYS"* using aliases part, you should change the `select *` to `select p.*` – ypercubeᵀᴹ Sep 03 '13 at 05:19
  • @ypercube don't think so, I can select data by * without aliases from many different tables. Main goal of aliases is to make joins and subqueries work predictable. – Roman Pekar Sep 03 '13 at 08:46
  • I thought that the main goal of aliases was to improve readability. – swasheck Sep 03 '13 at 14:44
  • 1
    @swasheck improve readability and make queries predictable :) the rule is not to use `*` in production code also. When I specify columns explicitly, I use aliases before column names, `*` here is just to make example query shorter. – Roman Pekar Sep 03 '13 at 14:51
  • To be fair, you could also say `SELECT ...` to imply that `SELECT *` is bad. – Aaron Bertrand Sep 03 '13 at 15:16
  • @AaronBertrand close to truth :) Just wanted to point on specific problem here and show query that as close as possible to OP query. I'm sure there'll be a question about view with select * later :) – Roman Pekar Sep 03 '13 at 15:19
  • I could also rename OP tables to imply that using keywords as table name is bad – Roman Pekar Sep 03 '13 at 15:25
  • It's not about aliases, it's about specifying table name in front of column name: SELECT tablename.columnname FROM tablename – Andrew Sep 11 '13 at 06:24
  • @Andrew yes and no, it's possible to do with table names, but I think aliases is more general way to do this (there could be self-joins, for example) – Roman Pekar Sep 11 '13 at 06:26
  • The specific problem was not because aliases weren't used, but because table name was not specified. If you want you can use alias as well, but the cause for problem doesn't change – Andrew Sep 11 '13 at 06:39
  • @Andrew I've not said that the problem is because of not using aliases, I've said that I insist on using aliases to avoid that. The problem is clearly about not using table.column notation. May be I wasn't very clear about that in my answer – Roman Pekar Sep 11 '13 at 06:43
0

Order table doesnt have id column

Try these instead:

select * from Person WHERE id IN (SELECT OID FROM [Order])

OR

select * from Person WHERE pid IN (SELECT PID FROM [Order])
Praveen Prasannan
  • 7,093
  • 10
  • 50
  • 70