19

I'm confused by an SQL query, and honestly, its one of those things that I'm not even sure how to google for. Thus StackOverflow.

I have what I think is a simple query.

SELECT Id
FROM Customer
WHERE Id IN (SELECT Id from @CustomersWithCancelledOrders)

Here's where I find the weirdness. There is no column called Id in the @CustomersWithCancelledOrders table variable. But there isn't an error.

What this results in is the Ids for all Customers. Every single one. Which obviously defeats the point of doing a sub-query in the first place.

It's like its using the Id column from the outer table (Customers), but I don't understand why it would do that. Is there ever a reason you would want to do that? Am I missing something incredibly obvious?

SQLFiddle of the weirdness. It's not the best SQL Fiddle, as I couldn't find a way to return multiple result sets on that website, but it demonstrates how I ran across the issue.

I suppose what I'm looking for is a name for the "feature" above, some sort of information about why it does what it does and what the incorrect query actually means.


I've updated the above question to use a slightly better example. Its still contrived, but its closer to the script I wrote when I actually encountered the issue.


After doing some reading on correlated subqueries, it looks like my typo (using the wrong Id column in the subquery) changes the behaviour of the subquery.

Instead of evaluating the results of the subquery once and then treating those results as a set (which was what I intended) it evaluates the subquery for every row in the outer query.

This means that the subquery evaluates to a set of different results for every row, and that set of results is guaranteed to have the customer Id of that row in it. The subquery returns a set consisting of the Id of the row repeated X number of times, where X is the number of rows in the table variable that is being selected from.

...

Its really hard to write down a concise description of my understanding of the issue. Sorry. I think I'm good now though.

Todd Bowles
  • 1,554
  • 15
  • 24
  • Incidently, hard to tell if you are new to this or have simplified a valid case, but most of the time you should probably be joining tables rather than using a subquery. – JamesRyan Dec 10 '14 at 16:25
  • 3
    The technical term you are looking for is `correlated sub-query`. – Brandon Dec 10 '14 at 19:56
  • @JamesRyan I was writing a quick query yesterday and happened to accidentally write an UPDATE query similar to the above. The usage of the outer ID in the subquery was a typo, and I was very surprised when it updated all of the records in the table instead of the subset I intended. – Todd Bowles Dec 10 '14 at 22:38

3 Answers3

40

It's intended behaviour because in a sub query you can access the 'outer queries' column names. Meaning you can use Id from Table within the Subquery and the query therefore thinks you are using Id.

That's why you should qualify with aliases or fully qualified names when working with sub queries.

For example; check out
http://support.microsoft.com/kb/298674

Allan S. Hansen
  • 4,013
  • 23
  • 25
  • That clarifies the behaviour as intended, which I assumed. What does the query actually mean though? Find me all of the Id values inside Table where the Id value is in the set of values described by? – Todd Bowles Dec 10 '14 at 09:17
  • 8
    The query means basically this: `SELECT Id FROM Table WHERE Id IN (SELECT Table.Id FROM OtherTable)` Which s why it returns everything as long as there's any rows in OtherTable. It selects all Table.Id from the sub query if there are rows in the sub query. Try commenting out the insert into CustomerMap in your fiddle example. – Allan S. Hansen Dec 10 '14 at 09:22
11
SELECT ID
FROM [Table]
WHERE ID IN (SELECT OtherTable.ID FROM OtherTable)

This will generate an error. As Allan S. Hanses said, in the subquery you can use colums from the main query.

See this example

SELECT ID
FROM [Table]
WHERE ID IN (SELECT ID)
CiucaS
  • 2,010
  • 5
  • 36
  • 63
1

The query is a correlated sub-query and is most often used to limit the results of the outer query based on a column returned by the sub query; hence the 'correlated'. In this example the ID in the inner query is actually the ID from the table in the outer query. This makes the query valid but probably doesn't give you any useful results as it isn't actually correlating between the outer and inner queries.

Wayne
  • 11
  • 1