-4

I have the following select:

SELECT DISTINCT pl 
FROM [dbo].[VendorPriceList] h
WHERE PartNumber IN (SELECT DISTINCT PartNumber 
                     FROM [dbo].InvoiceData 
                     WHERE amount > 10 
                       AND invoiceDate > DATEADD(yyyy, -1, CURRENT_TIMESTAMP)

                     UNION

                     SELECT DISTINCT PartNumber 
                     FROM [dbo].VendorDeals)

The issue here is that the table [dbo].VendorDeals has NO column PartNumber, however no error is detected and the query works with the first part of the union.

Even more, IntelliSense also allows and recognize PartNumber. This fails only when inside a complex statement.

It is pretty obvious that if you qualify column names, the mistake will be evident.

fcm
  • 1,247
  • 15
  • 28
  • While asking a question, you need to provide a **minimal reproducible example**. Please refer to the following link: https://stackoverflow.com/help/minimal-reproducible-example Please provide the following: (1) DDL and sample data population, i.e. CREATE table(s) plus INSERT T-SQL statements. (2) What you need to do, i.e. logic and your code attempt implementation of it in T-SQL. (3) Desired output, based on the sample data in #1 above. (4) Your SQL Server version (SELECT @@version;) – Yitzhak Khabinsky May 25 '21 at 19:06
  • No this isn't a bug, it's due to your lack of qualifying your columns. – Thom A May 25 '21 at 19:07
  • 2
    [No, this is not a bug in T-SQL](https://sqlinthewild.co.za/index.php/2019/04/23/no-this-is-not-a-bug-in-t-sql/) – Thom A May 25 '21 at 19:08
  • `DISTINCT` and `UNION` (as opposed to `UNION ALL`) are logically pointless in that subquery, any case `UNION` implies `DISTINCT` – Charlieface May 25 '21 at 19:57
  • @marc_s the missing just a typo from the editor. – fcm May 26 '21 at 02:24
  • @YitzhakKhabinsky The question is brief and very reproductible; no data is required nor the table definition, if you take the time to read the question. – fcm May 26 '21 at 02:27
  • It's a hyperlink, @fcm, you're meant to read the article linked of the same name, which tells you everything you need to know. – Thom A May 29 '21 at 08:18

1 Answers1

5

This isn't a bug in SQL Server/the T-SQL dialect parsing, no, this is working exactly as intended. The problem, or bug, is in your T-SQL; specifically because you haven't qualified your columns. As I don't have the definition of your table, I'm going to provide sample DDL first:

CREATE TABLE dbo.Table1 (MyColumn varchar(10), OtherColumn int);
CREATE TABLE dbo.Table2 (YourColumn varchar(10) OtherColumn int);

And then an example that is similar to your query:

SELECT MyColumn
FROM dbo.Table1
WHERE MyColumn IN (SELECT MyColumn FROM dbo.Table2);

This, firstly, will parse; it is a valid query. Secondly, provided that dbo.Table2 contains at least one row, then every row from table dbo.Table1 will be returned where MyColumn has a non-NULL value. Why? Well, let's qualify the column with table's name as SQL Server would parse them:

SELECT Table1.MyColumn
FROM dbo.Table1
WHERE Table1.MyColumn IN (SELECT Table1.MyColumn FROM dbo.Table2);

Notice that the column inside the IN is also referencing Table1, not Table2. By default if a column has it's alias omitted in a subquery it will be assumed to be referencing the table(s) defined in that subquery. If, however, none of the tables in the sub query have a column by that name, then it will be assumed to reference a table where that column does exist; in this case Table1.

Let's, instead, take a different example, using the other column in the tables:

SELECT OtherColumn
FROM dbo.Table1
WHERE OtherColumn IN (SELECT OtherColumn FROM dbo.Table2);

This would be parsed as the following:

SELECT Table1.OtherColumn
FROM dbo.Table1
WHERE Table1.OtherColumn IN (SELECT Table2.OtherColumn FROM dbo.Table2);

This is because OtherColumn exists in both tables. As, in the subquery, OtherColumn isn't qualified it is assumed the column wanted is the one in the table defined in the same scope, Table2.

So what is the solution? Alias and qualify your columns:

SELECT T1.MyColumn
FROM dbo.Table1 T1
WHERE T1.MyColumn IN (SELECT T2.MyColumn FROM dbo.Table2 T2);

This will, unsurprisingly, error as Table2 has no column MyColumn.

Personally, I suggest that unless you have only one table being referenced in a query, you alias and qualify all your columns. This not only ensures that the wrong column can't be referenced (such as in a subquery) but also means that other readers know exactly what columns are being referenced. It also stops failures in the future. I have honestly lost count how many times over years I have had a process fall over due to the "ambiguous column" error, due to a table's definition being changed and a query referencing the table wasn't properly qualified by the developer...

Thom A
  • 88,727
  • 11
  • 45
  • 75