1

I have two tables, T1 and T2, which have some records in common. I am trying to find out which records are shared between the two. Just to clarify, the 'msg' is unique to each record.

T1 has ~1.4 million records. When I run

select count(*) from T1 where msg in (select msg from T2)

I get ~950,000 records. Ok, 950k out of 1.4 million. There should be around 450k records that show up when I run the inverse right? Query below:

select count(*) from T1 where msg not in (select msg from T2)

This query actually returns 0. How is that possible? Furthermore when I run

select count(*) from T1 where id not in (select id from T2)

I get about 1100 records, so there are obviously some records in T1 which aren't in T2.

What am I missing? Any help is greatly appreciated.

Nguyễn Văn Phong
  • 13,506
  • 17
  • 39
  • 56
morgan
  • 73
  • 1
  • 7
  • Firstly, fully qualify all your column references with the table name since both tables share a column name - you want to be 100% sure that SQL Server is using the correct column. Secondly is `msg` ever null? – Dale K Mar 26 '20 at 03:49
  • What is the type of `msg` column in both tables? Is it nullable? – Vladimir Baranov Mar 26 '20 at 03:49

2 Answers2

2

You're facing NULL values inside NOT IN clause problem.

You should do something like this

SET ANSI_NULLS ON;

SET ANSI_NULLS

Specifies ISO compliant behavior of the Equals (=) and Not Equal To (<>) comparison operators when they are used with null values in SQL Server 2019 (15.x).

Read the following thread to have a better understanding.

NULL values inside NOT IN clause

Nguyễn Văn Phong
  • 13,506
  • 17
  • 39
  • 56
2

Always go for NOT EXISTS clause, instead of NOT IN.

The reason is, even if one NULL value is present in the subquery, SQL Server returns 0 rows, because NOT IN will not match any rows.

select count(*) from T1 where msg not in (select msg from T2)

is equivalent to

select count(*) from T1 
WHERE (
msg != (SELECT msg FROM t2 WHERE ID=1)
AND
msg!= (SELECT msg FROM t2 WHERE ID=2)
.
.
.
msg!= (SELECT msg FROM t2 WHERE ID=SomeBigNumber)
)

Read more on NOT IN Vs NOT EXISTS

So, Please modify your query from

SELECT * FROM T1
WHERE msg NOT IN (select msg from T2);

to

SELECT * FROM T1
WHERE NOT EXISTS (select 1 from T2 WHERE T2.msg=T1.msg);

I have created a SQL Fiddle. Please refer the same.

Venkataraman R
  • 12,181
  • 2
  • 31
  • 58