The subquery is more expensive in MySQL because MySQL hasn't optimized that type of query very well. MySQL doesn't notice that the subquery is invarant, in other words the subquery on T2 has the same result regardless of which row of T1 is being searched. So you and I can see clearly that MySQL should execute the subquery once and use its result to evaluate each row of T1.
But in fact, MySQL naively assumes that the subquery is a correlated subquery that may have a different result for each row in T1. So it is forced to run the subquery many times.
You clarified in a comment that your query is actually:
select a,b from t1
where (a,b) not in (select a,b from t2)
and a in (select a from t1);
You should also know that MySQL does not optimize tuple comparison at all. Even if it should use an index, it will do a table-scan. I think they're working on fixing that in MySQL 8.
Your second term is unnecessary, because the subquery is selecting from t1
, so obviously any value of a
in t1
exists in t1
. Did you mean to put t2
in the subquery? I'll assume that you just made an error typing.
Here's how I would write your query:
select a, b from t1
left outer join (select distinct a, b from t2) as nt2
on t1.a = nt2.a and t1.b = nt2.b
where nt2.a is null;
In these cases, MySQL treats the subquery differently because it appears in the FROM
clause. It runs each subquery once and stores the results in temp tables. Then it evaluates the rows of t1
against the data in the temp tables.
The use of distinct
is to make sure to do the semi-join properly; if there are multiple matching rows in t2
, we don't want multiple rows of output from the query.