I have two tables which are defined as below:
CREATE TABLE `a` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`cid` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '',
`data` varchar(1024) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL DEFAULT '',
`update_time` timestamp(0) NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP(0),
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 150 CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Compact;
there are 10k rows in a and 200k rows in b, all the data were generated by a random function. Below is a sample:
151 8VE6BU06 8VE6BU06 2019-09-12 23:07:39
Here are three tests,
1. Case 1 costs 2.889s
SELECT cid FROM `a` WHERE a.cid not in (select b.cid from b);
execution plan
1 PRIMARY a ALL 10094 Using where
2 SUBQUERY b ALL 199826
2. case 2 costs 628.699s
delete from `a` WHERE a.cid not in (select b.cid from b);
execution plan
1 PRIMARY a ALL 10094 Using where
2 DEPENDENT SUBQUERY b ALL 199826 Using where
3. case 3 costs 0.036s
alter table b add index cid(cid);
delete from `a` WHERE a.cid not in (select b.cid from b);
execution plan
1 PRIMARY a ALL 10094 Using where
2 DEPENDENT SUBQUERY b index_subquery cid cid 302 func 1 Using index
Question
- Why test2 takes so long?
- Why test1 uses SUBQUERY while test2 uses DEPENDENT SUBQUERY? What's the reason test1 executes so fast comparing to test2?
- Why test3 is so fast comparing to test2?
MySQL Version: 5.6.20 innodb
Row data in table b is like:
151 8VE6BU06 8VE6BU06 2019-09-12 23:07:39