It's a very simple query:
SELECT * FROM temp_company WHERE number NOT IN (SELECT number FROM company)
It was taking 15 minutes before but that was on a Mysql installation with too low buffer pool size and 15 minutes was OK because this is a monthly task. I upgraded to Mysql 5.7 (from something like 5.1 or 5.2) as the original install was 32bit and I couldn't up the innodb buffer pool size to the minimum required 10gb for this DB (I've set it to 16GB on a machine with 32GB RAM. I've now gone to run this query a month later and it was still running after 6 hours.
The EXPLAIN for the above is:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
1 | PRIMARY | temp_company | | ALL | | | | | 3226661 | 100.00 | Using where |
2 | DEPENDENT SUBQUERY | company | | index | number | number | 33 | | 3383517 | 100.00 | Using where |
The PRIMARY index on company and temp_company is id, but number is what they match on and that is a KEY in both but does the above suggest it's not using the index for the temp_company table?
The other logical query I thought to try was:
EXPLAIN SELECT tc.* FROM temp_company tc
LEFT JOIN company c on c.number = tc.number
WHERE c.number IS NULL
This is just as slow and the EXPLAIN is:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
1 | SIMPLE | tc | | ALL | | | | | 3226661 | 100.00 | |
2 | SIMPLE | c | | index | number | number | 33 | | 3383517 | 100.00 | Using where; Ising index; Using join buffer (block nested loop) |
Any help would be much appreciated. Perhaps Mysql changed the way it finds indexes?
**UPDATE 1-------
SHOW CREATE's: company
CREATE TABLE `company` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`count_telephone` mediumint(8) unsigned NOT NULL,
`count_fax` mediumint(8) unsigned NOT NULL,
`count_person` mediumint(8) unsigned NOT NULL,
`person_date` date DEFAULT NULL COMMENT 'Date the company_person relation was updated',
`count_email_address` mediumint(8) unsigned NOT NULL,
`name` varchar(255) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`url_date` date DEFAULT NULL,
`url_status` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'Failure count for crawling the URL',
`website_stamp_start` int(10) unsigned DEFAULT NULL,
`website_stamp` int(10) unsigned DEFAULT NULL,
`ch_url` varchar(255) DEFAULT NULL COMMENT 'Companies house URL',
`keywords_stamp_start` int(10) unsigned DEFAULT NULL,
`keywords_stamp` int(11) DEFAULT NULL,
`number` varchar(30) CHARACTER SET ascii COLLATE ascii_bin DEFAULT NULL,
`category` varchar(255) DEFAULT NULL,
`status` varchar(255) DEFAULT NULL,
`status_date` date DEFAULT NULL COMMENT 'Date the status field was updated',
`country_of_origin` varchar(80) DEFAULT NULL,
`dissolution_date` date DEFAULT NULL,
`incorporation_date` date DEFAULT NULL,
`account_ref_day` smallint(5) unsigned DEFAULT NULL,
`account_ref_month` smallint(5) unsigned DEFAULT NULL,
`account_next_due_date` date DEFAULT NULL,
`account_last_made_up_date` date DEFAULT NULL,
`account_category` varchar(255) DEFAULT NULL,
`returns_next_due_date` date DEFAULT NULL,
`returns_last_made_up_date` date DEFAULT NULL,
`mortgages_num_charges` smallint(5) unsigned DEFAULT NULL,
`mortgages_num_outstanding` smallint(5) unsigned DEFAULT NULL,
`mortgages_num_part_satisfied` smallint(5) unsigned DEFAULT NULL,
`mortgages_num_satisfied` smallint(5) unsigned DEFAULT NULL,
`partnerships_num_gen_partners` smallint(5) unsigned DEFAULT NULL,
`partnerships_num_lim_partners` smallint(5) unsigned DEFAULT NULL,
`ext_name` varchar(255) DEFAULT NULL,
`turnover` decimal(18,2) DEFAULT NULL,
`turnover_date` date DEFAULT NULL,
`trade_debtors` decimal(18,2) DEFAULT NULL,
`other_debtors` decimal(18,2) DEFAULT NULL,
`debtors_date` date DEFAULT NULL,
`real_turnover_band` int(11) DEFAULT NULL,
`est_turnover_band` int(11) DEFAULT NULL,
`ext_address_date` date DEFAULT NULL,
`care_of` varchar(255) DEFAULT NULL,
`po_box` varchar(60) DEFAULT NULL,
`line_1` varchar(255) DEFAULT NULL,
`line_2` varchar(255) DEFAULT NULL,
`town` varchar(60) DEFAULT NULL,
`county` varchar(60) DEFAULT NULL,
`country` varchar(60) DEFAULT NULL,
`post_code` varchar(20) DEFAULT NULL,
`DirScrapeID` int(10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `homepage_keywords_stamp` (`keywords_stamp`),
KEY `number` (`number`),
KEY `url` (`url`),
KEY `town` (`town`),
KEY `county` (`county`),
KEY `post_code` (`post_code`),
KEY `name` (`name`),
KEY `website_stamp` (`website_stamp`),
KEY `website_stamp_start` (`website_stamp_start`),
KEY `keywords_stamp_start` (`keywords_stamp_start`),
KEY `turnover` (`turnover`),
KEY `status` (`status`),
KEY `category` (`category`),
KEY `incorporation_date` (`incorporation_date`),
KEY `real_turnover_band` (`real_turnover_band`),
KEY `est_turnover_band` (`est_turnover_band`)
) ENGINE=InnoDB AUTO_INCREMENT=3706459 DEFAULT CHARSET=utf8
temp_company:
CREATE TABLE `temp_company` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`ch_url` varchar(255) DEFAULT NULL,
`number` varchar(30) DEFAULT NULL,
`category` varchar(255) DEFAULT NULL,
`status` varchar(255) DEFAULT NULL,
`country_of_origin` varchar(80) DEFAULT NULL,
`dissolution_date` date DEFAULT NULL,
`incorporation_date` date DEFAULT NULL,
`account_ref_day` smallint(5) unsigned DEFAULT NULL,
`account_ref_month` smallint(5) unsigned DEFAULT NULL,
`account_next_due_date` date DEFAULT NULL,
`account_last_made_up_date` date DEFAULT NULL,
`account_category` varchar(255) DEFAULT NULL,
`returns_next_due_date` date DEFAULT NULL,
`returns_last_made_up_date` date DEFAULT NULL,
`mortgages_num_charges` smallint(5) unsigned DEFAULT NULL,
`mortgages_num_outstanding` smallint(5) unsigned DEFAULT NULL,
`mortgages_num_part_satisfied` smallint(5) unsigned DEFAULT NULL,
`mortgages_num_satisfied` smallint(5) unsigned DEFAULT NULL,
`partnerships_num_gen_partners` smallint(5) unsigned DEFAULT NULL,
`partnerships_num_lim_partners` smallint(5) unsigned DEFAULT NULL,
`ext_name` varchar(255) DEFAULT NULL,
`turnover` decimal(18,2) DEFAULT NULL,
`turnover_date` date DEFAULT NULL,
`trade_debtors` decimal(18,2) DEFAULT NULL,
`other_debtors` decimal(18,2) DEFAULT NULL,
`debtors_date` date DEFAULT NULL,
`real_turnover_band` int(11) DEFAULT NULL,
`est_turnover_band` int(11) DEFAULT NULL,
`ext_address_date` date DEFAULT NULL,
`care_of` varchar(255) DEFAULT NULL,
`po_box` varchar(60) DEFAULT NULL,
`line_1` varchar(255) DEFAULT NULL,
`line_2` varchar(255) DEFAULT NULL,
`town` varchar(60) DEFAULT NULL,
`county` varchar(60) DEFAULT NULL,
`country` varchar(60) DEFAULT NULL,
`post_code` varchar(20) DEFAULT NULL,
`sic_code` varchar(10) DEFAULT NULL,
`DirScrapeID` int(10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `number` (`number`),
KEY `status` (`status`),
KEY `name` (`name`),
KEY `sic_code` (`sic_code`)
) ENGINE=InnoDB AUTO_INCREMENT=3297833 DEFAULT CHARSET=utf8
UPDATE 2: Profile of the query (with limit 5)
+-------------------------------+----------+
| Status | Duration |
+-------------------------------+----------+
| executing | 0.000001 |
| Sending data | 0.000112 |
| executing | 0.000001 |
| Sending data | 0.000111 |
| executing | 0.000001 |
| Sending data | 0.000110 |
| executing | 0.000001 |
| Sending data | 0.000110 |
| executing | 0.000001 |
| Sending data | 0.000110 |
| executing | 0.000001 |
| Sending data | 0.000111 |
| executing | 0.000001 |
| Sending data | 0.000111 |
| executing | 0.000001 |
| Sending data | 0.000112 |
| executing | 0.000001 |
| Sending data | 0.000112 |
| executing | 0.000001 |
| Sending data | 0.000112 |
| executing | 0.000001 |
| Sending data | 0.000112 |
| executing | 0.000001 |
| Sending data | 0.000112 |
| executing | 0.000001 |
| Sending data | 0.000112 |
| executing | 0.000001 |
| Sending data | 0.000113 |
| executing | 0.000001 |
| Sending data | 0.000114 |
| executing | 0.000001 |
| Sending data | 0.000114 |
| executing | 0.000001 |
| Sending data | 0.000114 |
| executing | 0.000001 |
| Sending data | 0.000115 |
| executing | 0.000001 |
| Sending data | 0.000116 |
| executing | 0.000001 |
| Sending data | 0.000115 |
| executing | 0.000001 |
| Sending data | 0.000115 |
| executing | 0.000001 |
| Sending data | 0.000116 |
| executing | 0.000001 |
| Sending data | 0.000116 |
| executing | 0.000001 |
| Sending data | 0.000115 |
| executing | 0.000001 |
| Sending data | 0.000115 |
| executing | 0.000001 |
| Sending data | 0.000116 |
| executing | 0.000001 |
| Sending data | 0.000116 |
| executing | 0.000001 |
| Sending data | 0.000117 |
| executing | 0.000001 |
| Sending data | 0.000117 |
| executing | 0.000001 |
| Sending data | 0.000117 |
| executing | 0.000001 |
| Sending data | 0.000118 |
| executing | 0.000001 |
| Sending data | 0.000118 |
| executing | 0.000001 |
| Sending data | 0.000118 |
| executing | 0.000001 |
| Sending data | 0.000118 |
| executing | 0.000001 |
| Sending data | 0.000118 |
| executing | 0.000001 |
| Sending data | 0.000118 |
| executing | 0.000001 |
| Sending data | 0.000120 |
| executing | 0.000001 |
| Sending data | 0.000120 |
| executing | 0.000001 |
| Sending data | 0.000121 |
| executing | 0.000001 |
| Sending data | 0.000123 |
| executing | 0.000001 |
| Sending data | 0.000121 |
| executing | 0.000001 |
| Sending data | 0.000120 |
| executing | 0.000001 |
| Sending data | 0.000121 |
| executing | 0.000001 |
| Sending data | 0.000121 |
| executing | 0.000001 |
| Sending data | 0.000121 |
| executing | 0.000001 |
| Sending data | 0.000122 |
| executing | 0.000001 |
| Sending data | 0.000123 |
| executing | 0.000001 |
| Sending data | 0.000124 |
| executing | 0.000001 |
| Sending data | 1.063880 |
| end | 0.000009 |
| query end | 0.000008 |
| closing tables | 0.000009 |
| freeing items | 0.000007 |
| Waiting for query cache lock | 0.000002 |
| freeing items | 0.000062 |
| Waiting for query cache lock | 0.000002 |
| freeing items | 0.000001 |
| storing result in query cache | 0.000002 |
| cleaning up | 0.000028 |
+-------------------------------+----------+