I ran a MySQL performance test, and the results surprised me.
First, I prepared several tables for the test: t_worker_attendance_300w (3 million rows), t_worker_attendance_1000w (10 million rows), t_worker_attendance_1y (100 million rows), and t_worker_attendance_4y (400 million rows).
All of the tables have the same columns and the same indexes. They are copies of one another; even the 400-million-row table was grown from the original 3 million rows.
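To compare the relative sizes of the four tables, a query along the following lines should work. It is only a sketch: it assumes all four tables live in the currently selected schema, and TABLE_ROWS is an estimate for InnoDB rather than an exact count.

```sql
-- Sketch: compare estimated row counts and on-disk sizes of the test tables.
-- Assumes the tables are in the current schema (DATABASE()).
SELECT table_name,
       table_rows,                                   -- InnoDB estimate, not exact
       ROUND(data_length  / 1024 / 1024) AS data_mb,  -- clustered index (row data)
       ROUND(index_length / 1024 / 1024) AS index_mb  -- secondary indexes
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name IN ('t_worker_attendance_300w',
                     't_worker_attendance_1000w',
                     't_worker_attendance_1y',
                     't_worker_attendance_4y')
ORDER BY table_rows;
```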
As I understand it, MySQL's performance should be severely affected by data volume, but the results have puzzled me for a whole week. I have tested almost every scenario I can think of, and the execution times are all the same!
This is a new MySQL 5.6.16 server. The scenarios I tested include INNER JOIN....
A) SHOW CREATE TABLE t_worker_attendance_4y
CREATE TABLE `t_worker_attendance_4y` (
`id` bigint(20) NOT NULL ,
`attendance_id` char(32) NOT NULL,
`worker_id` char(32) NOT NULL,
`subcontractor_id` char(32) NOT NULL ,
`project_id` char(32) NOT NULL ,
`sign_date` date NOT NULL ,
`sign_type` char(2) NOT NULL ,
`latitude` double DEFAULT NULL,
`longitude` double DEFAULT NULL ,
`sign_wages` decimal(16,2) DEFAULT NULL ,
`confirm_wages` decimal(16,2) DEFAULT NULL ,
`work_content` varchar(60) DEFAULT NULL ,
`team_leader_id` char(32) DEFAULT NULL,
`sign_state` char(2) NOT NULL ,
`confirm_date` date DEFAULT NULL ,
`sign_mode` char(2) DEFAULT NULL ,
`checkin_time` datetime DEFAULT NULL ,
`checkout_time` datetime DEFAULT NULL ,
`sign_hours` decimal(6,1) DEFAULT NULL ,
`overtime` decimal(6,1) DEFAULT NULL ,
`confirm_hours` decimal(6,1) DEFAULT NULL ,
`signimg` varchar(200) DEFAULT NULL ,
`signoutimg` varchar(200) DEFAULT NULL ,
`photocheck` char(2) DEFAULT NULL ,
`machine_type` varchar(2) DEFAULT '1' ,
`project_coordinate` text ,
`floor_num` varchar(200) DEFAULT NULL ,
`device_serial_no` varchar(32) DEFAULT NULL ,
KEY `checkin_time` (`checkin_time`),
KEY `worker_id` (`worker_id`),
KEY `project_id` (`project_id`),
KEY `subcontractor_id` (`subcontractor_id`),
KEY `sign_date` (`sign_date`),
KEY `project_id_2` (`project_id`,`sign_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
B) SHOW INDEX FROM t_worker_attendance_4y
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_worker_attendance_4y | 1 | checkin_time | 1 | checkin_time | A | 5017494 | NULL | NULL | YES | BTREE | | |
| t_worker_attendance_4y | 1 | worker_id | 1 | worker_id | A | 1686552 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id | 1 | project_id | A | 102450 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | subcontractor_id | 1 | subcontractor_id | A | 380473 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | sign_date | 1 | sign_date | A | 512643 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id_2 | 1 | project_id | A | 102059 | NULL | NULL | | BTREE | | |
| t_worker_attendance_4y | 1 | project_id_2 | 2 | sign_date | A | 1776104 | NULL | NULL | | BTREE | | |
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
C) EXPLAIN SELECT SQL_NO_CACHE tw.project_id, tw.sign_date FROM t_worker_attendance_4y tw WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb' AND sign_date >= '07/01/2018' AND sign_date < '08/01/2018' ;
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
| 1 | SIMPLE | tw | ref | project_id,sign_date,project_id_2 | project_id_2 | 96 | const | 54134596 | Using where; Using index |
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
All of the queries use the same composite index.
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_300w tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sign_date >= '07/01/2018'
AND sign_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_1000w tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sign_date >= '07/01/2018'
AND sign_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.01 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_1y tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sign_date >= '07/01/2018'
AND sign_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_4y tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
AND sign_date >= '07/01/2018'
AND sign_date < '08/01/2018' LIMIT 0,10000;
Execution time: 0.02 sec
......
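Every query above stops after at most 10,000 rows because of LIMIT 0,10000, so the timings may not reflect how many rows actually match in each table. A possible extra check, sketched here and not part of the measurements above, is to count the matching rows instead. The YYYY-MM-DD literals assume the intended range is July 2018, and the table name would be swapped for each of the four tables in turn.

```sql
-- Sketch: count all matching rows so that LIMIT cannot cut the scan short.
-- Date literals use YYYY-MM-DD and assume the intended range is July 2018.
SELECT SQL_NO_CACHE COUNT(*)
FROM t_worker_attendance_4y tw
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'
  AND tw.sign_date >= '2018-07-01'
  AND tw.sign_date <  '2018-08-01';
```

If the counts grow with table size while the LIMIT queries stay flat, that would suggest the flat timings come from reading only the first 10,000 index entries.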
I expected MySQL's query performance to decline dramatically as the data volume grows, but the timings barely differ. As a result, I have no basis for optimizing my queries, and I cannot tell when I would need to introduce table partitioning or database/table sharding.
What I want to know is why an index lookup on a small table runs as fast as the same lookup on a large one. Any help would be much appreciated.