2

I did a MySQL performance optimization test, but the test results surprised me.

First of all, I prepared several tables for my test, which are "t_worker_attendance_300w(3 million data), t_worker_attendance_1000w(10 million data), t_worker_attendance_1y(100 million data), t_worker_attendance_4y(400 million data)".

Each table has the same field, the same index, they are copied, including 400 million data volume is also increased from 3 million data.

In my understanding, MySQL's performance is bound to be severely affected by the size of the data volume, but the results have puzzled me for a whole week. I've almost tested the scenarios I can think of, but their execution times are the same!

This is a new MySQL 5.6.16 server,I tested any scenario I could think of, including INNER JOIN....

A) SHOW CREATE TABLE t_worker_attendance_4y

CREATE TABLE `t_worker_attendance_4y` (
`id` bigint(20) NOT NULL ,
`attendance_id` char(32) NOT NULL,
`worker_id` char(32) NOT NULL,
`subcontractor_id` char(32) NOT NULL ,
`project_id` char(32) NOT NULL ,
`sign_date` date NOT NULL ,
`sign_type` char(2) NOT NULL ,
`latitude` double DEFAULT NULL,
`longitude` double DEFAULT NULL ,
`sign_wages` decimal(16,2) DEFAULT NULL ,
`confirm_wages` decimal(16,2) DEFAULT NULL ,
`work_content` varchar(60) DEFAULT NULL ,
`team_leader_id` char(32) DEFAULT NULL,
`sign_state` char(2) NOT NULL ,
`confirm_date` date DEFAULT NULL ,
`sign_mode` char(2) DEFAULT NULL ,
`checkin_time` datetime DEFAULT NULL ,
`checkout_time` datetime DEFAULT NULL , 
`sign_hours` decimal(6,1) DEFAULT NULL ,
`overtime` decimal(6,1) DEFAULT NULL ,
`confirm_hours` decimal(6,1) DEFAULT NULL ,
`signimg` varchar(200) DEFAULT NULL ,
`signoutimg` varchar(200) DEFAULT NULL ,
`photocheck` char(2) DEFAULT NULL ,
`machine_type` varchar(2) DEFAULT '1' ,
`project_coordinate` text ,
`floor_num` varchar(200) DEFAULT NULL ,
`device_serial_no` varchar(32) DEFAULT NULL ,
KEY `checkin_time` (`checkin_time`),
KEY `worker_id` (`worker_id`),
KEY `project_id` (`project_id`),
KEY `subcontractor_id` (`subcontractor_id`),
KEY `sign_date` (`sign_date`),
KEY `project_id_2` (`project_id`,`sign_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8


B) SHOW INDEX FROM t_worker_attendance_4y

+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table                  | Non_unique | Key_name         | Seq_in_index | Column_name      | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_worker_attendance_4y |          1 | checkin_time     |            1 | checkin_time     | A         |     5017494 |     NULL | NULL   | YES  | BTREE      |         |               |
| t_worker_attendance_4y |          1 | worker_id        |            1 | worker_id        | A         |     1686552 |     NULL | NULL   |      | BTREE      |         |               |
| t_worker_attendance_4y |          1 | project_id       |            1 | project_id       | A         |      102450 |     NULL | NULL   |      | BTREE      |         |               |
| t_worker_attendance_4y |          1 | subcontractor_id |            1 | subcontractor_id | A         |      380473 |     NULL | NULL   |      | BTREE      |         |               |
| t_worker_attendance_4y |          1 | sign_date        |            1 | sign_date        | A         |      512643 |     NULL | NULL   |      | BTREE      |         |               |
| t_worker_attendance_4y |          1 | project_id_2     |            1 | project_id       | A         |      102059 |     NULL | NULL   |      | BTREE      |         |               |
| t_worker_attendance_4y |          1 | project_id_2     |            2 | sign_date        | A         |     1776104 |     NULL | NULL   |      | BTREE      |         |               |
+------------------------+------------+------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+


C) EXPLAIN SELECT SQL_NO_CACHE tw.project_id, tw.sign_date FROM t_worker_attendance_4y tw WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb' AND sign_date >= '07/01/2018' AND sign_date < '08/01/2018' ;
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
| id | select_type | table | type | possible_keys                     | key          | key_len | ref   | rows     | Extra                    |
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+
|  1 | SIMPLE      | tw    | ref  | project_id,sign_date,project_id_2 | project_id_2 | 96      | const | 54134596 | Using where; Using index |
+----+-------------+-------+------+-----------------------------------+--------------+---------+-------+----------+--------------------------+

They all went through the same joint index.

SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_300w tw 
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb' 
AND sgin_date >= '07/01/2018' 
AND sgin_date < '08/01/2018' LIMIT 0,10000;

Execution time: 0.02 sec


SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_1000w tw 
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb' 
AND sgin_date >= '07/01/2018' 
AND sgin_date < '08/01/2018' LIMIT 0,10000;

Execution time: 0.01 sec


SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_1y tw 
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb' 
AND sgin_date >= '07/01/2018' 
AND sgin_date < '08/01/2018' LIMIT 0,10000;

Execution time: 0.02 sec


SELECT tw.project_id, tw.sign_date FROM t_worker_attendance_4y tw 
WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb' 
AND sgin_date >= '07/01/2018' 
AND sgin_date < '08/01/2018' LIMIT 0,10000;

Execution time: 0.02 sec

......

My guess is that MySQL's query performance will decline dramatically with the increase of data volume, but they are not much different. So I have no way to optimize my query. I don't know when to implement table partition plan or sub-database sub-table plan.

What I want to know is why the execution speed of index with small data volume is the same as that of index with large data volume. If you can help me, I would like to thank you very much.

Haceral
  • 41
  • 7
  • 1
    this could be related to the indexes you have on the tables ..and the number of rows really involved based on where condition if you have proper index on project:_id and/or sgin_date this is not strange .. – ScaisEdge Jun 24 '19 at 11:09
  • Performance isn't solely about volume: it's volume _multiplied by the efficiency of access_. Here you are fetching the same volume (b/c of LIMIT and b/c of index scanning) and doing so in the same way (same query, same indexes = same access efficiency). – bishop Jun 24 '19 at 11:10
  • @bishop LIMIT is not always good for performance consider `LIMIT 1000000, 1000`, MySQL need to fetch 1001000 records and lose 1000000 again.. – Raymond Nijland Jun 24 '19 at 11:11
  • Right, @RaymondNijland, but that is not the case here: the OP has a zero start, that is _the most performant way to LIMIT_, and is effectively a limit on I/O. – bishop Jun 24 '19 at 11:12
  • *"but that is not the case here: the OP has a zero start, that is the most performant way to LIMIT, and is effectively a limit on I/O"* true i wanted to mention that as general case @bishop .... Topicstarter also these queries do not make a alot off sense also as SQL tables/resultsets are defined to be **orderless** by the SQL standards meaning using `LIMIT` without `ORDER BY` is pretty much **meaningless** – Raymond Nijland Jun 24 '19 at 11:15
  • 1
    like @scaisEdge says imagine if you have three books with a index, a small book with few pages, a medium book with more pages and a large book with lots of pages.. Imagine if you would search the chapter about SQL and that chapter has equal number of pages and words in all books.. Then it would take about the same amount off time to search and reading the chapters in all books right? – Raymond Nijland Jun 24 '19 at 11:22
  • 2
    `sgin_date >= '07/01/2018'` is a nonsensical request, as is LIMIT without ORDER BY. Further, as well as SHOW CREATE TABLE statements for all relevant tables, questions about query performance always require the EXPLAIN(s) for the given quer(y/ies) – Strawberry Jun 24 '19 at 12:40
  • @Haceral Please REMOVE the LIMIT xxxxx for each of the 4 queries, give us the times and number of ROWS found. To conserve everyone's time, post TEXT results of A) SHOW CREATE TABLE t_worker_attendance_4y; B) SHOW INDEX FROM t_worker_attendance_4y; and C) EXPLAIN SELECT SQL_NO_CACHE tw.project_id, tw.sign_date FROM t_worker_attendance_4y tw WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb' AND sgin_date >= '07/01/2018' AND sgin_date < '08/01/2018' ; – Wilson Hauck Jun 24 '19 at 15:17
  • @WilsonHauck I added what you needed in the question. In addition, if I remove the LIMIT XXXX restriction, the result sets they query are inconsistent, because these data are superimposed by me. For example, the result sets I query in 10 million are 1000 records, but I query in 20 million and get 2000 records, so I need to define one for it. Four boundaries to obtain query performance for each other in different data quantities – Haceral Jun 25 '19 at 01:29
  • @Haceral To conserve everyone's time, post TEXT results of A) SHOW CREATE TABLE t_worker_attendance_4y; B) SHOW INDEX FROM t_worker_attendance_4y; and C) EXPLAIN SELECT SQL_NO_CACHE tw.project_id, tw.sign_date FROM t_worker_attendance_4y tw WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb' AND sgin_date >= '07/01/2018' AND sgin_date < '08/01/2018' ; We still need to know the duration (seconds) and row counts when you have no limit for your 4 queries to complete, please. – Wilson Hauck Jun 25 '19 at 11:38
  • @Haceral Also what is result of this query - SELECT COUNT(*) FROM t_worker_attendance_4y tw WHERE tw.project_id = '39235664ba734887b298ee568fbb66fb'; ? – Wilson Hauck Jun 25 '19 at 11:42
  • Is there a reason you do not have a PRIMARY key? Normally the ID (first) column would be auto-incremented and be the PRIMARY key. This will eliminate the 'hidden' primary key and could use less space for index storage for your 7 indexes. – Wilson Hauck Jun 25 '19 at 20:31
  • @WilsonHauck I can't set primary keys for each table because my data comes from it itself and primary keys are not allowed to repeat. So I cancelled the primary key. – Haceral Jul 01 '19 at 08:37

2 Answers2

1

Same search performance on large data volume because of BTREE index. It has O(log(n)). Relatively speaking that means that search algorithm have to complete:

6 operations on 3m of data
7 operations on 10m of data
8 operations on 100m of data
8 operations on 400m of data

Аs you can see the number of operations is almost the same.

My guess is that MySQL's query performance will decline dramatically with the increase of data volume

This is true for full table scan cases.

Alexander Yancharuk
  • 13,817
  • 5
  • 55
  • 55
  • Thank you very much for your help. Now I know more about MySQL Index. – Haceral Jun 25 '19 at 04:33
  • I did an invalid test. I can understand that if the search data contains an index, it can return the result set quickly regardless of the amount of data. Is that so? – Haceral Jun 25 '19 at 05:40
  • 1
    Yes, search steps in btree index almost the same for any amount of data. But you can get performance drop if the index itself is not optimal. – Alexander Yancharuk Jun 25 '19 at 07:56
  • @Haceral - Finding a single row (such as the first row in that date range for that project_id) is virtually the same, as Alexander explains about BTrees. After that, it depends on how many rows need to be returned. The scan is identical, regardless of the size of the table -- "get the next sequential record in the index". – Rick James Nov 11 '19 at 22:17
  • See `SELECT index_name, stat_name FROM mysql.innodb_index_stats WHERE database_name = '?' AND table_name = '?';` for a clue of BTree depth. – Rick James Nov 11 '19 at 22:32
0

I have a new answer, someone told me "Because your query is covered by index, index is actually the time of query index. Mysql index uses B + tree structure. The query time is basically the same under the same tree height. You can calculate whether the height of the trees indexed by these tables is the same."

So I did the inquiry as required.

mysql> SELECT b.name, a.name, index_id, type, a.space, a.PAGE_NO
    -> FROM information_schema.INNODB_SYS_INDEXES a,
    -> information_schema.INNODB_SYS_TABLES b
    -> WHERE a.table_id = b.table_id AND a.space <> 0;
+-------------------------------------------------+---------------------+----------+------+-------+---------+
| name                                            | name                | index_id | type | space | PAGE_NO |
+-------------------------------------------------+---------------------+----------+------+-------+---------+
| mysql/innodb_index_stats                        | PRIMARY             |       18 |    3 |     2 |       3 |
| mysql/innodb_table_stats                        | PRIMARY             |       17 |    3 |     1 |       3 |
| mysql/slave_master_info                         | PRIMARY             |       20 |    3 |     4 |       3 |
| mysql/slave_relay_log_info                      | PRIMARY             |       19 |    3 |     3 |       3 |
| mysql/slave_worker_info                         | PRIMARY             |       21 |    3 |     5 |       3 |
| test_gomeet/t_worker_attendance_1y              | GEN_CLUST_INDEX     |       45 |    1 |    12 |       3 |
| test_gomeet/t_worker_attendance_1y              | checkin_time        |       46 |    0 |    12 |   16389 |
| test_gomeet/t_worker_attendance_1y              | project_id          |       50 |    0 |    12 |   32775 |
| test_gomeet/t_worker_attendance_1y              | worker_id           |       53 |    0 |    12 |   49161 |
| test_gomeet/t_worker_attendance_1y              | subcontractor_id    |       54 |    0 |    12 |   65547 |
| test_gomeet/t_worker_attendance_1y              | sign_date           |       66 |    0 |    12 |   81933 |
| test_gomeet/t_worker_attendance_1y              | project_id_2        |      408 |    0 |    12 |   98319 |
| test_gomeet/t_worker_attendance_300w            | GEN_CLUST_INDEX     |       56 |    1 |    13 |       3 |
| test_gomeet/t_worker_attendance_300w            | checkin_time        |       58 |    0 |    13 |   16389 |
| test_gomeet/t_worker_attendance_300w            | project_id          |       59 |    0 |    13 |   16427 |
| test_gomeet/t_worker_attendance_300w            | worker_id           |       60 |    0 |    13 |   16428 |
| test_gomeet/t_worker_attendance_300w            | subcontractor_id    |       61 |    0 |    13 |   16429 |
| test_gomeet/t_worker_attendance_300w            | sign_date           |       67 |    0 |    13 |   65570 |
| test_gomeet/t_worker_attendance_300w            | project_id_2        |      397 |    0 |    13 |   81929 |
| test_gomeet/t_worker_attendance_4y              | GEN_CLUST_INDEX     |       42 |    1 |     9 |       3 |
| test_gomeet/t_worker_attendance_4y              | checkin_time        |       47 |    0 |     9 |   16389 |
| test_gomeet/t_worker_attendance_4y              | worker_id           |       49 |    0 |     9 |   32775 |
| test_gomeet/t_worker_attendance_4y              | project_id          |       52 |    0 |     9 |   49161 |
| test_gomeet/t_worker_attendance_4y              | subcontractor_id    |       55 |    0 |     9 |   65547 |
| test_gomeet/t_worker_attendance_4y              | sign_date           |       69 |    0 |     9 |   81933 |
| test_gomeet/t_worker_attendance_4y              | project_id_2        |      412 |    0 |     9 |   98319 |
+-------------------------------------------------+---------------------+----------+------+-------+---------+


mysql> SHOW GLOBAL STATUS LIKE 'Innodb_page_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| Innodb_page_size | 16384 |
+------------------+-------+


root@localhost:/usr/local/mysql/data/test_gomeet# hexdump -s 49216 -n 02 t_worker_attendance_300w.ibd 
000c040 0200                                   
000c042

root@localhost:/usr/local/mysql/data/test_gomeet# hexdump -s 49216 -n 02 t_worker_attendance_1y.ibd 
000c040 0300                                   
000c042

root@localhost:/usr/local/mysql/data/test_gomeet# hexdump -s 49216 -n 02 t_worker_attendance_4y.ibd 
000c040 0300                                   
000c042


The calculation shows that 3.34 is 100 million and 3.589 is 400 million. It's almost the same. Is it because of this?

Haceral
  • 41
  • 7