We have a MariaDB database running to record instant events where the data accumulates quite fast. There are 1000 and up sensors transmitting data into our server which inserts events into the MaxScale cluster. The cluster has one master making the insertion of the event and replicating the transactions to two slaves.
Since the events are recorded in a time series, the tables have a column called EventTime, a MySQL datetime type. There is also a column called SensorID that distinguishes the sensors and has a type of varchar(20). The data have grown to around 400 million rows in two months, and will eventually grow to around 2 billion.
We noticed that the select queries like the following are significantly slower in the slave nodes than in the master nodes:
SELECT * FROM `table0`
WHERE
(EventTime >= '2023-03-23 00:00:00' OR '2023-03-23 00:00:00' is null)
AND
(EventTime <= '2023-03-23 23:59:59' OR '2023-03-23 23:59:59' is null)
AND
(SensorID IN ('SL-1031-QL') OR COALESCE('SL-1031-QL') is null)
The explain of the above query on master and slave are the same:
MariaDB [db1000]> explain SELECT * FROM `table0` WHERE (EventTime >= '2023-03-23 00:00:00' OR '2023-03-23 00:00:00' is null) AND (EventTime <= '2023-03-23 23:59:59' OR '2023-03-23 23:59:59' is null) AND (EventID IN ('SL-1031-QL') OR COALESCE('SL-1031-QL') is null);
+------+-------------+-------+------+--------------------------+---------+---------+-------+--------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+--------------------------+---------+---------+-------+--------+------------------------------------+
| 1 | SIMPLE | table0 | ref | index_3,index_2,cindex_0 | index_2 | 62 | const | 365040 | Using index condition; Using where |
+------+-------------+-------+------+--------------------------+---------+---------+-------+--------+------------------------------------+
1 row in set (0.318 sec)
and the indexes are:
MariaDB [db1000]> show index from db1000.table0;
+-------+------------+----------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table0 | 0 | PRIMARY | 1 | SensorID | A | 218159126 | NULL | NULL | | BTREE | | |
| table0 | 1 | index_3 | 1 | EventTime | A | 21815912 | NULL | NULL | | BTREE | | |
| table0 | 1 | index_2 | 1 | EventID | A | 433715 | NULL | NULL | | BTREE | | |
| table0 | 1 | cindex_0 | 1 | EventTime | A | 36359854 | NULL | NULL | | BTREE | | |
| table0 | 1 | cindex_0 | 2 | EventID | A | 218159126 | NULL | NULL | | BTREE | | |
+-------+------------+----------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
6 rows in set (0.000 sec)
We are wondering if the master node is faster because the data are already cached in its memory and therefore it does not need to fetch the data from the disk array (master SELECTs are almost all completed within 10 seconds whereas slave SELECTs range from 300 seconds to 1000 seconds). If so, what should we do to improve the slave SELECT performance?
Previous attempts to solve the efficiency issue:
We have tried to adjust innodb_buffer_pool_size
and innodb_buffer_pool_instances=8
to see if the buffer pool influences the efficiency in the slave nodes. After adjusting innodb_buffer_pool_size
on both slave nodes to around 50% of the memory, there is no observable improvement in the query time. The master node value of innodb_buffer_pool_size
is only 128M (less than 1% of its memory), and innodb_buffer_pool_instances=1
; however master node still performs significantly better than the slave nodes.
We have also tried the following two cases
query_cache_type=OFF
andquery_cache_size=0
query_cache_type=ON
andquery_cache_size=16777216
It has no observable difference in the query duration too.
It is observable that when SELECT queries are executed on the master node, there is almost no fetch of data from the disk array whereas when executed on the slave nodes there is significant network traffic from the disk array.