
We have a MariaDB database recording instantaneous events, so data accumulates quickly. More than 1000 sensors transmit data to our server, which inserts the events through MaxScale into the cluster. The cluster has one master that performs the inserts and replicates the transactions to two slaves.

Since the events form a time series, the tables have a column called EventTime (MySQL datetime type) and a column called SensorID (varchar(20)) that identifies the sensor. The data have grown to around 400 million rows in two months and will eventually reach around 2 billion.

We noticed that SELECT queries like the following are significantly slower on the slave nodes than on the master node:

SELECT * FROM `table0` 
WHERE 
    (EventTime >= '2023-03-23 00:00:00' OR '2023-03-23 00:00:00' is null) 
AND 
    (EventTime <= '2023-03-23 23:59:59' OR '2023-03-23 23:59:59' is null) 
AND 
    (SensorID IN ('SL-1031-QL') OR COALESCE('SL-1031-QL') is null) 
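
Note that each `OR '<literal>' IS NULL` branch is a placeholder for an optional filter bound by the application; with the non-NULL literals shown here, those branches can never be true (in particular, `COALESCE('SL-1031-QL') IS NULL` is always false). Under that assumption the query is equivalent to this simpler form:

```sql
SELECT * FROM `table0`
WHERE EventTime >= '2023-03-23 00:00:00'
  AND EventTime <= '2023-03-23 23:59:59'
  AND SensorID IN ('SL-1031-QL');
```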

The EXPLAIN of the above query is the same on master and slave:

MariaDB [db1000]> explain SELECT * FROM `table0` WHERE (EventTime >= '2023-03-23 00:00:00' OR '2023-03-23 00:00:00' is null) AND (EventTime <= '2023-03-23 23:59:59' OR '2023-03-23 23:59:59' is null) AND (EventID IN ('SL-1031-QL') OR COALESCE('SL-1031-QL') is null);
+------+-------------+--------+------+--------------------------+---------+---------+-------+--------+------------------------------------+
| id   | select_type | table  | type | possible_keys            | key     | key_len | ref   | rows   | Extra                              |
+------+-------------+--------+------+--------------------------+---------+---------+-------+--------+------------------------------------+
|    1 | SIMPLE      | table0 | ref  | index_3,index_2,cindex_0 | index_2 | 62      | const | 365040 | Using index condition; Using where |
+------+-------------+--------+------+--------------------------+---------+---------+-------+--------+------------------------------------+
1 row in set (0.318 sec)

and the indexes are:

MariaDB [db1000]> show index from db1000.table0;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table0 |          0 | PRIMARY  |            1 | SensorID    | A         |   218159126 |     NULL | NULL   |      | BTREE      |         |               |
| table0 |          1 | index_3  |            1 | EventTime   | A         |    21815912 |     NULL | NULL   |      | BTREE      |         |               |
| table0 |          1 | index_2  |            1 | EventID     | A         |      433715 |     NULL | NULL   |      | BTREE      |         |               |
| table0 |          1 | cindex_0 |            1 | EventTime   | A         |    36359854 |     NULL | NULL   |      | BTREE      |         |               |
| table0 |          1 | cindex_0 |            2 | EventID     | A         |   218159126 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
6 rows in set (0.000 sec)

We suspect the master node is faster because the data are already cached in its memory, so it does not need to fetch them from the disk array (master SELECTs almost all complete within 10 seconds, whereas slave SELECTs take 300 to 1000 seconds). If so, what should we do to improve slave SELECT performance?
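
One way to test this caching hypothesis is to compare the InnoDB buffer pool counters on each node; a sketch using the standard status variables:

```sql
-- Innodb_buffer_pool_read_requests counts logical (in-memory) reads;
-- Innodb_buffer_pool_reads counts reads that had to go to storage.
-- A much higher reads/read_requests ratio on the slaves would confirm
-- they are serving these queries from the disk array rather than RAM.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```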

Previous attempts to solve the efficiency issue:

We tried adjusting innodb_buffer_pool_size and setting innodb_buffer_pool_instances=8 to see whether the buffer pool affects efficiency on the slave nodes. After raising innodb_buffer_pool_size on both slaves to around 50% of memory, there was no observable improvement in query time. The master's innodb_buffer_pool_size is only 128M (less than 1% of its memory), with innodb_buffer_pool_instances=1; yet the master still performs significantly better than the slaves.
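
For reference, the settings involved can be inspected and partly adjusted online; a sketch (the 64G value is a hypothetical ~50%-of-RAM target, not our exact figure):

```sql
-- Inspect the current buffer pool settings on each node.
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool%';

-- innodb_buffer_pool_size is dynamic in MariaDB 10.2+;
-- innodb_buffer_pool_instances requires a server restart.
SET GLOBAL innodb_buffer_pool_size = 64 * 1024 * 1024 * 1024;

-- Optionally dump the buffer pool page list now and reload it after a
-- restart, so a replica does not start with a cold cache.
SET GLOBAL innodb_buffer_pool_dump_now = ON;
SET GLOBAL innodb_buffer_pool_load_now = ON;
```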

We have also tried the following two cases:

  • query_cache_type=OFF and query_cache_size=0
  • query_cache_type=ON and query_cache_size=16777216

Neither made an observable difference in query duration.

We can observe that when SELECT queries are executed on the master node there is almost no data fetched from the disk array, whereas when they are executed on the slave nodes there is significant network traffic from the disk array.

  • Please post the EXPLAIN of your query from all servers (Master and both slaves). – Bernd Buffen Mar 24 '23 at 07:12
  • Please edit your post to show (as text, not images) the output of `show create table table0;`, `select version();`, and `explain select ... rest of your query` (for both master and slave). – ysth Mar 24 '23 at 07:25
  • The EXPLAIN results from master and slave are identical. We have pasted them in the edit. Thanks. – Chi-Hsiu Liang Mar 24 '23 at 07:58
  • Please post both complete EXPLAINs in TEXT form for us to confirm your conclusion that they are the same. We also need complete TEXT results posted for SHOW CREATE TABLE table0; from Master and Slave to allow index confirmation. If they are identical, you need to OPTIMIZE TABLE table0; on both Master and Slave to ensure indexes are current. – Wilson Hauck Mar 25 '23 at 00:18
  • Does the slowdown happen even if you bypass MaxScale and go directly to the database? If so, the problem is unlikely to be related to MaxScale. – markusjm Mar 27 '23 at 09:09
  • We think the slowdown is due to the SAN connection over the 1 Gbps network: although the server is connected to a disk array, the latency makes it inefficient. Netdata shows that when there is 100 MiB/s of read data flow over the network, the disk array itself reads only around 10 MiB/s, which might mean only about 10% of the TCP read traffic is useful data. We are trying to migrate the data to an SSD on the host to see whether we get better efficiency. – Chi-Hsiu Liang Mar 28 '23 at 07:02
  • Please provide `SHOW CREATE TABLE`. I am likely to have suggestions on improving the speed on both Primary and Replica. – Rick James Mar 31 '23 at 19:57
  • Can `COALESCE('SL-1031-QL') is null` ever be true? – Rick James Mar 31 '23 at 20:00
  • What version of MySQL/MariaDB on each server? – Rick James Mar 31 '23 at 20:00
  • Don't use the tiny (old) default of 128M. – Rick James Mar 31 '23 at 20:01
  • How can `Sensor_ID` be unique?? – Rick James Mar 31 '23 at 20:03

1 Answer


The data were stored on a SAN accessed over 1 Gbps Ethernet. After moving the data to local SSDs, performance is much better.
