
I hope you are all healthy. :-)

I currently have a recurring problem. Our database cluster, consisting of three nodes, fails almost daily. Each time, the cause is the same: one of the three nodes hangs and somehow takes the whole cluster down with it. But... we have the cluster precisely to protect us against failures. :-(

The symptom is that every connection attempt times out. When that happens, I connect to each node via ssh and run the command "mariadb" or "mysql". So far the command has always worked on 2 of the 3 nodes, while one node (the hanging one) does not respond. If I then restart the hanging node with "reboot -f", the cluster is healthy again after a few seconds.

A reboot without "-f" does not work because the MariaDB service cannot be stopped. Even after several hours the frozen node is not removed from the cluster.
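To narrow down where the cluster stalls, the Galera status variables on the two responsive nodes could be compared while the third one hangs. The following is only a minimal sketch, not a confirmed diagnosis: it assumes the tab-separated output of `mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"` has been captured as text. The helper names `parse_status` and `find_warnings` and the 0.5 threshold are hypothetical choices for illustration.

```python
def parse_status(raw):
    """Turn tab-separated 'name<TAB>value' lines into a dict of strings."""
    status = {}
    for line in raw.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split("\t", 1)
        status[name.strip()] = value.strip()
    return status


def find_warnings(status, expected_nodes=3, pause_threshold=0.5):
    """Return human-readable warnings for one node's wsrep status.

    expected_nodes and pause_threshold are assumptions; adjust them
    to the actual cluster.
    """
    warnings = []
    if status.get("wsrep_cluster_status") != "Primary":
        warnings.append("node is not part of the primary component")
    if status.get("wsrep_local_state_comment") != "Synced":
        warnings.append("node is not in the Synced state")
    size = int(status.get("wsrep_cluster_size", "0"))
    if size != expected_nodes:
        warnings.append(f"cluster size is {size}, expected {expected_nodes}")
    # Fraction of time replication was paused by flow control
    # since the status counters were last reset.
    paused = float(status.get("wsrep_flow_control_paused", "0"))
    if paused > pause_threshold:
        warnings.append(f"flow control paused replication {paused:.0%} of the time")
    return warnings
```

If the responsive nodes report a high `wsrep_flow_control_paused` value, that would suggest they are waiting for the hanging node to catch up, but this is a guess that would need to be checked against the logs.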

The command "mysqlcheck -A -e" displays "OK" for all tables, so I hope that none of them is corrupted.

I'm desperate about this, because the database has always been very stable. :-(

Does anyone have an idea?

Our Configuration:

  • Each server has 8 CPU cores, 32 GB RAM and runs on an SSD.
  • Ubuntu 20.04 LTS with latest updates
  • MariaDB 10.5.8
  • "wsrep_protocol_version" 10

We have two tables with 2 to 3 million records each. The other tables (about 10 more) have between 1 and 60,000 records. The database is accessed about 100 times per second.

Paulo Boaventura
  • This question has nothing to do with programming, therefore it is off topic here on SO. The DBA sister site of SO offers help in questions related to managing databases. – Shadow Nov 26 '20 at 22:25
  • Do `SHOW PROCESSLIST` on each node when it hangs. If you spot a query that has been running a long time, show us what the query is. – Rick James Dec 16 '20 at 00:19
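
Following the `SHOW PROCESSLIST` suggestion, long-running statements could be picked out of the captured output with a small filter. This is a minimal sketch that assumes the tab-separated output of `mysql -B -e "SHOW FULL PROCESSLIST"` (columns Id, User, Host, db, Command, Time, State, Info, ...) has been saved as text. The function name `long_running` and the 60-second cutoff are hypothetical.

```python
def long_running(raw, min_seconds=60):
    """Return (id, seconds, query) tuples for statements that have been
    running for at least min_seconds, longest first."""
    hits = []
    for line in raw.splitlines():
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a full processlist row
        conn_id, command, seconds, info = fields[0], fields[4], fields[5], fields[7]
        # Skip idle connections and the header row (non-numeric Time).
        if command == "Sleep" or not seconds.isdigit():
            continue
        if int(seconds) >= min_seconds:
            hits.append((int(conn_id), int(seconds), info))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Running this against the output of each node at the moment of the hang would show which statement, if any, has been stuck the longest.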
