
I restarted the MySQL server on one of the nodes in a Percona Cluster. Since the restart was taking a long time, I interrupted the process. When I tried restarting the MySQL server again, I got the following error:

Stale sst_in_progress file in datadir

I followed this link, https://www.percona.com/forums/questions-discussions/percona-xtradb-cluster/46846-sql-cluster-issue-need-help-please, and deleted the sst_in_progress file as suggested there.

Now, when I try restarting the MySQL server, I am getting this:

● mysql.service - LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon
Loaded: loaded (/etc/init.d/mysql; bad; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2018-03-14 11:04:07 IST; 16min ago
 Docs: man:systemd-sysv-generator(8)
 Process: 23568 ExecStart=/etc/init.d/mysql start (code=exited, status=1/FAILURE)

Mar 14 11:04:00 systemd[1]: Starting LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon...
Mar 14 11:04:00 mysql[23568]:  * Starting MySQL (Percona XtraDB Cluster) database server mysqld
Mar 14 11:04:00 /etc/init.d/mysql[23614]: MySQL PID not found, pid_file detected/guessed: /var/run/mysqld/mysqld.pid
Mar 14 11:04:07 mysql[23568]:  * The server quit without updating PID file (/var/run/mysqld/mysqld.pid).
Mar 14 11:04:07 mysql[23568]:    ...fail!
Mar 14 11:04:07 systemd[1]: mysql.service: Control process exited, code=exited status=1
Mar 14 11:04:07 systemd[1]: Failed to start LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon.
Mar 14 11:04:07 systemd[1]: mysql.service: Unit entered failed state.
Mar 14 11:04:07 systemd[1]: mysql.service: Failed with result 'exit-code'.

One more thing: nothing is being written to the mysql-error.log file during the restart, so I am unable to debug any further.

Kanmaniselvan
  • "SST" means that it is copying all the data from another node. I would think it risky to interrupt that. "IST" ("Incremental") is much faster, but apparently the Cluster decide it could not do it that way. – Rick James Mar 20 '18 at 13:33

2 Answers


The best solution here, without being able to see more information, is to simply rm -rf $datadir and start the node back up. It will indeed SST, which, depending on the dataset, will take a while. Estimate 1 hour for every 100 GB of data over gigE.
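A minimal sketch of those steps, assuming the default datadir of /var/lib/mysql and the sysvinit wrapper shown in the question; check the datadir path in your my.cnf before deleting anything:

# Make sure mysqld is fully stopped on this node
sudo service mysql stop

# Wipe the datadir so the node has no local state
# (path is an assumption; confirm it matches datadir in my.cnf)
sudo rm -rf /var/lib/mysql/*

# On start, the node rejoins the cluster and requests a full SST from a donor
sudo service mysql start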

utdrmac

If the joiner node is taking too much time, you can increase gcache.size (up to 1 GB, for example) and then restart the joiner node. If the data is already present on the joining node, it will choose IST instead of SST.

Set this in my.cnf:

wsrep_provider_options="gcache.size=1G"
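To confirm the new value is active and to see whether the joiner actually took IST rather than SST, something along these lines should work, assuming the mysql client can connect and that the error log lives at /var/log/mysql/mysql-error.log (adjust both to your setup):

# Check the gcache size reported by the running node
mysql -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G" | tr ';' '\n' | grep gcache.size

# After restarting the joiner, look for IST/SST activity in the error log
# (log path is an assumption; use the log_error value from my.cnf)
grep -iE 'ist|sst' /var/log/mysql/mysql-error.log | tail -n 20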