3

Please help!

I set up a master-slave replication based on the GTID mechanism. The replication works OK, until a mysqld restart happens on slave. Then the mess begins...

After such a restart, I can not restore the replication. When issuing a "START SLAVE" command I get the following an error message:

ERROR 1794 (HY000) at line 1: Slave is not configured or failed to initialize properly. You must at least set --server-id to enable either a master or a slave. Additional error messages can be found in the MySQL error log.

Needless to say I did set server-id in my.cnf (see below).

In /var/log/mysqld.log file, I found the following error message:

[ERROR] Error creating master info: Multiple replication metadata repository instances found with data in them. Unable to decide which is the correct one to choose.

[ERROR] Failed to create or recover replication info repository.

I can not understand what have I done wrong.

The communication between master and slave is ssl-tunneled through stunnel, but I don't think this is a relevant fact, since until a restart everything works right.

The only way I found to re-establish the replication (after mysql restart) is to manually delete the mysql data files, and then load again the dump file imported from the master. (I use mysqldump). This is of course unreasonable.

Following are the my.cnf files:

On slave:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

# Recommended in standard MySQL setup
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
server-id=2
log-bin=mysql-bin
binlog_format=ROW
relay_log=relay-log
skip-slave-start
enforce-gtid-consistency
gtid-mode=ON
log-slave-updates
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

On mater:

[mysqld]
server-id=1
log-bin=mysql-bin
binlog_format=ROW
gtid-mode=on
enforce-gtid-consistency
log-slave-updates
innodb_buffer_pool_size = 1G
query_cache_size = 32M

Slave machine: Centos 6.6, mysql 5.6.24.

Master machine: RHEL 6.6, mysql 5.6.10.

Any help wold be greatly appreciated!

Thanks

Nadav Blum

Nadav
  • 963
  • 1
  • 8
  • 10
  • Could it be that MySQL can't find the my.cnf file? Or is there possibly another that it is picking up? – Jaydee Apr 28 '15 at 09:37
  • Thanks for your reply Jaydee. Well, I updated /etc/my.cnf to include "server-id=5" and restarted mysqld. Then I issued the query " show global variables like 'server%'", got indeed server_id = 5. This means that /etc/my.cnf is read by the system right ? – Nadav Apr 28 '15 at 10:55
  • I think I found the code that generates the error message. It relates to source and destination repositories? http://osxr.org/mysql/source/sql/rpl_info_factory.cc line 478 onwards. I also found this https://blog.marceloaltmann.com/en-mysql-5-6-replication-with-gtid-global-transaction-id-pt-replicacao-com-gtid/ – Jaydee Apr 28 '15 at 12:40

3 Answers3

1

on master -

mysql> reset master;

[this command will clear binary logs of master and start with new. so save it if you want.]

when you start the slave mysqld, run the following command

mysql> stop salve;
mysql> reset slave;
mysql> change master to master_host='192.168.10.116', master_user='root', master_password='root', master_auto_position=1;
mysql> start slave;
mysql> show slave status \G

Now if all goes well then, you can restart the slave (if it is committed all the transaction then no problem else it will start to execute transection in your master binary log. You can check your relay log file)

Hitesh Mundra
  • 1,538
  • 1
  • 9
  • 13
  • Thanks Hitesh. Unfortunately it still does not solve my problem. After doing "/etc/init.d/mysqld restart", issuing the command "stop slave" has the same effects as "start slave" described above - I get the same nasty error messages... – Nadav Apr 28 '15 at 12:03
  • What is the meaning of the error message " Error creating master info: Multiple replication metadata repository instances found with data in them. Unable to decide which is the correct one to choose" ? What does this mean? I think this is the main issue – Nadav Apr 28 '15 at 12:05
  • run the query- select * from mysql.slave_master_info\G & see the data directory /var/lib/mysql/master_info file. Here you will get clue. if not getting in useful info then remove master_info . change master_info table storage engine to innodb if it is myisam. – Hitesh Mundra Apr 28 '15 at 14:17
1

Well, mystery solved.

Remember how I wrote that the issue has nothing to do with my usage of stunnel, as the mean for tunneling communication between master and slave ? Well, I was wrong.

The thing is, I used localhost port 3307 as the end point for the slave communication to the master. (stunnel listened to this port and forwarded data to the master-server ip). So the "change master" was done via:

change master to master_host="localhost", master_port=3307, master_user="XXX", master_password="XXX", MASTER_AUTO_POSITION = 1;'

That "localhost" thing caused the mess. I changed it to "127.0.0.1", and now restarts cause no harm!

Thanks Hitech and Jaydee for your help!

Nadav
  • 963
  • 1
  • 8
  • 10
1

Ran into the same problem yesterday.

Oracle support doc helped.

For people who don't have Oracle support.

CAUSE

The cause is that both TABLE and FILE replication repository metadata exist at the same time,  but only one form should.

SOLUTION

Before setting up replication,  remove the files specified by the my.cnf variables relay_log_info_file and master_info_file .

By default their names map to relay-log.info and master.info and they are located in the datadir. (I had to remove the master.info file)

And remove any residual configuration by executing:

STOP SLAVE;
SET SQL_LOG_BIN=0;
DELETE FROM mysql.slave_master_info ;
DELETE FROM mysql.slave_relay_log_info ;
SET SQL_LOG_BIN=1;
viggy28
  • 760
  • 1
  • 10
  • 21