I have been setting up a Mesos cluster of 3 nodes (A, B, C), with the Mesos Master, Mesos Slave, and ZooKeeper processes each running in its own Docker container on every node.
Since the cluster setup, including the docker run commands, is executed with Ansible, there should be no difference between the 3 nodes except the node-specific configuration (hostname, zookeeper_myid, etc.).
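For reference, the per-node ZooKeeper container is started roughly like this (the ZOOKEEPER_* variable names reflect my understanding of the digitalwonderland/zookeeper image, and the IPs are placeholders, not my exact playbook):

    # On node A; nodes B and C differ only in ZOOKEEPER_ID (2 and 3)
    docker run -d --name zookeeper --net=host \
      -e ZOOKEEPER_ID=1 \
      -e ZOOKEEPER_SERVER_1=<ip-nodeA> \
      -e ZOOKEEPER_SERVER_2=<ip-nodeB> \
      -e ZOOKEEPER_SERVER_3=<ip-nodeC> \
      digitalwonderland/zookeeper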
The problems are as follows.
ZooKeeper warning on node A
ZooKeeper shows the following messages only on node A.
2015-05-25 03:28:06,060 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /<ip-nodeA>:58391
2015-05-25 03:28:06,060 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /<ip-nodeA>:58391; will be dropped if server is in r-o mode
2015-05-25 03:28:06,060 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@841] - Refusing session request for client /<ip-nodeA>:58391 as it has seen zxid 0x44 our last zxid is 0xc client must try another server
2015-05-25 03:28:06,060 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /<ip-nodeA>:58391 (no session established for client)
ZooKeeper on node B shows the following messages.
2015-05-25 03:12:18,594 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /<ip-nodeB>:42784 which had sessionid 0x14d89037c1e0000
2015-05-25 03:12:30,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@347] - Expiring session 0x14d89037c1e0000, timeout of 10000ms exceeded
2015-05-25 03:12:30,001 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x14d89037c1e0000
2015-05-25 03:12:30,987 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /<ip-nodeB>:42853
2015-05-25 03:12:30,987 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /<ip-nodeB>:42853; will be dropped if server is in r-o mode
2015-05-25 03:12:30,988 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /<ip-nodeB>:42853
2015-05-25 03:12:30,997 [myid:] - INFO [SyncThread:0:ZooKeeperServer@617] - Established session 0x14d89037c1e0002 with negotiated timeout 10000 for client /<ip-nodeB>:42853
ZooKeeper on node C shows the following messages.
2015-05-25 03:12:31,183 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /<ip-nodeA>:56496
2015-05-25 03:12:31,184 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /<ip-nodeA>:56496; will be dropped if server is in r-o mode
2015-05-25 03:12:31,184 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /<ip-nodeA>:56496
2015-05-25 03:12:31,191 [myid:] - INFO [SyncThread:0:ZooKeeperServer@617] - Established session 0x14d89037ccd0002 with negotiated timeout 10000 for client /<ip-nodeA>:56496
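(In case it helps with the warnings above: each ZooKeeper's view of the ensemble can be checked with the standard four-letter-word commands, something like the following; the IPs are placeholders.)

    # 'stat'/'srvr' report the server mode and last zxid; in a healthy 3-node
    # ensemble the mode should be leader/follower, not standalone
    echo stat | nc <ip-nodeA> 2181
    echo srvr | nc <ip-nodeB> 2181
    echo srvr | nc <ip-nodeC> 2181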
"No master is currently leading..." on node B
Node C is elected as master. Accessing the Mesos admin page on node A is successfully redirected to node C, but the admin page on node B is not redirected to node C and shows "No master is currently leading..." instead.
Only 2 of 3 slaves detected by the master node
On the master node (currently node C), only 2 of the 3 slaves are detected: nodes A and C. The slave on node B is missing.
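In case it is relevant, the Mesos master and slave containers on each node point at the same ZooKeeper ensemble, roughly like this (the MESOS_* environment variable names are my assumption about how the redjack images pass options to Mesos; the values are placeholders):

    # Master (one per node), quorum 2 of 3
    docker run -d --name mesos-master --net=host \
      -e MESOS_ZK=zk://<ip-nodeA>:2181,<ip-nodeB>:2181,<ip-nodeC>:2181/mesos \
      -e MESOS_QUORUM=2 \
      -e MESOS_HOSTNAME=<hostname-of-this-node> \
      redjack/mesos-master

    # Slave (one per node)
    docker run -d --name mesos-slave --net=host \
      -e MESOS_MASTER=zk://<ip-nodeA>:2181,<ip-nodeB>:2181,<ip-nodeC>:2181/mesos \
      -e MESOS_HOSTNAME=<hostname-of-this-node> \
      redjack/mesos-slave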
What could be the cause of these problems?
OS: CentOS 6.5
Docker Images:
- Mesos Master: redjack/mesos-master
- Mesos Slave: redjack/mesos-slave
- ZooKeeper: digitalwonderland/zookeeper
Docker versions:
Client version: 1.5.0
Client API version: 1.17
Go version (client): go1.3.3
Git commit (client): a8a31ef/1.5.0
OS/Arch (client): linux/amd64
Server version: 1.5.0
Server API version: 1.17
Go version (server): go1.3.3
Git commit (server): a8a31ef/1.5.0