
I have been setting up a Mesos cluster of 3 nodes (A, B, C), with the Mesos master, Mesos slave, and ZooKeeper processes on each node running in Docker containers.

Since the cluster setup, including the docker run commands, is executed with Ansible, there should be no difference between the 3 nodes except for node-specific configuration (hostname, zookeeper_myid, etc.).
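
For context, the ensemble configuration I am aiming for is a standard 3-node setup, roughly like the following (a sketch with placeholder IPs and paths, not the exact files generated by Ansible):

# zoo.cfg (identical on all 3 nodes; dataDir is an example path)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=<ip-nodeA>:2888:3888
server.2=<ip-nodeB>:2888:3888
server.3=<ip-nodeC>:2888:3888

# myid inside dataDir (node-specific: 1 on A, 2 on B, 3 on C)
1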

The problems are as follows.

ZooKeeper warning on node A

ZooKeeper shows the following messages only on node A.

2015-05-25 03:28:06,060 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /<ip-nodeA>:58391
2015-05-25 03:28:06,060 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /<ip-nodeA>:58391; will be dropped if server is in r-o mode
2015-05-25 03:28:06,060 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@841] - Refusing session request for client /<ip-nodeA>:58391 as it has seen zxid 0x44 our last zxid is 0xc client must try another server
2015-05-25 03:28:06,060 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /<ip-nodeA>:58391 (no session established for client)

ZooKeeper on node B shows the following messages.

2015-05-25 03:12:18,594 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /<ip-nodeB>:42784 which had sessionid 0x14d89037c1e0000
2015-05-25 03:12:30,000 [myid:] - INFO  [SessionTracker:ZooKeeperServer@347] - Expiring session 0x14d89037c1e0000, timeout of 10000ms exceeded
2015-05-25 03:12:30,001 [myid:] - INFO  [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x14d89037c1e0000
2015-05-25 03:12:30,987 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /<ip-nodeB>:42853
2015-05-25 03:12:30,987 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /<ip-nodeB>:42853; will be dropped if server is in r-o mode
2015-05-25 03:12:30,988 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /<ip-nodeB>:42853
2015-05-25 03:12:30,997 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@617] - Established session 0x14d89037c1e0002 with negotiated timeout 10000 for client /<ip-nodeB>:42853

ZooKeeper on node C shows the following messages.

2015-05-25 03:12:31,183 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /<ip-nodeA>:56496
2015-05-25 03:12:31,184 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /<ip-nodeA>:56496; will be dropped if server is in r-o mode
2015-05-25 03:12:31,184 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /<ip-nodeA>:56496
2015-05-25 03:12:31,191 [myid:] - INFO  [SyncThread:0:ZooKeeperServer@617] - Established session 0x14d89037ccd0002 with negotiated timeout 10000 for client /<ip-nodeA>:56496

"No master is currently leading..." on node B

Node C is elected as master. Accessing the Mesos admin page on node A successfully redirects to node C.

But accessing it on node B does not redirect to node C; it shows "No master is currently leading..." instead.
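
For reference, my understanding is that every master has to be pointed at the full ZooKeeper ensemble and given a quorum size, roughly as below (a sketch, not my exact Ansible task; the /mesos znode path, the MESOS_* environment variables and host networking are assumptions about how the redjack image is driven):

# sketch of one master container; MESOS_* variables map to the corresponding mesos-master flags
docker run -d --name mesos-master --net=host \
  -e MESOS_ZK=zk://<ip-nodeA>:2181,<ip-nodeB>:2181,<ip-nodeC>:2181/mesos \
  -e MESOS_QUORUM=2 \
  -e MESOS_HOSTNAME=<ip-of-this-node> \
  redjack/mesos-master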

Only 2 of 3 slaves detected by master node

On the master node (currently node C), only 2 of the 3 slaves are detected: nodes A and C.
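
To double-check which slaves have registered, the leading master's state can also be queried over HTTP (a sketch; 5050 is the default master port):

# the "slaves" array in the response lists every registered slave
curl -s http://<ip-nodeC>:5050/master/state.json | python -m json.tool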

What could be the cause of these problems?

OS: CentOS 6.5

Docker Images:

  • Mesos Master: redjack/mesos-master
  • Mesos Slave: redjack/mesos-slave
  • ZooKeeper: digitalwonderland/zookeeper
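
The ZooKeeper containers are started roughly as follows (a sketch; the ZOOKEEPER_ID / ZOOKEEPER_SERVER_n environment variables are my reading of the digitalwonderland/zookeeper image, and host networking is an assumption):

# node A shown; the ID and the server list are the node-specific parts
docker run -d --name zookeeper --net=host \
  -e ZOOKEEPER_ID=1 \
  -e ZOOKEEPER_SERVER_1=<ip-nodeA> \
  -e ZOOKEEPER_SERVER_2=<ip-nodeB> \
  -e ZOOKEEPER_SERVER_3=<ip-nodeC> \
  digitalwonderland/zookeeper
# ports 2181 (client), 2888 (quorum) and 3888 (leader election) must be reachable between the nodes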

Docker versions:

Client version: 1.5.0
Client API version: 1.17
Go version (client): go1.3.3
Git commit (client): a8a31ef/1.5.0
OS/Arch (client): linux/amd64
Server version: 1.5.0
Server API version: 1.17
Go version (server): go1.3.3
Git commit (server): a8a31ef/1.5.0
  • Try getting mesos to work with a single node zookeeper first. For some help take a look at: http://stackoverflow.com/questions/25217208/setting-up-a-docker-fig-mesos-environment/25218202#25218202 – Mark O'Connor May 25 '15 at 19:56
  • Thank you. I tried with a single ZooKeeper node and 3 master&slave nodes. It works fine. A and C are redirected to the master node B, and the master detects all 3 slave nodes. So is it ZooKeeper configuration problem? – ai0307 May 26 '15 at 01:01
  • Yes, certainly looks like a problem establishing your zookeeper ensemble – Mark O'Connor May 27 '15 at 17:33
  • I will work on setting up the zookeeper ensemble. Are there any ways to check if an ensemble is working (not just alive), other than the Mesos Web UI? – ai0307 May 28 '15 at 00:38
  • I can highly recommend Netflix Exhibitor (https://github.com/Netflix/exhibitor) to supervise and establish the ZK ensemble. All it requires is an NFS or S3 share for its shared configuration. No messing around with zoo.cfg, myid, etc. It will start/restart ZK instances as needed, create backups and perform cleanup, and you can resize your ZK cluster on the fly. It's really neat. – lloesche Jun 03 '15 at 10:26
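
Following up on the comment question about checking whether the ensemble itself is working (without going through the Mesos Web UI): ZooKeeper's built-in four-letter commands can be run against each node, for example:

# each server should answer "imok"
echo ruok | nc <ip-nodeA> 2181
# exactly one node should report Mode: leader and the others Mode: follower;
# Mode: standalone means that server never joined the ensemble
echo stat | nc <ip-nodeA> 2181 | grep Mode
echo stat | nc <ip-nodeB> 2181 | grep Mode
echo stat | nc <ip-nodeC> 2181 | grep Mode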
