
I'm trying to figure out a proper way to implement active/passive failover between replicas of a service in Docker swarm mode.

The service will hold valuable in-memory state that cannot be lost, which is why I need multiple replicas of it. The replicas will internally run Raft, so that only the replica which is active (the "leader") at a given moment accepts requests from clients.

(If you're unfamiliar with Raft: simply put, it is a distributed consensus algorithm that helps implement an active/passive fault-tolerant cluster of replicas. In Raft, the active replica - the leader - replicates changes in its data to the passive replicas - the followers. Only the leader accepts requests from clients. If the leader fails, a new leader is elected from among the followers.)

As far as I understand, Docker will guarantee that the specified number of replicas is up and running, but it will balance incoming requests among all of the replicas, in an active/active manner.
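For illustration, here is a minimal sketch of the setup I mean (service name, image and port are placeholders, not my actual stack): a replicated service published through the ingress routing mesh, where swarm spreads requests across all replicas.

    version: "3.7"
    services:
      myservice:
        image: myorg/raft-service:latest   # placeholder image
        ports:
          - "8080:8080"                    # published via the ingress routing mesh
        deploy:
          replicas: 3
          # endpoint_mode defaults to "vip": one virtual IP that load-balances
          # across all healthy tasks, i.e. active/active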

How can I tell Docker to route requests only to the active replica, but still guarantee that all replicas are up?

One option is to route all requests through an additional NGINX container and update its rules each time a new leader is elected. But that adds an extra hop, which I'd like to avoid.

I'm also trying to avoid external/overlapping tools such as Consul or Kubernetes, in order to keep the solution as simple as possible. (HAProxy is not an option because I need a solution portable between Linux and Windows.) So currently I'm trying to understand whether this can be done with Docker swarm mode alone.

Another approach I came across is returning a failing health check from the passive replicas. It does the trick with Kubernetes according to this answer, but I'm not sure it will work with Docker. How does the swarm manager interpret failing health checks from task containers?
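For illustration, the health-check idea would look roughly like this, assuming each replica exposes a hypothetical /leader endpoint that returns 200 only while it holds Raft leadership (the endpoint, port and image are made up):

    version: "3.7"
    services:
      myservice:
        image: myorg/raft-service:latest   # placeholder image
        deploy:
          replicas: 3
        healthcheck:
          # Hypothetical endpoint: returns 200 only on the current leader,
          # so the check fails on the followers.
          test: ["CMD", "curl", "-f", "http://localhost:8080/leader"]
          interval: 5s
          timeout: 2s
          retries: 1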

I'd appreciate any thoughts.

felix-b
  • @felix-b How did you manage to resolve this problem (if you did)? – Saqib Ahmed Sep 29 '18 at 20:08
  • @SaqibAhmed, I opted to use Kubernetes, and I simplified my problem by trading in-memory replicas for persistent event logs with snapshots. Thus I have only one replica, and if it crashes, the new instance would restore its state from the latest snapshot plus events from the log. A kind of event sourcing. If I'd still need multiple replicas, I'd probably attempt to do it on top of Kubernetes (the following discussion might be relevant: https://github.com/kubernetes/kubernetes/issues/45300) – felix-b Sep 30 '18 at 09:57

1 Answer


Active/passive replication can be achieved with the following deployment mode:

mode: global

With this, the published port of the service is open on every node, i.e., the service is accessible via any of the nodes in the swarm, but the container will be running only on a particular node.

Ref: https://docs.docker.com/compose/compose-file/#mode

Example: Vault HA with a Consul backend; docker stack file: https://raw.githubusercontent.com/gtanand1994/VaultHA/master/docker-compose.yml

Here, the Vault and Nginx containers will be seen on only one node in the swarm, but the Consul containers (with mode: replicated) will be present on all nodes of the swarm. But as I said before, the VAULT and NGINX services are available via 'any_node_ip:corresponding_port_number'.
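A minimal sketch of that layout (not the linked stack file; image names and the node label are illustrative) would be:

    version: "3.7"
    services:
      vault:
        image: vault:latest                 # illustrative image
        ports:
          - "8200:8200"                     # reachable via any_node_ip:8200 through the routing mesh
        deploy:
          mode: global                      # one task per node that matches the constraint
          placement:
            constraints:
              - node.labels.vault == true   # example label so the task lands only on a particular node
      consul:
        image: consul:latest                # illustrative image
        deploy:
          mode: replicated
          replicas: 3                       # tasks spread across the swarm nodes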

Anand.G.T