
Just been reading up on Docker overlay networks, very cool stuff. I just can't seem to find an answer to one thing.

According to the docs:

  • If you install and use Docker Swarm, you get overlay networks across your manager/worker hosts automagically, and don't need to configure anything more; but...
  • If you simply want a (non-Swarm) overlay network across multiple hosts, you need to configure that network with an external "KV Store" (consensus server) like Consul or ZooKeeper

I'm wondering why this is. Clearly, overlay networks require consensus amongst peers, but I'm not sure why or who those "peers" even are.

And I'm just guessing that, with Swarm, there's some internal/under-the-hood consensus server running out of the box.

halfer
smeeb

1 Answer


Swarm mode uses Raft for its manager consensus, with a built-in KV store. Before swarm mode, overlay networking was possible with third-party KV stores. Overlay networking itself doesn't require consensus; it just relies on whatever the KV store says, regardless of the other nodes or even its own local state (I've found this out the hard way). The KV stores out there are typically set up with consensus for HA.

The KV store tracks IP allocations to containers running on each host (IPAM). This allows docker to only allocate a given address once, and to know which docker host it needs to communicate with when you connect to a container running on another host. This needs to be external from any one docker host, and preferably in an HA configuration (like swarm mode's consensus) so that it can continue to work even when some docker nodes are down.
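To make the allocation idea concrete, here is a minimal toy sketch in Python. It is not Docker's implementation: `ToyKVStore`, `put_if_absent`, and `allocate_ip` are hypothetical names made up for illustration, and the real IPAM also reserves addresses (such as the gateway) that this ignores. The only point is that a single shared store with an atomic "create if missing" operation is enough to hand out each address once and to look up which host owns it.

```python
import ipaddress


class ToyKVStore:
    """Stand-in for an external KV store (hypothetical, not Docker's API)."""

    def __init__(self):
        self._data = {}

    def put_if_absent(self, key, value):
        # Atomic "create only if missing": the primitive that prevents
        # two hosts from claiming the same address.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)


def allocate_ip(kv, subnet, host):
    """Claim the first free address in the subnet and record its owning host."""
    for ip in ipaddress.ip_network(subnet).hosts():
        if kv.put_if_absent(str(ip), host):
            return str(ip)
    raise RuntimeError("subnet exhausted")


kv = ToyKVStore()
ip_a = allocate_ip(kv, "10.0.0.0/29", "host-a")
ip_b = allocate_ip(kv, "10.0.0.0/29", "host-b")

# Any host can now ask the store where a given address lives.
print(ip_a, "->", kv.get(ip_a))
print(ip_b, "->", kv.get(ip_b))
```

Because every host asks the same store, `host-b` never receives the address already given to `host-a`, and either host can find out where to send traffic for the other's container.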

Overlay networking between docker nodes only involves the nodes that have containers on that overlay network. So once the IP is allocated and discovered, all the communication happens only between the nodes with the relevant containers. This is easy to see with swarm mode: if you create a network and then list networks on a worker, it won't be there. Once a container on that network gets scheduled on that worker, the network will appear. For docker, this reduces the overhead of multi-host networking while also adding to the security of the architecture. The result looks like this graphic:

[Image: Docker multi-host networking]
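The lazy membership described above can also be pictured with a small toy model. This is only an illustration under my own assumptions: `ToyHost`, `run_container`, and `peers` are hypothetical names, and nothing here talks to Docker. A host joins an overlay network only when a container on that network is scheduled on it, so the set of peers is just the hosts that currently have such containers.

```python
class ToyHost:
    """Hypothetical model of a Docker host, for illustration only."""

    def __init__(self, name):
        self.name = name
        self.networks = set()  # overlay networks this host has joined

    def run_container(self, network):
        # The host joins the network lazily, when the first container
        # attached to that network lands on it.
        self.networks.add(network)


def peers(hosts, network):
    """Names of the hosts that actually take part in the given network."""
    return sorted(h.name for h in hosts if network in h.networks)


hosts = [ToyHost("manager"), ToyHost("worker-1"), ToyHost("worker-2")]

hosts[0].run_container("app-net")
before = peers(hosts, "app-net")  # only the host running a container

hosts[2].run_container("app-net")
after = peers(hosts, "app-net")   # worker-2 joined, worker-1 still has not

print(before, after)
```

In this model `worker-1` never appears as a peer, which mirrors a worker not listing the network until a container on it is scheduled there.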

The Raft consensus itself is mainly needed for leader election. Once a node is elected leader and enough nodes remain to have consensus, only that one node writes to the KV store and maintains the current state. Everyone else is a follower that replicates the leader's state. This animation describes it better than I ever could.
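As a rough illustration of what "enough nodes remain to have consensus" means, Raft needs a strict majority of the managers to be reachable. The helper names below (`quorum`, `tolerated_failures`, `has_quorum`) are my own and only do the arithmetic; they are not part of any Docker or Raft library.

```python
def quorum(managers):
    """Smallest number of managers that forms a strict majority."""
    return managers // 2 + 1


def tolerated_failures(managers):
    """How many managers can be lost while a majority still remains."""
    return managers - quorum(managers)


def has_quorum(total, alive):
    """True if the surviving managers can still elect a leader."""
    return alive >= quorum(total)


for n in (1, 3, 4, 5):
    print(n, "managers: quorum", quorum(n), "tolerates", tolerated_failures(n))
```

Note that four managers tolerate no more failures than three, which is why odd manager counts are the usual recommendation.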

Lastly, you don't need to set up an external KV store to use overlay networking outside of swarm mode services. You can enable swarm mode, create overlay networks with the --attachable option, and run containers outside of swarm mode on that network as you would have with an external KV store. I've used this in the past as a transition state to get containers into swarm mode, where some were running with docker-compose and others had been deployed as a swarm stack.

BMitch
  • Thanks @BMitch (+1) - you've helped me countless times over at DIY.SE, didn't realize you were *the* Docker guy, dang! Everything you said makes sense to my uninitiated ears, except the two middle paragraphs. I **think** I'm hearing that only Docker hosts that are hosting containers with overlay networks are considered members in this consensus equation (**yes?**). And if that's the case, then these "hosting-containers-with-overlay nodes" form the members/peers, and they all share info about which containers have which IP addresses... – smeeb Jun 14 '17 at 22:12
  • And the *leader* is whichever one of these nodes gets elected via Raft. And that when DNS goes to resolve container name to an IP inside the overlay, the leader is the lucky winner who gets consulted for this info. If the leader goes down, any of the other nodes can become the leader. Am I understanding you correctly or am I way off in left field? And thanks again! – smeeb Jun 14 '17 at 22:14
  • @smeeb your question made me realize that I didn't clarify one important point. As best I understand it there's no consensus within overlay itself, it just treats the KV as the source of truth and it's up to you to implement consensus on the KV for HA. The nodes in an overlay network delegate all knowledge of the network to that KV to avoid IP collisions and other issues. I also added a link on Raft that I hope you'll find helpful. The consensus of Raft and design of Overlay are two different topics that meet in the middle. – BMitch Jun 14 '17 at 23:30