
I'm having great success so far using Mesos, Marathon, and Docker to manage a fleet of servers and the containers I'm placing on them. However, I'd now like to go a bit further and start doing things like automatically linking an haproxy container to each main Docker service that starts, or providing other daemon-based, containerized services that are linked to, and only available to, their single parent container.

Normally, I'd start up the helper service first with some name; then, when I started the real service, I'd link it to the helper and everything would be fine. How does this model fit into Marathon and Mesos, though? It seems, for now at least, that the containerization support assumes a single container.

I had one idea: start the helper service first, on whatever host it lands on, then add a constraint to the real service that its hostname equals the helper service's hostname. But it seems like that would cause issues with resource offers and race conditions for those resources. For concreteness, here's roughly what I mean, sketched below against Marathon's REST API.
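This is just an illustration of the idea, not working configuration: the app IDs, image name, and Marathon URL are placeholders, and the container syntax varies by Marathon version.

```python
# Sketch of the "pin to the helper's host" idea. Everything named here
# ("my-helper", "my-service", the image, the URL) is hypothetical.
import requests

MARATHON = "http://marathon.example.com:8080"

# Ask Marathon where it placed the already-running helper app.
tasks = requests.get(MARATHON + "/v2/apps/my-helper/tasks").json()["tasks"]
helper_host = tasks[0]["host"]

# Start the real service, constrained to the helper's host. This is
# where I'd expect races: the offer for that host may already be gone.
app = {
    "id": "my-service",
    "cpus": 1.0,
    "mem": 512,
    "instances": 1,
    "container": {"type": "DOCKER", "docker": {"image": "my-service:latest"}},
    "constraints": [["hostname", "CLUSTER", helper_host]],
}
requests.post(MARATHON + "/v2/apps", json=app)
```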

I've also thought about adding an "embed" or "deep-link" capability to Docker, or to the executor scripts that start the Docker containers.

Before I head down any of these paths, I wanted to find out if someone else had solved this problem, or if I was just horribly overthinking things.

Thanks!

William Thurston

1 Answer


You're wandering in uncharted territory! ☺

There are multiple approaches here, and none of them is perfect, but the situation will improve in future versions of Docker, thanks to orchestration hooks.

One way is to use good old service discovery and registration. I.e., when a service starts, it figures out its publicly available address and registers itself in e.g. Zookeeper, Etcd, or even Redis. Since it's not trivial for a service to figure out its publicly available address (unless you adopt some conventions, e.g. always mapping port X:X instead of letting Docker assign random ports), you might want to do the registration from outside. That means that your orchestration layer (Mesos in this case) would start the container, then figure out the host and port, and put that in your service discovery system. I'm not extremely familiar with Marathon, but you should be able to register a hook for that. Then, other containers will just look up the endpoint address in the service discovery registry, plain and simple.
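To make the "registration from outside" idea concrete, here is a minimal sketch, assuming a Redis-based registry. The container name, key layout, and Redis host are all made up for illustration; the same idea works with Zookeeper or Etcd.

```python
# Run by the orchestration layer right after it starts a container.
import socket
import subprocess

import redis  # pip install redis

container = "my-postgresql-db"  # hypothetical container name
private_port = "5432"

# Ask Docker which public port was mapped; output looks like "0.0.0.0:49153".
mapping = subprocess.check_output(["docker", "port", container, private_port])
public_port = mapping.decode().strip().rsplit(":", 1)[1]

# Register the publicly reachable endpoint under a well-known key.
registry = redis.Redis(host="redis.example.com")
endpoint = "%s:%s" % (socket.gethostname(), public_port)
registry.set("services/%s" % container, endpoint)

# Consumers simply do: registry.get("services/my-postgresql-db")
```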

You could also look at Skydock, which automatically registers DNS names for your containers with Skydns. However, it's currently single-host, so if you like that idea, you'll have to extend it somehow to support multiple hosts, and maybe SRV records.
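If you do go the DNS route, consumers would resolve SRV records to get both host and port. A quick sketch with dnspython (the domain below is made up, and something like Skydns would have to actually be serving these records):

```python
import dns.resolver  # pip install dnspython

# Hypothetical SRV record published by your discovery system.
answers = dns.resolver.query("postgresql.services.example.local", "SRV")
srv = answers[0]
host = str(srv.target).rstrip(".")
port = srv.port
print("connect to %s:%d" % (host, port))
```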

Another approach is to use "well-known entry points". This is actually a simplified case of service discovery: it means making sure that your services always run on pre-set hosts and ports, so that you can use those addresses statically. Of course, this is bad (because it will make your life harder when you want to reproduce the environment for testing/staging purposes), but if you have no clue at all about service discovery, well, it could be a start.
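Reduced to code, "well-known entry points" is just a static table baked into your configuration (the hosts and ports below are placeholders), which is exactly why it gets painful the moment you clone the environment:

```python
# Static service map; every service must actually run at these addresses.
ENDPOINTS = {
    "postgresql": ("db1.example.com", 5432),
    "redis":      ("cache1.example.com", 6379),
}

host, port = ENDPOINTS["postgresql"]
```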

You could also use Pipework to create one (or multiple) virtual networks spanning multiple hosts, binding your containers together. Pipework will let you assign IP addresses manually, or automatically through DHCP. I don't generally recommend this approach, but it's a good fit if you also want to plug your containers into an existing network architecture (e.g. VLANs...).

No matter which solution you decide to use, I highly recommend "pretending" that you're using links. I.e., instead of hard-coding your app configuration to connect to (random example) my-postgresql-db:5432, use the environment variables DB_PORT_5432_TCP_ADDR and DB_PORT_5432_TCP_PORT (as if it were a link), and set those variables when starting the container. That way, if you "fold down" your containers into a simpler environment without service discovery, you can easily fall back on links without effort.
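On the consuming side, that just means reading the link-style variables, whoever set them (a real link, or your orchestration layer). A minimal sketch, assuming the link alias is DB:

```python
import os

# Same variables a Docker link aliased "db" would inject; here they may
# just as well have been set by Mesos/Marathon at container start.
db_addr = os.environ["DB_PORT_5432_TCP_ADDR"]
db_port = int(os.environ["DB_PORT_5432_TCP_PORT"])

# e.g. psycopg2.connect(host=db_addr, port=db_port, ...)
```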

jpetazzo
  • So one thing that is important to me is the locality of containers. I could relatively easily have some sort of discovery to find other containers, but I'm looking for an optimization beyond that which forces marathon to start up a group of containers (think nginx + rails + credentials daemon) each in their own containers. Obviously I need to submit a patch; should I look at Marathon rather than Docker for this? – William Thurston Feb 27 '14 at 07:26
  • Just to make sure that I understand properly — did you mean "…which forces marathon to start up a group of containers (…) each in their own host"? I.e. force containers to be on different hosts? – jpetazzo Feb 27 '14 at 17:49
  • Sorry, I want the group on a single host, a local haproxy attached to each service for example. I think the proper path here is a custom Mesos executor that knows how to take additional arguments from Marathon, fork itself, and manage its children, making sure the entire group is restarted if any one piece dies. – William Thurston Mar 01 '14 at 15:07
  • Understood. Well, to be honest, I'm not knowledgeable enough about Marathon/Mesos here. I don't know if an existing job can be used as a resource, so that you can say "this backend service counts as providing one unit of resource of type *backend*; now please start one haproxy service somewhere, knowing that it requires one unit of type *backend* to run (as well as X and Y units of e.g. RAM and CPU)". – jpetazzo Mar 11 '14 at 22:30
  • Why not group the 2 apps? The second should depend on the first being healthy, and then you link them by using consul (+ consul-marathon) service discovery. In the second you just specify app1.service.consul. – ady8531 Mar 15 '18 at 07:57