6

I am writing a web app based on Docker Compose, and it has a number of background systems - an HTTP API, a HTTP proxy and a queue. Everything is in PHP in Alpine containers, either PHP 5.6 or 7.0.

I originally set up the queue in Supervisor inside the API container, which worked fine. However Supervisor/Python makes containers much fatter than they should be (80M rather than 25M) so I moved the queue to its own container. It lives for ~5 minutes and quits in order to be restarted, and I was using the auto-restart system in Supervisor, so I have swapped to the restart system in Docker Compose. I am using version 2 of the Compose YAML format.

When the queue starts up, it renders a simple message to stdout:

queue_instance     | Starting queue watcher (path=/remote/queue, proxying to proximate-proxy:8081)

This is fine when I do docker-compose up initially. However, for every restart, I get three of these messages, then five, and so on, with no limit. If I do docker ps then it indicates there is only one queue running:

halfer@machine:~/proximate-app$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
a9c94558769d        proximate-app       "/tmp/container-st..."   2 hours ago         Up 2 hours          0.0.0.0:8084->8084/tcp   app_instance
7e48d6aec459        proximate-api       "sh /tmp/bin/web-s..."   2 hours ago         Up 2 hours          0.0.0.0:8080->8080/tcp   api_instance
86c564becadf        proximate-queue     "sh /var/app/bin/c..."   2 hours ago         Up About a minute                            queue_instance
20c2145f80e4        proximate-proxy     "sh /var/proxy/con..."   2 hours ago         Up 2 hours          0.0.0.0:8081->8081/tcp   proxy_instance

Here is my Compose file:

version: '2'
services:

  proximate-app:
    container_name: "app_instance"
    image: proximate-app
    ports:
    - "8084:8084"
    links:
    - proximate-api

  # @todo Remove external ports once everything is happy
  proximate-api:
    container_name: "api_instance"
    image: proximate-api
    ports:
    - "8080:8080"
    links:
    - proximate-proxy
    - proximate-queue
    volumes:
    - proximate-volume:/remote/cache
    - proximate-q-volume:/remote/queue
    # Use time and TZ from the host, could alternatively use env vars and set it
    # manually in the container, see https://wiki.alpinelinux.org/wiki/Setting_the_timezone
    - /etc/localtime:/etc/localtime:ro
    - /etc/timezone:/etc/timezone:ro
    # Should perhaps pass this as a var to docker-compose so as not to hardwire it,
    # but it is fine for now
    environment:
    - PHP_TIMEZONE=Europe/London

  proximate-queue:
    container_name: "queue_instance"
    image: proximate-queue
    restart: always
    links:
    - proximate-proxy
    volumes:
    - proximate-volume:/remote/cache
    - proximate-q-volume:/remote/queue
    environment:
    - PROXY_ADDRESS=proximate-proxy:8081

  # @todo Remove external ports once everything is happy
  proximate-proxy:
    container_name: "proxy_instance"
    image: proximate-proxy
    ports:
    - "8081:8081"
    volumes:
    - proximate-volume:/remote/cache
    environment:
    - PROXY_LOG_PATH=/remote/cache/proxy.log

volumes:
  proximate-volume:
  proximate-q-volume:

The relevant container is proximate-queue.

I am pretty sure that my container itself is not responsible for this oddity. My Dockerfile enters thus:

ENTRYPOINT ["sh", "/var/app/bin/container-start.sh"]

And that just calls a launch script:

#!/bin/sh

php \
    /var/app/bin/queue.php \
    --queue-path /remote/queue \
    --proxy-address ${PROXY_ADDRESS}

Which runs a queue process:

#!/usr/bin/env php
<?php

use Proximate\Service\File;
use Proximate\Service\SiteFetcher as SiteFetcherService;
use Proximate\Queue\Read as QueueReader;

$root = realpath(__DIR__ . '/..');
require_once $root . '/vendor/autoload.php';

$actions = getopt('p:q:', ['proxy-address:', 'queue-path:']);

$queuePath = isset($actions['queue-path']) ? $actions['queue-path'] : (isset($actions['q']) ? $actions['q'] : null);
$proxyAddress = isset($actions['proxy-address']) ? $actions['proxy-address'] : (isset($actions['p']) ? $actions['p'] : null);

if (!$queuePath || !$proxyAddress)
{
    $command = __FILE__;
    die(
        sprintf("Syntax: %s --proxy-address <proxy:port> --queue-path <queue-path>\n", $command)
    );
}

if (!file_exists($queuePath))
{
    die(
        sprintf("Error: the supplied queue path `%s` does not exist\n", $queuePath)
    );
}

echo sprintf(
    "Starting queue watcher (path=%s, proxying to %s)\n",
    $queuePath,
    $proxyAddress
);

$queue = new QueueReader($queuePath, new File());
$queue->
    setFetcher(new SiteFetcherService($proxyAddress))->
    process();

As you can see, the echo sprintf() is what announces the start-up, and there's nothing like loops that could be doing this on my side. Could this be a bug in Docker Compose? I am using docker-compose version 1.9.0, build 2585387 on Ubuntu 14.04.

For reference, the Docker Compose stdout looks like this (the repeated lines from the queue are visible):

halfer@machine:~/proximate-app$ docker-compose up
Creating network "proximateapp_default" with the default driver
Creating proxy_instance
Creating queue_instance
Creating api_instance
Creating app_instance
Attaching to proxy_instance, queue_instance, api_instance, app_instance
proxy_instance     | Teeing proxy logs also to /remote/cache/proxy.log
proxy_instance     | [2017-05-10 09:18:42] stdout.INFO: Setting up queue at `/remote/cache/data` [] []
proxy_instance     | [2017-05-10 09:18:42] stdout.INFO: Starting proxy listener on 172.18.0.2:8081 [] []
queue_instance     | Starting queue watcher (path=/remote/queue, proxying to proximate-proxy:8081)
api_instance       | PHP 7.0.16 Development Server started at Wed May 10 10:19:00 2017
app_instance       | PHP 5.6.29 Development Server started at Wed May 10 09:19:10 2017
app_instance       | PHP 5.6.29 Development Server started at Wed May 10 09:19:10 2017
queue_instance exited with code 0
queue_instance     | Starting queue watcher (path=/remote/queue, proxying to proximate-proxy:8081)
queue_instance     | Starting queue watcher (path=/remote/queue, proxying to proximate-proxy:8081)
queue_instance     | Starting queue watcher (path=/remote/queue, proxying to proximate-proxy:8081)
queue_instance exited with code 0
queue_instance     | Starting queue watcher (path=/remote/queue, proxying to proximate-proxy:8081)
queue_instance     | Starting queue watcher (path=/remote/queue, proxying to proximate-proxy:8081)
queue_instance     | Starting queue watcher (path=/remote/queue, proxying to proximate-proxy:8081)
queue_instance     | Starting queue watcher (path=/remote/queue, proxying to proximate-proxy:8081)
queue_instance     | Starting queue watcher (path=/remote/queue, proxying to proximate-proxy:8081)

One thing I could try is to get the app to sleep and do nothing else, to prove that some weird exit handler or something is not causing havoc. However, I expect that will do exactly the same thing.

Update

I have replaced the queue with a script that prints time information then sleeps for 20 seconds. This is what I get:

halfer@machine:~/proximate-app$ docker-compose up
Creating network "proximateapp_default" with the default driver
Creating proxy_instance
Creating queue_instance
Creating api_instance
Creating app_instance
Attaching to proxy_instance, queue_instance, api_instance, app_instance
proxy_instance     | Teeing proxy logs also to /remote/cache/proxy.log
proxy_instance     | [2017-05-10 11:51:17] stdout.INFO: Setting up queue at `/remote/cache/data` [] []
proxy_instance     | [2017-05-10 11:51:17] stdout.INFO: Starting proxy listener on 172.18.0.2:8081 [] []
queue_instance     | Hello everyone! Time=Wed, 10 May 2017 11:51:27 +0000. Microtime=1494417087.107185
api_instance       | PHP 7.0.16 Development Server started at Wed May 10 12:51:37 2017
app_instance       | PHP 5.6.29 Development Server started at Wed May 10 11:51:46 2017
app_instance       | PHP 5.6.29 Development Server started at Wed May 10 11:51:46 2017
queue_instance exited with code 0
queue_instance     | Hello everyone! Time=Wed, 10 May 2017 11:51:27 +0000. Microtime=1494417087.107185
queue_instance     | Hello everyone! Time=Wed, 10 May 2017 11:51:55 +0000. Microtime=1494417115.178871
queue_instance     | Hello everyone! Time=Wed, 10 May 2017 11:52:22 +0000. Microtime=1494417142.409513
queue_instance exited with code 0
queue_instance     | Hello everyone! Time=Wed, 10 May 2017 11:51:27 +0000. Microtime=1494417087.107185
queue_instance     | Hello everyone! Time=Wed, 10 May 2017 11:51:55 +0000. Microtime=1494417115.178871
queue_instance     | Hello everyone! Time=Wed, 10 May 2017 11:52:22 +0000. Microtime=1494417142.409513
queue_instance     | Hello everyone! Time=Wed, 10 May 2017 11:52:49 +0000. Microtime=1494417169.612523
queue_instance     | Hello everyone! Time=Wed, 10 May 2017 11:53:17 +0000. Microtime=1494417197.826749

That seems to show that:

  • Restarts are reported only every other restart
  • My task will take exactly 20 seconds, but Compose is restarting them quite sluggishly (closer to every 30 seconds). That doesn't bother me too much, but may be a useful observation
  • Where lines are being repeated, they are in fact old restart reports.

What is going on?

halfer
  • 19,824
  • 17
  • 99
  • 186
  • How the serie continues? 1, 3, 5... – Robert May 10 '17 at 11:48
  • @Robert, yes that's right. I have another observation which I will add to the post in a bit, based on a very stripped-down example (basically an `echo` and a `sleep`). In a nutshell: (1) if the time interval is 20 seconds, the restarts are only reported every 40 seconds, and (2) for the first reported restart (3 output lines) these are in fact the first, second and third restart lines being rendered in one go. – halfer May 10 '17 at 11:55
  • What does your test script do? If it exits, it will be started again due to the `restart: always` in the docker-compose.yml – Robert May 14 '17 at 16:25
  • 1
    The cumulative output is due to the container is not being removed each time it exits. So, docker-compose start it again showing you the `docker logs -f` from the same container that is starting again – Robert May 14 '17 at 16:32
  • @Robert: to answer the first question, the simple test script in the update prints time information, sleeps for twenty seconds, and then exits. – halfer May 14 '17 at 17:33
  • In relation to your last question, I wonder if I need to improve my understanding of Docker terminology. As I understand it, a container is an image plus an overlay of the current filesystem state (and can either be running or not running). So, once the container has done one run, what I think you are saying is that the container is not recreated from the image on each restart. In my design it does not matter either way, so can I ask Docker to recreate a fresh container, so I only see new logs? – halfer May 14 '17 at 17:37
  • One that is sorted, I would then be interested to know why Compose is rendering every other log. That feels like a bug to me - why do I see all logs every ~40 seconds (with two additional restarts per output) rather than every ~20 seconds (with one additional restart per output)? – halfer May 14 '17 at 17:39
  • I found in docker --rm docs: "By default a container’s file system persists even after the container exits. This makes debugging a lot easier (since you can inspect the final state) and you retain all your data by default. But if you are running short-term foreground processes, these container file systems can really pile up. If instead you’d like Docker to automatically clean up the container and remove the file system when the container exits, you can add the --rm flag" Is it possible for you to have a long-lived container? Maybe with a while true: ..... sleep X. – Robert May 14 '17 at 19:29

2 Answers2

4

I sum up what we discovered.

The cumulative output is due to the container is not being removed each time it exits. So, docker-compose starts it again showing you the docker logs -f from the same container that is starting again. That is working as expected.

I didn't find a way to get docker-compose removing a container before start it again inside the same docker-compose up

I like your queue-based architecture, but I think that it would be better if you use a long-lived container instead of one container that starts/exits/starts/exits... You can accomplish it doing a simple while true; do ./queue-app-blabla.php; sleep 20; done in a shell script.

Anyone that need to reproduce this, can use the following docker-compose.yml

version: "2"

services:
  ui:
    image: ubuntu
    command: tail -f /dev/null
    links:
      - queue
  queue:
    image: ubuntu
    command: sh -c "date; echo Sleeping...; sleep 20; echo exit."
    restart: always

Output:

queue_1  | Mon May 15 04:32:12 UTC 2017
queue_1  | Sleeping...
queue_1  | exit.
docker_queue_1 exited with code 0
queue_1  | Mon May 15 04:32:12 UTC 2017
queue_1  | Sleeping...
queue_1  | exit.
queue_1  | Mon May 15 04:32:33 UTC 2017
queue_1  | Sleeping...
queue_1  | exit.
queue_1  | Mon May 15 04:32:54 UTC 2017
queue_1  | Sleeping...
queue_1  | exit.
docker_queue_1 exited with code 0
queue_1  | Mon May 15 04:32:12 UTC 2017
queue_1  | Sleeping...
queue_1  | exit.
queue_1  | Mon May 15 04:32:33 UTC 2017
queue_1  | Sleeping...
queue_1  | exit.
queue_1  | Mon May 15 04:32:54 UTC 2017
queue_1  | Sleeping...
queue_1  | exit.
queue_1  | Mon May 15 04:33:17 UTC 2017
queue_1  | Sleeping...
queue_1  | exit.
queue_1  | Mon May 15 04:33:38 UTC 2017
queue_1  | Sleeping...
queue_1  | exit.

My proposed solution:

version: "2"

services:
  ui:
    image: ubuntu
    command: tail -f /dev/null
    links:
      - queue
  queue:
    image: ubuntu
    command: sh -c "while true; do date; echo Sleeping...; sleep 20; echo loop.; done"
    restart: always

Output:

Attaching to docker_queue_1, docker_ui_1
queue_1  | Mon May 15 04:36:16 UTC 2017
queue_1  | Sleeping...
queue_1  | loop.
queue_1  | Mon May 15 04:36:36 UTC 2017
queue_1  | Sleeping...
queue_1  | loop.
queue_1  | Mon May 15 04:36:56 UTC 2017
queue_1  | Sleeping...
queue_1  | loop.
queue_1  | Mon May 15 04:37:16 UTC 2017
queue_1  | Sleeping...

IMO, it's better to see containers start and work, instead of flapping start/exit containers. Docker has an special treatment to containers that have this behaviour, see docker docs. "An ever increasing delay (double the previous delay, starting at 100 milliseconds) is added before each restart to prevent flooding the server." (but it does not apply in your case because your container is living > 10s.


Edit:

To check a possible error and exit. This could be useful:

#!/bin/bash

PERIOD=10 # seconds

while true; do
  echo Starting process...

  date # the process
  STATUS=$?

  if [ $STATUS -ne 0 ]; then
        echo "Error found. Exiting" >&2
        exit $STATUS
  fi

  echo End process. Sleeping ${PERIOD}s
  sleep $PERIOD
done

Output:

▶ ./script.sh
Starting process...
Mon May 15 11:43:06 ART 2017
End process. Sleeping 10s
Starting process...
Mon May 15 11:43:16 ART 2017
End process. Sleeping 10s
Starting process...
Mon May 15 11:43:26 ART 2017
End process. Sleeping 10s
Starting process...
Mon May 15 11:43:36 ART 2017
End process. Sleeping 10s
Starting process...
Mon May 15 11:43:46 ART 2017
End process. Sleeping 10s
Robert
  • 33,429
  • 8
  • 90
  • 94
  • Thanks for your effort writing this up, Robert! My approach to run the queue for a bit and then exit to restart is informed by what used to be a good idea for long-running PHP services - they used to leak memory and so using Cron or Supervisord to periodically restart was often thought to be a good idea. To be fair, memory management (especially with circular references) is much better in PHP these days, but old habits die hard - it feels nice to start with a new instance anyway. – halfer May 15 '17 at 14:21
  • Your approach with a shell loop is a good one, so I will try that. The only downside is that a series of failures would not be detected by Docker Compose (so it can use the anti-flood strategy you mention in your last paragraph, though I suppose I could write a rudimentary one in Bash). However, I dare say if I need more complex facilities then I should move to Swarm or Kubernetes (and I don't really have time to pick up new systems, so I'll happily stick with Compose for now). – halfer May 15 '17 at 14:24
  • @halfer, regarding the possible memory leaks, think this: the PHP processes will be dying in each loop, so its resources are freed by kernel. The only long-lived process will be the shell script. – Robert May 15 '17 at 14:38
  • Yep, very true. I will give this a go! I shall try to report back but I appear to need a machine rebuild, so it may be a couple of days. I think Docker has stolen all my INodes, and it's a struggle to get anything to build `:-)` – halfer May 15 '17 at 14:39
  • Good luck then! :-) – Robert May 15 '17 at 15:02
2

Using the Docker restart functionality just to restart what should be a service (long running queue task) is not necessary at all as the operating system comes with everything you need.

Before making use of the docker restart functionality you restarted the queue your own. But you wrote you retired that Supervisor/Python because it is too fat (~55M) in your eyes.

So I read it that you want to keep the footprint low.

Let's do that. So my proposal is, start at the bare minimum, the operating system. It should have everything you need for your queue process management as that is what is common for operating systems: invoke and control other processes (to keep the list concise).

This is even true for the Alpine Linux PHP images with Busybox with a reduced set of commands and shells (e.g. no Bash).

So I assume you want to kick the PHP script every 5 minutes and then start it again in my example (without timeout it is possible as well, even with less code).

The docker image ships with the timeout command that does this. You can make use of it in the Dockerfile entrypoint sh script. So PHP is invoked and terminated after 300 seconds (five minutes):

# executed by sh command from BusyBox v1.24.2 (2017-01-18 14:13:46 GMT) multi-call binary
#
# usage: ENTRYPOINT ["sh", "docker-entrypoint.sh"]
set -eu

timed_queue() {
  set +e
  timeout \
    -t 300 \
    php -d max_execution_time=301 \
      -f /queue.php -- \
      --proxy-address "${PROXY_ADDRESS}" \
      --queue-path /remote/queue \
    ;
  retval=$?
  set -e
}

timed_queue
while [ $retval -eq 143 -o $retval -eq 0 ]; do
  timed_queue
done

The setting to the settings is done with the -t 300 command-line parameter to timeout. The exit status is 143 by default (SIGTERM) in case the timeout kicks in.

Your queue script returns 0 in case it ends successfully. In both cases the script is started again (with the timeout).

This requires an additional small change to your queue script. You make use of the die command in the following fashion to signal errors:

die(
    sprintf("Error: the supplied queue path `%s` does not exist\n", $queuePath)
);

Instead output the error message to standard error output and exit with a non-zero status code (e.g. 1):

fprintf(STDERR, "Error: the supplied queue path `%s` does not exist\n", $queuePath);
die(1);

This will make the script compatible with the operating system. On the command-line an exit status of zero (0) signals OK (that is using die("string") with a string message as in your example) but a non-zero code shows error (for starters use 1).

Done this, you will get the functionality of a re-starting and timeout-ing script with practically no overhead. That is both in file-size but as well in processing time. I've put this to a test with a timeout of one second and it shows only a very little overhead of some microseconds when restarted:

queue_instance     | info: 2017-05-19T14:00:24.34824700 queue.php started...
queue_instance     | info: 2017-05-19T14:00:25.35548200 queue.php started...
queue_instance     | info: 2017-05-19T14:00:26.34564400 queue.php started...
queue_instance     | info: 2017-05-19T14:00:27.35868700 queue.php started...
queue_instance     | info: 2017-05-19T14:00:28.34597300 queue.php started...
queue_instance     | info: 2017-05-19T14:00:29.34139800 queue.php started...
queue_instance     | info: 2017-05-19T14:00:30.26049500 queue.php started...
queue_instance     | info: 2017-05-19T14:00:31.26174500 queue.php started...
queue_instance     | info: 2017-05-19T14:00:32.26322800 queue.php started...
queue_instance     | info: 2017-05-19T14:00:33.26352800 queue.php started...
queue_instance     | info: 2017-05-19T14:00:34.26533300 queue.php started...
queue_instance     | info: 2017-05-19T14:00:35.26524300 queue.php started...
queue_instance     | info: 2017-05-19T14:00:36.26743300 queue.php started...
queue_instance     | info: 2017-05-19T14:00:37.26889500 queue.php started...
queue_instance     | info: 2017-05-19T14:00:38.27222300 queue.php started...
queue_instance     | info: 2017-05-19T14:00:39.27209000 queue.php started...
queue_instance     | info: 2017-05-19T14:00:40.27620500 queue.php started...
queue_instance     | info: 2017-05-19T14:00:41.27985300 queue.php started...
queue_instance     | info: 2017-05-19T14:00:42.28136100 queue.php started...
queue_instance     | info: 2017-05-19T14:00:43.28252200 queue.php started...
queue_instance     | info: 2017-05-19T14:00:44.28403600 queue.php started...
queue_instance     | info: 2017-05-19T14:00:45.28595300 queue.php started...
queue_instance     | info: 2017-05-19T14:00:46.28683900 queue.php started...
queue_instance     | info: 2017-05-19T14:00:47.28803800 queue.php started...

The subtle differences can most likely be ignored in your scenario as you want to run the script for five minutes.

In case the process crashes (fatal errors etc.) PHP will have an exit code of 127 so you will get notice as the container shuts down then. Same if there is a mistake in the Dockerfile entrypoint script, like an undefined environment variable.

So if I understood you right, this should be what you're looking for.

Depending on what is finally invoked via CMD or ENTRYPOINT, more process handling might apply. Or more handling might apply further. Some references:

halfer
  • 19,824
  • 17
  • 99
  • 186
hakre
  • 193,403
  • 52
  • 435
  • 836
  • Excellent work hakre, and thank you. You guessed right - the queue should be long-running, but as a longstanding habit I like PHP background processes to restart every so often just to clear down any memory leaks. Five minutes is good but could be anything reasonably short. – halfer May 19 '17 at 15:47
  • You make a good point about exit codes - I'd spotted that, and have made some attempts to fix that already. I'll certainly take a look at `runit` but the shell loops you and Robert suggest should be fine. Small containers here I come! – halfer May 19 '17 at 15:48
  • You don't need to look for runit in your case IMHO, I just wanted to give further references for future visitors as well. Another system-near command that is interesting for docker images is `run-parts` if you like to make things more modular. – hakre May 19 '17 at 20:32
  • Here is another alternative to supervisord, similar to runit; [s6 process supervisor](http://skarnet.org/software/s6/) – hakre May 20 '17 at 20:19