
I recently set up healthchecks in my docker-compose config.

It works well and I like it. Here's a typical example:

services:
  app:
    healthcheck:
      test: curl -sS http://127.0.0.1:4000 || exit 1
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 30s

My container is quite slow to boot, so I set a 30-second start_period.

But this doesn't quite match what I need. I don't need a check every 5 seconds once the container is up, but my orchestration needs to know as soon as possible when the container becomes ready for the first time. Since my start_period is only approximate, if the container isn't ready at the first check, I have to wait a full interval before the next attempt.

What I'd like to have is:

  • While container is not healthy, retry every 5 seconds
  • Once it is healthy, check every 1 minute

Is there a way to achieve this out of the box with docker-compose?

I could write a custom script to achieve this, but I'd rather have a native solution if it is possible.

Augustin Riedinger
  • there is no such service out-of-the-box, you need to achieve that using the test script, or wait-for-it.sh – LinPy Feb 14 '20 at 06:09
  • There is a new option `start_interval` in Docker 25 (not released yet). See https://github.com/docker/compose/issues/10830. – ZhekaKozlov Aug 14 '23 at 14:49
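
For illustration, the `start_interval` option mentioned in the last comment could be used roughly as follows. This is only a sketch: it assumes Docker Engine 25 or later and a Compose release that accepts the key. The 1m interval comes from the question's wish list and the remaining values from its original config.

```yaml
services:
  app:
    healthcheck:
      test: curl -sS http://127.0.0.1:4000 || exit 1
      # Assumption: start_interval needs Docker Engine 25+ and a matching Compose release
      start_period: 30s
      start_interval: 5s   # probe every 5s while the container is in its start period
      interval: 1m         # probe every minute afterwards
      timeout: 3s
      retries: 3
```

Since `start_interval` only applies during the start period, `start_period` would presumably need to cover the slowest expected boot for the fast retries to help.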

2 Answers


Unfortunately, this is not possible out of the box.
All the durations you set are fixed. They can't change depending on the container state.

However, according to the documentation, the probe does not seem to wait for start_period to finish before running your test. The only effect is that failures happening during start_period are not counted as errors.

Below is the passage that makes me think so:

start_period provides initialization time for containers that need time to bootstrap. Probe failure during that period will not be counted towards the maximum number of retries. However, if a health check succeeds during the start period, the container is considered started and all consecutive failures will be counted towards the maximum number of retries.

I encourage you to test whether this is really the case, as I've never checked whether the healthcheck actually runs during the start period.
If it does, you can increase your start_period if you're unsure about the boot duration, and also increase the interval, to find a good compromise.

Marc ABOUCHACRA

I wrote a script that does this, though I'd rather find a native solution:

#!/bin/sh
# Run the given health command, but skip it (and report healthy)
# if it already succeeded less than five minutes ago.

HEALTHCHECK_FILE="/root/.healthchecked"

if [ "$#" -eq 0 ]; then
  echo "Usage: healthcheck_retry <COMMAND>" >&2
  exit 1
fi

if [ -r "$HEALTHCHECK_FILE" ]; then
  # The marker file's modification time is the time of the last successful check
  LAST_HEALTHCHECK=$(date -r "$HEALTHCHECK_FILE" +%s)
  FIVE_MINUTES_AGO=$(( $(date +%s) - 5 * 60 ))
  echo "Healthcheck file present"
  if [ "$LAST_HEALTHCHECK" -gt "$FIVE_MINUTES_AGO" ]; then
    echo "Healthcheck too recent"
    exit 0
  fi
fi

# "$@" runs the command with its arguments kept intact
if "$@"; then
  echo "\"$*\" succeeded: updating file"
  touch "$HEALTHCHECK_FILE"
  exit 0
else
  echo "\"$*\" failed: exiting"
  exit 1
fi

I use it like this: test: /healthcheck_retry.sh curl -fsS localhost:4000/healthcheck

The downside is that the script has to be available in every container, so I have to mount it as an extra volume:

    image: postgres:11.6-alpine
    volumes:
      - ./scripts/utils/healthcheck_retry.sh:/healthcheck_retry.sh
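
Putting the two fragments above together, a full service definition might look like the sketch below. The service name `db` and the `pg_isready` probe are assumptions for illustration, not part of the original setup. The timing values are borrowed from the question.

```yaml
services:
  db:  # hypothetical service name
    image: postgres:11.6-alpine
    volumes:
      # The script must be executable on the host (chmod +x) for the bind mount to run
      - ./scripts/utils/healthcheck_retry.sh:/healthcheck_retry.sh:ro
    healthcheck:
      # Assumption: pg_isready stands in for the curl probe, which may not be installed in this image
      test: /healthcheck_retry.sh pg_isready -U postgres
      # Docker still calls the script every 5s; the script skips the
      # real probe for five minutes after a success
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 30s
```

The short interval gives fast retries while the container is unhealthy, and the marker file keeps the real probe from running more than once every five minutes once it has succeeded.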
Augustin Riedinger