0

I've run into an issue deploying my app with a CI/CD pipeline on a Docker Swarm cluster.

I keep getting "no space left on device" errors when deploying, which is odd: my images are all under 500 MB, and there isn't much data on the server to begin with.

I start investigating.

sudo du -a -h / | sort -rh | head -n 5

6G   /var/lib/docker/overlay2/98f9e5f2c28a7ee7972cadfeaa069210238c06b5f806c2f5e039da9d57778817/merged/etc
5G   /var/lib/docker/overlay2/ec1a3324f4cb66327ff13907af28b101ab15d1a0a27a04f0adedf50017f1612e/merged/etc
2G   /var/lib/docker/overlay2/7fe5364228810e035090c86448b5327150f7372c9d2216b8ab4f8c626e679ba0/merged/etc
1G   /var/lib/docker/overlay2/5f80f0b1a72b83553c9089a54226c260b2e695dbba69b9e06ecc18fc18e3d107/merged/etc

And I see that the docker overlay2 folders are taking up huge amounts of space.

So I clean them up using docker system prune -a -f --volumes.
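Before pruning, it helps to see which part of Docker's data root is actually growing (image layers vs. container logs vs. volumes). A quick sketch, assuming the default data root of /var/lib/docker:

```shell
# Break down usage under Docker's data root so image layers (overlay2)
# can be told apart from per-container data and volumes.
# Run as root if permissions require it.
du -sh /var/lib/docker/overlay2 \
       /var/lib/docker/containers \
       /var/lib/docker/volumes 2>/dev/null | sort -rh
```

`docker system df` gives a similar per-category summary from Docker's own point of view.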

But I am wondering why this happens.

My suspicion is that when a new instance of a service is deployed, the volumes are attached to the new container while the old container keeps writing to its filesystem.

What actually happens to volumes when you deploy a new docker image on a docker swarm cluster? Does it disconnect the volume mapping from the old container and reconnect it to the new one, leaving the old instance to write to its own filesystem?

What steps should I put in place to avoid this?

Example deploy-stack.yml

version: "3.9"
services:
  myApp:
    image: myRepo/myApp:latest
    depends_on:
      - db
    volumes:
      - /var/data/uploads:/app/uploads
      - /var/data/logs:/app/logs
    deploy:
      replicas: 1
      update_config:
        parallelism: 1
        order: start-first
        failure_action: rollback
        monitor: 30s
      restart_policy:
        condition: any
    ports:
      - "80:80"
 
  db:
    image: "postgres:15beta3-alpine"
    container_name: db_pg
    environment:
      POSTGRES_PASSWORD: XXXXXXXXXXXX
      PGDATA: /var/lib/postgresql/data
    volumes:
      - /var/data/db_pg:/var/lib/postgresql/data
    deploy:
      replicas: 1
      update_config:
        parallelism: 1
        failure_action: rollback
        monitor: 30s
      restart_policy:
        condition: any
  seq:
    image: datalust/seq:latest
    environment:
      ACCEPT_EULA: "Y"
      SEQ_FIRSTRUN_ADMINPASSWORDHASH: XXXXXXXXXXXXXXX
    ports:
      - 8888:80
    volumes:
      - /var/data/seq:/data
    deploy:
      replicas: 1
      update_config:
        parallelism: 1
        failure_action: rollback
        monitor: 30s
      restart_policy:
        condition: any
networks:
  default:
    external: true
    name: app-network

Is the order: start-first in myApp's deploy.update_config causing this?

user3620691
  • 69
  • 1
  • 7
  • Related since I don't believe this question is really about swarm or volumes but rather a lack of disk space from overlay2 filling: https://stackoverflow.com/q/46672001/596285 – BMitch May 23 '23 at 13:32

2 Answers

0

When you deploy a new docker image on a docker swarm cluster, Swarm creates a new container from the new image and removes the old container. Volumes are mapped to the container and are not part of the image, so they persist between deployments.

Yeah, with start-first updates it's possible that the old container is still writing to the volume while the new container is already running. To avoid conflicts across nodes, you can use a shared storage solution such as NFS or Amazon EFS (S3 is object storage and can't be mounted as a volume without extra tooling). This way, the data is stored separately from the containers and can be accessed from any node in the swarm.
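As an illustrative sketch of that idea (not the asker's current setup, which uses bind mounts), a named volume in the stack file can be backed by NFS via the local driver; the server address and export path below are placeholders:

```yaml
volumes:
  uploads:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=10.0.0.5,rw,nfsvers=4"   # placeholder NFS server address
      device: ":/exports/uploads"        # placeholder export path

services:
  myApp:
    image: myRepo/myApp:latest
    volumes:
      - uploads:/app/uploads
```

With this, every node mounts the same NFS export, so it no longer matters which node a task lands on.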

You may also want to look at a tool like Docker Swarm Visualizer, which shows the containers running on each node and can help you spot issues with volume mapping or container placement.

Regarding the Docker Compose file: yes, the start-first order specified in the update_config of the myApp service could contribute to the issue, since it keeps the old and new containers running side by side during an update.

If the volumes specified for the myApp service are also used by other services, and the myApp service starts writing to the volumes before the other services have fully started, it could cause conflicts and issues with the volumes.

To avoid this, you could try removing the start-first order and letting Swarm use the default stop-first behavior, which stops the old task before starting the new one. Additionally, you could use separate volumes for each service, to avoid any potential conflicts between them.

Another thing is that you are using bind mounts, so other processes on the host may still write to those directories while the container is stopped, which can cause data corruption when the directory is reattached to the new container.

To avoid these issues, it's important to ensure that your application stops writing to any volumes or bind mounts before the container is stopped during a deployment.
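One knob that helps with this is the stop grace period: Swarm sends SIGTERM on update or shutdown and kills the task after the grace period expires. A minimal sketch of the stack-file addition (the 30s value is an assumption, not from the original config):

```yaml
services:
  myApp:
    # Give the app time to flush uploads/logs after SIGTERM before
    # Swarm sends SIGKILL. The default grace period is 10 seconds.
    stop_grace_period: 30s
```

This only helps if the application actually handles SIGTERM and finishes its writes during that window.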

donnie_d
  • 16
  • 1
0

Well I'll post the answer to my question here.

The issue I was having was not with docker volumes persisting after a rollout, but with log files growing huge inside my containers.
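To confirm whether Docker's own per-container logs are the culprit, something like this can help, assuming the default json-file log driver and data root:

```shell
# List the largest json-file logs Docker keeps per container
# (default data root of /var/lib/docker assumed; run as root if needed).
du -ah /var/lib/docker/containers 2>/dev/null \
  | grep -- '-json.log' \
  | sort -rh \
  | head -n 10
```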

The answer posted by @donnie_d is right in that I have had some big containers in the past, and you must be sure that your app doesn't have some odd teardown behavior where it writes files all over the place. So make sure your app isn't doing something weird first.

Then I resolved my issue by capping the Docker daemon's log file sizes in /etc/docker/daemon.json on each node (and restarting the daemon):

{
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

PS: This happened to me because my use of Serilog was a bit naive.
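As an alternative to changing the daemon defaults, log rotation can also be set per service in the stack file itself; a sketch using the same limits:

```yaml
services:
  myApp:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
```

Note that log options only apply to containers created after the change; existing tasks keep their old settings until the service is redeployed.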

user3620691
  • 69
  • 1
  • 7