What I did
- Started the services on an AlmaLinux server with docker-compose up.
- Noticed that the output of docker-compose logs hadn't changed for a while.
- Checked docker-compose ps:
$ docker-compose ps
Name                          Command                          State     Ports
------------------------------------------------------------------------------------
mysupercoolsystem_api_1       python -m mysupercoolsyste ...   Exit 137
mysupercoolsystem_dev_1       sh -c jupyter lab --ip=0.0 ...   Exit 137
mysupercoolsystem_loader_1    /bin/sh -c python -m mysup ...   Exit 137
mysupercoolsystem_predictor_1 /bin/sh -c python -m mysup ...   Exit 137
mysupercoolsystem_trainer_1   /bin/sh -c python -m mysup ...   Exit 137
$ docker ps -a # just to confirm
72708f3450 hub.nic.dk/nicecompany/mysupercoolsystem "/bin/sh -c 'python …" 2 days ago Exited (137) 2 days ago mysupercoolsystem_trainer_1
3e286cabb0 jupyter/scipy-notebook:33add21fab64 "sh -c 'jupyter lab …" 2 days ago Exited (137) 2 days ago mysupercoolsystem_dev_1
246b87f0ac hub.nic.dk/nicecompany/mysupercoolsystem "/bin/sh -c 'python …" 2 days ago Exited (137) 2 days ago mysupercoolsystem_predictor_1
7d3297092c hub.nic.dk/nicecompany/mysupercoolsystem "python -m mysuperc …" 2 days ago Exited (137) 2 days ago mysupercoolsystem_api_1
2a07851f9c hub.nic.dk/nicecompany/mysupercoolsystem "/bin/sh -c 'python …" 2 days ago Exited (137) 2 days ago mysupercoolsystem_loader_1
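Exit code 137 follows the usual 128 + N convention for a process terminated by signal N, so every container's main process was ended by signal 9 (SIGKILL). A small helper for decoding such codes, assuming a Linux host where Python's signal module defines the standard signal numbers:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode an exit status using the 128 + N convention for signal deaths."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:  # not a signal number known on this platform
            name = "unknown signal"
        return f"terminated by {name} (signal {signum})"
    return f"exited with status {code}"


print(describe_exit_code(137))  # terminated by SIGKILL (signal 9)
```

SIGKILL cannot be caught or logged by the process itself, which would explain why docker-compose logs shows nothing at the moment of exit.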
- Researched whether the containers were stopped because they ran out of memory.
- Checked the virtual host: the Docker containers run on a single virtual (vCenter-managed) host. The host is allocated 20 GB of RAM, and the vCenter monitor shows RAM usage peaking at about 8 GB, never higher.
- Follow-up: talked to the sysadmin. The servers were not restarted, and nothing explicitly told them to terminate any processes.
- docker info | grep Memory returns Total Memory: 19.37GiB.
- Checked each container with docker inspect <container_id>. All of them show the same "State", apart from the "FinishedAt" field, which varies by ±0.05 seconds:
"State": {
    "Status": "exited",
    "Running": false,
    "Paused": false,
    "Restarting": false,
    "OOMKilled": false,
    "Dead": false,
    "Pid": 0,
    "ExitCode": 137,
    "Error": "",
    "StartedAt": "2021-11-13T10:33:04.785566471Z",
    "FinishedAt": "2021-11-13T10:33:57.1xxxxZ"
}
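Since all five containers report FinishedAt within about 0.05 s of each other, the exits look simultaneous rather than independent. A minimal sketch for checking this across all containers at once, assuming the JSON array printed by docker inspect (for example docker inspect $(docker ps -aq) > inspect.json, where inspect.json is a hypothetical file name) is read in as a string:

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def parse_docker_time(ts: str) -> float:
    """Convert Docker's RFC 3339 UTC timestamp (up to 9 fractional digits) to epoch seconds."""
    base, _, frac = ts.rstrip("Z").partition(".")
    whole = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # strptime's %f accepts at most 6 digits, so the nanosecond fraction is added by hand.
    fraction = int(frac) / 10 ** len(frac) if frac else 0.0
    return whole.timestamp() + fraction


def summarize_inspect(inspect_json: str) -> Dict[str, Any]:
    """Collect exit code, OOMKilled and FinishedAt per container from docker inspect output."""
    rows: List[Dict[str, Any]] = []
    finished: List[float] = []
    for container in json.loads(inspect_json):
        state = container["State"]
        finished_at = parse_docker_time(state["FinishedAt"])
        finished.append(finished_at)
        rows.append({
            "name": container.get("Name", "").lstrip("/"),
            "exit_code": state["ExitCode"],
            "oom_killed": state["OOMKilled"],
            "finished_at": finished_at,
        })
    # Spread between the earliest and latest exit, in seconds.
    spread = max(finished) - min(finished) if finished else 0.0
    return {"containers": rows, "finished_spread_s": spread}
```

Exit code 137 together with OOMKilled false and a near-zero spread would be consistent with one external event sending SIGKILL to every container, rather than each container hitting its own memory limit, at least as far as Docker recorded it.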
- Re-examined my docker-compose.yml:
$ cat docker-compose.yml
version: "3"
services:
  dev:
    image: jupyter/scipy-notebook:33add21fab64
    environment:
      - COMPONENT=develop
    volumes:
      - /opt/mysupercoolsystem:/home/jovyan/work
      - /media:/media
    ports:
      - "3333:3333"
    entrypoint: sh -c "jupyter lab --ip=0.0.0.0 --port=3333 --no-browser --allow-root"
  loader:
    image: hub.nic.com/nicecompany/mysupercoolsystem
    working_dir: "/app"
    volumes:
      - /media:/media
  trainer:
    image: hub.nic.dk/nicecompany/mysupercoolsystem
    environment:
      - COMPONENT=train
    working_dir: "/app"
    volumes:
      - models:/models
  predictor:
    image: hub.nic.dk/nicecompany/mysupercoolsystem
    environment:
      - COMPONENT=pred
    working_dir: "/app"
    volumes:
      - models:/models
  api:
    image: hub.nic.dk/nicecompany/mysupercoolsystem
    environment:
      - COMPONENT=api
    working_dir: "/app"
    ports:
      - "69:69"
    entrypoint: python -m mysupercoolsystem.web_api
volumes:
  models:
- Examined the Dockerfile. Note: services that do not have an explicit entrypoint in docker-compose.yml inherit the ENTRYPOINT from the Dockerfile.
$ cat mysupercoolsystem/Dockerfile
FROM python:3.8
WORKDIR /app
COPY ./requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt
COPY . /app
RUN pip install .
ENTRYPOINT python -m mysupercoolsystem
- Checked a similar issue (there, the --abort-on-container-exit flag was the culprit; I am not using any flags).
How to proceed
- Why are the services exiting?
- What can I do to troubleshoot the error?
- Are there other logs I should be checking?
- If I add restart: unless-stopped to each service, is there any way to examine why a container exited, apart from my own logging via docker logs?
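One option beyond docker logs is the daemon's event stream: docker events --filter type=container --format '{{json .}}' prints one JSON object per container lifecycle event, and die events carry the exit code as an attribute while kill events carry the signal number. A minimal sketch that filters that stream, assuming the Action, Actor.Attributes and time fields of the JSON output; because the daemon only keeps a limited backlog of past events, the watcher would need to be running when the exits happen. The module name watch_exits used below is hypothetical:

```python
import json
from typing import Dict, Iterable, Iterator, TextIO

# Container actions that bear on why a container stopped.
INTERESTING_ACTIONS = {"kill", "oom", "die", "stop"}


def summarize_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one compact record per relevant container event (one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        action = event.get("Action", "")
        if action not in INTERESTING_ACTIONS:
            continue
        attrs = event.get("Actor", {}).get("Attributes", {})
        yield {
            "time": str(event.get("time", "")),
            "action": action,
            "name": attrs.get("name", ""),
            "exitCode": attrs.get("exitCode", ""),  # present on die events
            "signal": attrs.get("signal", ""),      # present on kill events
        }


def main(stream: TextIO) -> None:
    """Print each relevant event as it arrives, flushing so output is not buffered."""
    for record in summarize_events(stream):
        print(record, flush=True)
```

Usage, with the block saved as watch_exits.py: docker events --filter type=container --format '{{json .}}' | python3 -c 'import sys, watch_exits; watch_exits.main(sys.stdin)'. A kill event preceding die would suggest the signal went through the Docker API (docker stop/kill, docker-compose stop/down), whereas a bare die would point to the process being killed from outside Docker.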