Docker containers randomly shutting down

Question

So I've had this system where I had 3 Docker containers running at the same time to host my APIs: one for Traefik, and the other two for different APIs (written in Python using FastAPI). It was working fine for a while; however, now all the Docker containers seem to shut down randomly, all at the same time, after a couple of hours.

This is the error message all of them output:

Exception in thread Thread-5:
Traceback (most recent call last):
  File "urllib3/connectionpool.py", line 677, in urlopen
  File "urllib3/connectionpool.py", line 392, in _make_request
  File "http/client.py", line 1277, in request
  File "http/client.py", line 1323, in _send_request
  File "http/client.py", line 1272, in endheaders
  File "http/client.py", line 1032, in _send_output
  File "http/client.py", line 972, in send
  File "docker/transport/unixconn.py", line 43, in connect
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "requests/adapters.py", line 449, in send
  File "urllib3/connectionpool.py", line 727, in urlopen
  File "urllib3/util/retry.py", line 410, in increment
  File "urllib3/packages/six.py", line 734, in reraise
  File "urllib3/connectionpool.py", line 677, in urlopen
  File "urllib3/connectionpool.py", line 392, in _make_request
  File "http/client.py", line 1277, in request
  File "http/client.py", line 1323, in _send_request
  File "http/client.py", line 1272, in endheaders
  File "http/client.py", line 1032, in _send_output
  File "http/client.py", line 972, in send
  File "docker/transport/unixconn.py", line 43, in connect
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "threading.py", line 926, in _bootstrap_inner
  File "threading.py", line 870, in run
  File "compose/cli/log_printer.py", line 168, in tail_container_logs
  File "compose/cli/log_printer.py", line 185, in wait_on_exit
  File "compose/container.py", line 268, in wait
  File "docker/utils/decorators.py", line 19, in wrapped
  File "docker/api/container.py", line 1305, in wait
  File "docker/utils/decorators.py", line 46, in inner
  File "docker/api/client.py", line 233, in _post
  File "requests/sessions.py", line 578, in post
  File "requests/sessions.py", line 530, in request
  File "requests/sessions.py", line 643, in send
  File "requests/adapters.py", line 498, in send
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

Here is my docker-compose for my first API (call it "Hierarchy"):

services:

  backend:
    build: ./
    restart: always
    labels:
      - traefik.enable=true

      - traefik.http.services.hierarchy_app.loadbalancer.server.port=80

      - traefik.http.routers.hierarchy-http.entrypoints=http
      - traefik.http.routers.hierarchy-http.rule=Host(`api.mydomain.me`)
      - traefik.docker.network=traefik-public

      - traefik.http.routers.hierarchy-https.entrypoints=https
      - traefik.http.routers.hierarchy-https.rule=Host(`api.mydomain.me`)
      - traefik.http.routers.hierarchy-https.tls=true

      - traefik.http.routers.hierarchy-https.tls.certresolver=le

      - traefik.http.middlewares.https-redirect.redirectscheme.scheme=https
      - traefik.http.middlewares.https-redirect.redirectscheme.permanent=true

      - traefik.http.routers.hierarchy-http.middlewares=https-redirect

    networks:
      - traefik-public

    volumes:
      - ${PWD}/hierarchy_app/commands.json:/hierarchy_app/commands.json:Z

networks:
  traefik-public:
    external: true

Here is my docker-compose for my second API (call it "Voicerooms"):

services:

  backend:
    build: ./
    restart: always
    labels:
      - traefik.enable=true

      - traefik.http.services.voicerooms_app.loadbalancer.server.port=80

      - traefik.http.routers.voicerooms-http.entrypoints=http
      - traefik.http.routers.voicerooms-http.rule=Host(`api.mydomain2.app`)
      - traefik.docker.network=traefik-public

      - traefik.http.routers.voicerooms-https.entrypoints=https
      - traefik.http.routers.voicerooms-https.rule=Host(`api.mydomain2.app`)
      - traefik.http.routers.voicerooms-https.tls=true

      - traefik.http.routers.voicerooms-https.tls.certresolver=le

      - traefik.http.middlewares.https-redirect.redirectscheme.scheme=https
      - traefik.http.middlewares.https-redirect.redirectscheme.permanent=true

      - traefik.http.routers.voicerooms-https.middlewares=https-redirect
    networks:
      - traefik-public

    volumes:
      - ${PWD}/voicerooms_app/commands.json:/voicerooms_app/commands.json:Z

networks:
  traefik-public:
    external: true

Any help would be greatly appreciated; I've been trying to figure out this problem for a while :)

Edit: After scrolling up a bit more, I found this in the error message:

backend_1  | [2022-01-10 02:04:51 +0000] [1] [INFO] Handling signal: term
backend_1  | [2022-01-10 02:04:51 +0000] [7] [INFO] Shutting down
backend_1  | [2022-01-10 02:04:51 +0000] [8] [INFO] Shutting down

I'm assuming my Docker container is getting terminated for some reason. What's going on here?

Please refrain from posting links to external websites. Could you reframe your question without those links? You are free to embed any files here — Rafael de Bem, Jan 08 '22 at 19:37
Does this problem happen with other docker containers such as docker's hello-world? — Rafael de Bem, Jan 08 '22 at 23:06
#1 In which API does the error "docker/transport/unixconn.py FileNotFoundError" appear? #2 Does unixconn line 43 or urllib3 sound familiar from your code? That is the error. #3 Try without Traefik. #4 Try monitoring the CPU and RAM of the containers to detect whether that is the problem. — JRichardsz, Jan 16 '22 at 15:43
@JRichardsz 1. All 3 containers raise the same error. 2. No, the error is not familiar at all; it looks like something internal to the library is causing issues. 3. The APIs work perfectly fine when not run in a Docker container. 4. I have already tried monitoring CPU and RAM usage, and it remains steady, with no memory leaks. I suspect something on my server is sending a terminate signal to my containers, which causes all the strange errors to appear, although I'm not sure what is forcefully stopping the containers. — okay, Jan 16 '22 at 19:41
Are the APIs related? I mean, does one of them invoke the other? It is a connection error. Also, why does the error stack contain "docker/../../"? Docker is in another layer. Are you trying to perform some Docker operation from inside the Python APIs? — JRichardsz, Jan 17 '22 at 00:26
@JRichardsz Well in a way they are related, since traefik acts as a reverse proxy for the other two containers. However, other than traefik, the APIs are not related in any way. I honestly have no idea why the error stack has "docker/../../", I'm not performing any docker operation within the python API. — okay, Jan 17 '22 at 03:22
The word "docker" appears in the error stack trace of your APIs. Are you the developer, or do you just need to start them? Maybe Traefik is the problem. Could you start the APIs without Traefik and test them one by one (REST endpoints)? Also try another Python REST API (some hello world) to verify the random shutdown. — JRichardsz, Jan 17 '22 at 14:15
If your host is out of memory, it will start killing processes. The Docker engine is built in a way that tries to kill containers first. — The Fool, Jan 17 '22 at 19:10
But from the errors, it looks like you have a broken compose installation. Try running the containers without compose and see if it changes anything. — The Fool, Jan 17 '22 at 19:13
I'm actually unable to test the rest endpoints at all without traefik running. Anyways, to debug this issue further, I kept the traefik container off and left my hierarchy API running using docker-compose. I've also left my voice rooms API running using `docker run`. I'll take a look at both APIs in a couple hours to see if they shut down. — okay, Jan 17 '22 at 19:30
you could activate swarm mode, and use the same yaml file more or less. — The Fool, Jan 17 '22 at 19:32
This is strange. My "hierarchy" API received the same error traceback. However, my "voice rooms" API received something different when started using docker run:
[2022-01-17 22:09:55 +0000] [8] [INFO] Shutting down
[2022-01-17 22:09:55 +0000] [7] [INFO] Shutting down
[2022-01-17 22:09:56 +0000] [8] [INFO] Finished server process [8]
[2022-01-17 22:09:56 +0000] [7] [INFO] Finished server process [7]
[2022-01-17 22:09:56 +0000] [8] [INFO] Worker exiting (pid: 8)
[2022-01-17 22:09:56 +0000] [7] [INFO] Worker exiting (pid: 7)
ERRO[9784] error waiting for container: unexpected EOF
— okay, Jan 18 '22 at 00:40
Then maybe it's not your compose installation that is broken but Docker itself, and compose is suffering from that. But it would also be interesting to know what your code actually looks like. — The Fool, Jan 18 '22 at 07:52
Well, the APIs are a bit long. Should I send them using a paste service instead? — okay, Jan 18 '22 at 15:40
Just to clarify, I've thoroughly tested both APIs on my machine, without docker, and they work perfectly fine. — okay, Jan 20 '22 at 21:34
Did you try reinstalling Docker? It looks to me like it's broken, as I already said. Or try it on another machine. — The Fool, Jan 20 '22 at 21:39
Nope, reinstalling hasn’t seemed to fix the issue either… — okay, Jan 22 '22 at 09:11
Are you sure a specific library is not missing during the installation of the docker container? I couldn't say which, but it would raise the same kind of error. — vinalti, Jan 22 '22 at 16:54
Does this help you ? https://github.com/prisma/prisma1/issues/5120 — vinalti, Jan 22 '22 at 16:57
Possible duplicate of https://stackoverflow.com/questions/64206533/docker-compose-exceptions — The Fool, Jan 22 '22 at 18:25
Sounds like you're either (1) leaking file descriptors (e.g. not closing files, not closing sockets, etc) (2) opening too many connections/files at once, or (3) the host is stopping the containers and the errors are an artifact of docker shutting the container down. Try periodically logging the number of open file descriptors using `test.support.os_helper.fd_count()` — sytech, Jan 22 '22 at 21:51
@vinalti I've tried everything in that github issue, none of it has worked. — okay, Jan 22 '22 at 22:02
@sytech I'm almost certain the first 2 points you made are not true, because I've tested running the API without docker for multiple days and it worked fine. As for the 3rd point, how exactly would I use `test.support.os_helper.fd_count()`? Where would I insert this line of code? — okay, Jan 22 '22 at 22:06
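sytech's suggestion to log the number of open file descriptors periodically can be sketched as follows. This is a minimal sketch, not sytech's code: instead of `test.support.os_helper.fd_count()` (a helper from CPython's regression-test package, which is not meant as a public API), it counts the entries in `/proc/self/fd`, assuming the API runs in a Linux container, with `/dev/fd` as a fallback. The logger name and the helpers `count_open_fds` and `start_fd_logger` are hypothetical.

```python
import logging
import os
import threading
import time

log = logging.getLogger("fd-monitor")  # hypothetical logger name


def count_open_fds() -> int:
    """Return how many file descriptors this process currently has open."""
    # Linux exposes one entry per open descriptor in /proc/self/fd;
    # /dev/fd is a fallback for other Unix systems.
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            # The count can include the descriptor used to list the directory
            # itself; that constant offset does not matter for spotting a trend.
            return len(os.listdir(fd_dir))
    raise RuntimeError("no per-process fd directory on this platform")


def start_fd_logger(interval_s: float = 60.0) -> threading.Thread:
    """Log the open-descriptor count every `interval_s` seconds in a daemon thread."""

    def loop() -> None:
        while True:
            log.info("open file descriptors: %d", count_open_fds())
            time.sleep(interval_s)

    thread = threading.Thread(target=loop, name="fd-monitor", daemon=True)
    thread.start()
    return thread
```

The count is per process, so `start_fd_logger()` would be called once in each process at startup (for example from a FastAPI startup event handler, which runs in every Gunicorn worker), with logging configured at INFO level. A count that keeps climbing in the hours before a shutdown would support the leak theory; a flat count would point back to something outside the process sending the TERM signal.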

score 0 · Answer 1 · answered Jan 20 '22 at 10:39

I also encountered the same error as you. My problem was that Docker was not running.

Check the Docker status:

    systemctl status docker

If it is not running, start it and try `docker-compose up` again:

    systemctl start docker
    docker-compose up

Run this to start Docker on boot:

    systemctl enable docker

OR

If you're on a Mac, it may mean that Docker itself isn't running. I had rebooted my Mac and Docker wasn't set to launch automatically at login. You can set this as a Docker preference.

OR you can solve this by running `sudo systemctl start docker` and then `sudo docker-compose up`.
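The traceback in the question ends with `docker/transport/unixconn.py` raising `FileNotFoundError`, which is what a client gets when the Docker daemon's Unix socket file does not exist, usually because the daemon is not running at that moment. A minimal sketch of checking for that condition from Python, assuming the default socket path `/var/run/docker.sock`; the helper name `docker_socket_status` is hypothetical.

```python
import os
import socket

# Assumption: the daemon listens on the default Linux socket path.
DEFAULT_DOCKER_SOCK = "/var/run/docker.sock"


def docker_socket_status(path: str = DEFAULT_DOCKER_SOCK) -> str:
    """Return 'ok', 'missing', 'refused' or 'error: ...' for a Docker Unix socket."""
    if not os.path.exists(path):
        # The condition behind FileNotFoundError in docker/transport/unixconn.py.
        return "missing"
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(2.0)
        try:
            sock.connect(path)
        except FileNotFoundError:
            return "missing"  # removed between the existence check and connect()
        except ConnectionRefusedError:
            return "refused"  # the file exists but nothing is listening on it
        except OSError as exc:
            return f"error: {exc}"  # e.g. permission denied for non-docker users
    return "ok"
```

Logging the result with a timestamp every minute (for example from cron) while the containers run would show whether the socket disappears at the moment they stop, which would implicate the Docker daemon rather than the APIs.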

Rakesh B E

In this case docker clearly does start, as the containers run for some time – 2e0byo Jan 20 '22 at 10:42
2e0byo is correct. I've seen this "fix" before, and it hasn't worked. – okay Jan 20 '22 at 18:33

score 0 · Answer 2 · answered Jan 20 '22 at 14:52

Your application dies for some reason, and the containers exit. How to debug this is a common question; I would start all containers in interactive mode and watch what is happening:

docker run --rm -p8080:80 -it <IMAGE NAME>

Instead of -p8080:80 (which exposes the container's port 80 on the host's port 8080), add your own networking setup, plus any mounts/binds/environment settings your containers need.

You can also start the container in bash and run your command manually:

docker run --rm -p8080:80 -it <IMAGE NAME> /bin/bash

From there, you have to find out what is wrong with your app.
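A complementary check, sketched under stated assumptions: once a container has stopped, `docker inspect <container>` records how it ended in its `State` object (fields such as `Running`, `ExitCode`, `OOMKilled` and `FinishedAt`). The helpers `state_from_inspect` and `explain_exit` are hypothetical names, and the exit-code reading relies on the common convention that 128 + N means "ended by signal N" (143 for SIGTERM, 137 for SIGKILL).

```python
import json
import signal


def state_from_inspect(inspect_json: str) -> dict:
    """Extract the State object from `docker inspect <container>` output (a JSON array)."""
    return json.loads(inspect_json)[0]["State"]


def explain_exit(state: dict) -> str:
    """Turn a container's State object into a short human-readable reason."""
    if state.get("Running"):
        return "still running"
    if state.get("OOMKilled"):
        # The OOM killer uses SIGKILL, so ExitCode is usually 137 here too.
        return "killed by the kernel OOM killer"
    code = state.get("ExitCode", 0)
    if code > 128:
        # Shell convention: 128 + N means the process was ended by signal N.
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            name = f"signal {code - 128}"
        return f"terminated by {name} (exit code {code})"
    if code == 0:
        return "exited with code 0 (clean exit, possibly a graceful shutdown after SIGTERM)"
    return f"exited with error code {code}"
```

One caveat: Gunicorn catches SIGTERM and shuts down gracefully, as in the question's log, so a container stopped that way may report exit code 0 rather than 143. The `FinishedAt` timestamp is still useful for matching the shutdown against other logs on the host.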

score 0 · Answer 3 · answered Jan 22 '22 at 18:12

It seems your application is, for some reason, attempting to use a file that is not there and is not handling the exception properly, so your application exits and your container stops.

Make sure that the send method in your application's requests/adapters.py (line 498) is properly wrapped in a try...except statement, and make sure to log the exception. This will keep your application from crashing, but it will not fix the bug or its underlying cause; you will have to investigate that separately.

Emerson Gomes

You realize that `requests` is a library? https://docs.python-requests.org/en/latest/. What's also highly suspicious is the mention of `docker/transport/unixconn.py` at the bottom of the stack trace, as well as other Docker-related files. I know compose is written in Python, so I assume it's compose having issues there, but from the comments it seems compose only has issues because Docker itself has issues. – The Fool Jan 22 '22 at 18:16
I see. I guess I read your question too quickly. Can you check whether there is any output from running dmesg -T on the console? There might be some "cgroups" or "OOM" messages. – Emerson Gomes Jan 22 '22 at 18:23
It's not my question. – The Fool Jan 22 '22 at 18:24
" [INFO] Handling signal: term" indicates that something requested the application process to terminate. This often happens when the system gets low on memory and the kernel's OOM killer kills some application to free up memory. If that's the case, you will see some of these messages in the output of " dmesg | egrep -i 'killed process' " – Emerson Gomes Jan 22 '22 at 18:27
@EmersonGomes I tried running the command you mentioned, and there's no output. – okay Jan 22 '22 at 20:53
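Emerson Gomes's `dmesg | egrep -i 'killed process'` check can also be scripted, for example to run right after the containers go down. A minimal sketch, assuming `dmesg -T` is available on the host (it may require root) and that the kernel uses its usual "Out of memory" / "Killed process" wording; `oom_lines` and `read_kernel_log` are hypothetical helper names.

```python
import re
import subprocess

# Typical Linux kernel wording for OOM-killer events; adjust if your kernel differs.
OOM_PATTERN = re.compile(r"out of memory|killed process|oom-kill", re.IGNORECASE)


def oom_lines(kernel_log: str) -> list[str]:
    """Return the kernel log lines that look like OOM-killer activity."""
    return [line for line in kernel_log.splitlines() if OOM_PATTERN.search(line)]


def read_kernel_log() -> str:
    """Read the kernel ring buffer with human-readable timestamps (may need root)."""
    result = subprocess.run(["dmesg", "-T"], capture_output=True, text=True, check=True)
    return result.stdout
```

An empty result from `oom_lines(read_kernel_log())` matches what the asker reports. Keep in mind that the kernel ring buffer is finite, so older messages can be overwritten if the check runs long after the shutdown.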
