Background:
I picked up an old server and started a Linux project a few months ago. I've been learning a ton and having a lot of fun. I'm using the server to host a number of sites (that are still in development because I'm also learning website development). Searching online, I learned about SSL/TLS certificates, nginx, docker, and so on.
Using what I learned, I set up SWAG to handle certificates and the reverse-proxy using docker-compose. I added my other sites to the same docker-compose.yaml file. After some struggles I was able to get everything working the way I wanted. I had docker containers for each of my sites and stub sites for each of them. I could access them from any browser.
The first time I had a power outage and the server came back up, the docker containers did not all start up correctly. I was getting an "Address already in use" error on the SWAG container. I googled for solutions and found this answer, which suggested finding and killing the existing processes that were bound to the ports in question (80 and 443 for me) so that the docker container (the SWAG container in my case) could bind to them. The processes that I found using ports 80 and 443 were all "docker-pr". I killed them and was able to restart the SWAG docker container without any errors. However, every time I restart or reboot the server, I have to go through the same steps.
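For reference, a minimal sketch of checking whether anything is already accepting connections on ports 80 and 443 before starting the SWAG container. It assumes Python 3 is available on the server; the helper name is_listening is made up for illustration and is not from the linked answer. It only reports whether a port is taken, not which process holds it.

```python
import socket


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise,
        # so no exception handling is needed for a refused connection.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 80 and 443 are the ports the SWAG container publishes in this setup.
    for port in (80, 443):
        state = "in use" if is_listening(port) else "free"
        print(f"port {port}: {state}")
```

Because it connects rather than binds, this does not need root, which binding to ports below 1024 normally would.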
Recently (over the last several weeks) I've also been seeing another problem. I follow the above steps and everything works as expected. Then, anywhere from 1-2 hours to 1-2 days later, I check the status of my docker containers and get this:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
I checked, and there is no /var/run/docker.sock file. I checked the status of the docker daemon using the sudo service docker status command, which showed this:
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─override.conf
Active: active (running) since Fri 2021-12-03 14:10:19 UTC; 3 days ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 1216 (dockerd)
Tasks: 44
Memory: 170.7M
CGroup: /system.slice/docker.service
└─1216 /usr/bin/dockerd -H fd:// -H tcp://127.0.0.1:2375
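The check for the missing socket file can be sketched as follows, again assuming Python 3 on the server. The helper name is_unix_socket is hypothetical; the default path is the one from the error message above.

```python
import os
import stat


def is_unix_socket(path: str = "/var/run/docker.sock") -> bool:
    """Return True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Covers a missing file as well as permission problems.
        return False


if __name__ == "__main__":
    path = "/var/run/docker.sock"
    if is_unix_socket(path):
        print(f"{path} is present")
    else:
        print(f"{path} is missing or is not a socket")
```

Running something like this on a schedule and logging the result might help narrow down when the socket file disappears, since the daemon itself still reports as active (running).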
I rebooted the server, followed the above steps, and checked again. The /var/run/docker.sock file is there, and the status command shows this:
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─override.conf
Active: active (running) since Fri 2021-12-03 14:10:19 UTC; 10min ago
TriggeredBy: ● docker.socket
Docs: https://docs.docker.com
Main PID: 1216 (dockerd)
Tasks: 73
Memory: 146.5M
CGroup: /system.slice/docker.service
├─1216 /usr/bin/dockerd -H fd:// -H tcp://127.0.0.1:2375
├─2178 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 443 -container-ip 172.18.0.11 -container-port 443
├─2185 /usr/bin/docker-proxy -proto tcp -host-ip :: -host-port 443 -container-ip 172.18.0.11 -container-port 443
├─2202 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 80 -container-ip 172.18.0.11 -container-port 80
└─2209 /usr/bin/docker-proxy -proto tcp -host-ip :: -host-port 80 -container-ip 172.18.0.11 -container-port 80
After an hour or a day, however, it goes back to how it was. So far, the only fix I have found is to reboot the server and repeat the above steps, and then an hour or a day later I have to do it all again.
The only difference between the before and after status seems to be the existence of what look like docker-proxy processes, so I tried looking into that. I'm wondering if the docker-pr processes I killed are related. Going back further, I'm wondering if, in the way that I have set up docker with SWAG handling the reverse proxy, I'm missing a configuration step that would solve both this problem and the original (Address already in use) problem mentioned above.
I'm sure I'm missing something. I'm sure other people get this to work just fine. What am I missing?
Please let me know if you need additional information or clarity on anything.