I have a Kubernetes-cluster with 1 master-node and 3 worker-nodes. All Nodes are running on CentOS 7 with Docker 19.06. I'm also running Longhorn for dynamic provisioning of volumes (if that's important).
My problem is that every few days one of the worker nodes grows the HDD-usage to 85% (43GB). This is not a linear increase, but happens over a few hours, sometimes quite rapidly. I can "solve" this problem for a few days by first restarting the docker service and then doing a docker system prune -a
. If I don't restart the service first, the prune removes next to nothing (only a few MB).
I also tried to find out which container is taking up all that space, but docker system df
says it doesn't use the space. I used df
and du
to crawl along the /var/lib/docker subdirectories too, and it seems none of the folders (alone or all together) takes up much space either. Continuing this all over the system, I can't find any other big directories either. There are 24GB that I just can't account for. What makes me think this is a docker problem nonetheless is that a restart and prune just solves it every time.
Googling around I found a lot of similar issues where most people just decided to increase disk space. I'm not keen on accepting this as the preferred solution, as it feels like kicking the can down the road.
Would you have any smart ideas on what to do instead of increasing disk space?