18

After creating a simple hello world deployment, my pod status shows as "PENDING". When I run kubectl describe pod on the pod, I get the following:

Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  14s (x6 over 29s)  default-scheduler  0/1 nodes are available: 1 NodeUnderDiskPressure.

If I check on my node health, I get:

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Fri, 27 Jul 2018 15:17:27 -0700   Fri, 27 Jul 2018 14:13:33 -0700   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Fri, 27 Jul 2018 15:17:27 -0700   Fri, 27 Jul 2018 14:13:33 -0700   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     True    Fri, 27 Jul 2018 15:17:27 -0700   Fri, 27 Jul 2018 14:13:43 -0700   KubeletHasDiskPressure       kubelet has disk pressure
  Ready            True    Fri, 27 Jul 2018 15:17:27 -0700   Fri, 27 Jul 2018 14:13:43 -0700   KubeletReady                 kubelet is posting ready status. AppArmor enabled

So it seems the issue is that "kubelet has disk pressure", but I can't really figure out what that means. I can't SSH into minikube to check its disk space because I'm using VMware Workstation with --vm-driver=none.

slm
  • 15,396
  • 12
  • 109
  • 124
Imran
  • 385
  • 1
  • 2
  • 8
  • 1
    https://kubernetes.io/docs/concepts/architecture/nodes/ describes the statuses. I don't know that you can resolve it without getting an admin shell on the node somehow, unless you're content to destroy and recreate the node. In short this just sounds like "you're trying to fit too much on the one VM". – David Maze Jul 27 '18 at 22:28
  • Sorry, what does it mean to get "an admin shell on the node"? – Imran Jul 27 '18 at 23:35
  • The Kubernetes Node object represents some piece of computer hardware (or a VM). So you need a root shell on the VM so you can run administrative commands like `df` and `docker images`. If you can't ssh into it, maybe you can directly access its console. – David Maze Jul 28 '18 at 00:04
  • Read the minikube docs; you can bash into the node. You didn't give it enough disk space. Open up your VM workstation app and see what it has for a disk. – Lev Kuznetsov Jul 28 '18 at 13:36

5 Answers

14

This is an old question, but I just saw it and, because it doesn't have an answer yet, I will write my own.

I was facing this problem: my pods were getting evicted many times because of disk pressure, and commands such as df or du were not helpful.

With the help of the answer that I wrote here, I found out that the main problem is the pods' log files: because K8s does not support log rotation, they can grow to hundreds of gigabytes.

There are different log rotation methods available, but I am currently searching for the best practice for K8s, so I can't suggest a specific one yet.
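
One commonly used option (the Docker-level rotation linked in the comments below) is to let Docker rotate its json-file logs itself. A minimal sketch, assuming the nodes use Docker as the container runtime and you can edit /etc/docker/daemon.json (the size values are illustrative, not a recommendation):

cat <<'EOF' > /etc/docker/daemon.json   # overwrites an existing daemon.json; merge by hand if you already have one
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
EOF
systemctl restart docker                # note: restarts the containers on that node unless live-restore is enabled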

I hope this can be helpful.

Hamza Saeed
  • 164
  • 1
  • 15
AVarf
  • 4,481
  • 9
  • 47
  • 74
  • 1
    So your problem was the disk being filled by log files. Good hint. https://stackoverflow.com/q/50718608/4124767 has some notes about log files rotation. One possibility mentioned is to configure docker like in https://docs.docker.com/config/containers/logging/json-file/ (I am still looking for the condition "DiskPressure", my disk is only 85% full.) – simohe Apr 15 '20 at 15:17
  • 1
    I didn't look at the links because soon after that comment I just set the log rotation on the docker on all our K8s nodes and after months everything is good. If I recall correctly 85% is very close to the default 90% that triggers kubelet to purge everything. – AVarf Apr 15 '20 at 15:48
1

Personally, I couldn't solve the problem using kube commands because ...
It was said to be due to an antivirus (McAfee). Reinstalling the company-endorsed docker-desktop version solved the problem.

user1767316
  • 3,276
  • 3
  • 37
  • 46
0

As simohe commented, here are the thresholds: https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#hard-eviction-thresholds

Here are the Linux commands to check this on your Node:

  • memory.available<100Mi: free -m | awk 'NR==2{print $7}' # output (in MiB) has to be higher than 100
  • nodefs.available<10%: df -h / | awk 'NR==2{print $5}' # output has to be lower than 90%
  • imagefs.available<15%
    • containerd: df -h /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs | awk 'NR==2{print $5}' # output has to be lower than 85%
    • docker: df -h /var/lib/docker | awk 'NR==2{print $5}' # output has to be lower than 85%
  • nodefs.inodesFree<5% (Linux nodes): df -i / | awk 'NR==2{print $5}' # output has to be lower than 95%

To get an overview of where all your storage went, use: du -h -d 1 / 2> /dev/null | sort -hr
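
If you want to check all of these signals at once, the commands above can be combined into a small script (just a sketch; it assumes Docker's /var/lib/docker is the image filesystem, so swap in the containerd path if that is what your node uses):

echo "memory.available  (MiB, should stay above 100):   $(free -m | awk 'NR==2{print $7}')"
echo "nodefs.available  (use%, should stay below 90%):  $(df -h / | awk 'NR==2{print $5}')"
echo "nodefs.inodesFree (IUse%, should stay below 95%): $(df -i / | awk 'NR==2{print $5}')"
echo "imagefs.available (use%, should stay below 85%):  $(df -h /var/lib/docker | awk 'NR==2{print $5}')"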

papierkorp
  • 31
  • 3
-1

Had a similar issue.

My error log: Warning FailedScheduling 3m23s default-scheduler 0/3 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {node-role.kubernetes.io/controlplane: true}, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate

For me, the / partition was filled to 82%. Cleaning up some unwanted folders resolved the issue. Commands used:

  1. ssh uname@IP_or_hostname (log in to the worker node)
  2. df -h (check the disk usage)
  3. rm -rf folder_name (delete the unwanted folder; rm -rf deletes without asking, so make sure you really want to delete it)
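
If the worker node runs Docker (an assumption here; the comments under the question also mention checking docker images), unused images and stopped containers are another common source of disk pressure worth reclaiming:

docker system df        # shows how much space images, containers and local volumes use
docker image prune -a   # removes all images not used by any container; be sure you really want this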

I hope this can save someone's time.

psneha
  • 1
  • 1
-3

The community hinted at this in the comments above. I will try to consolidate it.

The kubelet maps one or more eviction signals to a corresponding node condition.

If a hard eviction threshold has been met, or a soft eviction threshold has been met independent of its associated grace period, the kubelet reports a condition that reflects the node is under pressure.

DiskPressure

Available disk space and inodes on either the node’s root filesystem or image filesystem has satisfied an eviction threshold

So the problem might be that there is not enough disk space, or that the filesystem has run out of inodes. You have to learn about the conditions of your environment and then apply the appropriate thresholds in your kubelet configuration.
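
The thresholds themselves are kubelet settings. As an illustration only (the values below are made up, not a recommendation), hard eviction thresholds can be passed to the kubelet like this:

kubelet --eviction-hard="memory.available<100Mi,nodefs.available<5%,nodefs.inodesFree<3%,imagefs.available<10%"   # plus the rest of your usual kubelet flags

With minikube such flags are usually forwarded via --extra-config=kubelet.<flag>=<value>; check the documentation of your minikube version for the exact syntax.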

You do not need to SSH into minikube, since you are running it directly on your host: --vm-driver=none is an

option that runs the Kubernetes components on the host and not in a VM. Docker is required to use this driver but no hypervisor. If you use --vm-driver=none, be sure to specify a bridge network for docker. Otherwise it might change between network restarts, causing loss of connectivity to your cluster.

You might check whether there are issues related to the topics mentioned above:

kubectl describe nodes

Look at df reports:

df -i
df -h

Some further reading so you can grasp the topic: Configure Out Of Resource Handling - section Node Conditions.

aurelius
  • 3,433
  • 1
  • 13
  • 22
  • 14
    This post is not helpful despite being the top 1 result on Google. It doesn't describe anything actionable to do for docker-for-desktop – Henrik Jul 31 '19 at 19:15
  • And how can I find the current limits? `df -i` shows there are free inodes, `df -h` shows there is free disk space. But maybe it is not enough? And in the output of `kubectl describe nodes` I do not recognize the limits. – simohe Apr 15 '20 at 15:41
  • 1
    I found the default limits on https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#hard-eviction-thresholds: `memory.available<100Mi`, `nodefs.available<10%`, `nodefs.inodesFree<5%,` `imagefs.available<15%` – simohe Jun 12 '20 at 21:19
  • 1
    Where are these configured? There seems to be a default setting that is set nowhere: not in the config file, not in configmaps. – Markus Bawidamann Mar 18 '21 at 21:49