In my case the problem was that the nodes were filling up with Docker images: some of them unused and never pruned, others simply too big.
To confirm it, first SSH into the node and check whether the disk is (nearly) full.
For instance:
[root@node-name ~]# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p1 20G 15G 5.9G 71% /
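If the root filesystem is indeed almost full, docker system df gives a quick breakdown of how much of that space Docker is holding in images, containers, local volumes and build cache, and how much of it is reclaimable:
[root@node-name ~]# docker system df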
It's possible to find out which image specifically occupies the most space, and I recommend doing so.
Check this excellent resource to see how:
https://rharshad.com/eks-troubleshooting-disk-pressure/
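As a quick first pass directly on the node, you can also list the images sorted by size; the one-liner below is just one way to do it and assumes GNU sort:
[root@node-name ~]# docker images --format '{{.Size}}\t{{.Repository}}:{{.Tag}}' | sort -rh | head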
Knowing which image takes the most space, and digging into its filesystem to understand why, can help you optimize image sizes, but that's a different topic.
If you can't add more storage to the node, you can free up space with docker prune.
But first we need to make sure no containers are running on the node, so let's drain it:
kubectl drain node-name
Note that draining also cordons the node, which means no new pods will be scheduled on it.
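In practice a plain drain often refuses to evict DaemonSet-managed pods or pods with emptyDir data; if that happens, something like this is usually needed (check what the flags imply for your workloads before using them):
kubectl drain node-name --ignore-daemonsets --delete-emptydir-data
(on older kubectl versions the second flag is called --delete-local-data)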
Back inside the node, let's prune the unused Docker resources:
[root@node-name ~]# docker system prune --all
WARNING! This will remove:
- all stopped containers
- all networks not used by at least one container
- all images without at least one container associated to them
- all build cache
Are you sure you want to continue? [y/N] y
Deleted Containers:
8333683571a2ceff47bf08cc254f8fa3809acacc7fb981be3c1c274e9465dd68
28bdc62425707127ac977d20fd3dc85374ffc54ccccf2b2f2098d9af9ca3c898
7315014bfd9207c5a1b8e76ef0f1567bb5e221de6fe0304f4728218abd7e1f3f
b0f5ecb854a9f4b41610d7ec5b556447600f57529e68ae2093d1d40df02ff214
9e24227321d5e151bc665c55bcd474c9d586857cbac3cad744aad2dc11729e5e
63ab1bf7ded78d4b77db22f9c1aaac6a55247c71ca55b51caa8492f2b16c4d69
...
Total reclaimed space: 4.529GB
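If you'd rather keep stopped containers, networks and the build cache and only remove unused images, docker image prune --all is a less aggressive alternative; note that neither command touches volumes unless you add --volumes to docker system prune:
[root@node-name ~]# docker image prune --all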
Then check the storage space again:
[root@node-name ~]# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p1 20G 8.9G 12G 45% /
Now let’s put the node back to a ready state using the kubectl command from the host:
rancher kubectl uncordon node-name
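To confirm the node is accepting workloads again, check that it no longer shows SchedulingDisabled (prefix the command with rancher if you're going through the Rancher CLI, as above):
kubectl get nodes node-name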