
I have a Python process that does some heavy computation with Pandas and the like. It is not my code, so I don't have much insight into it.

The situation is this: the Python code used to run perfectly fine on a server with 8 GB of RAM, maxing out all the resources available.

We moved this code to Kubernetes and we can't make it run: even after increasing the allocated resources up to 40 GB, the process is greedy and will inevitably try to grab as much memory as it can, until it goes over the container limit and gets killed by Kubernetes.

I know this code is probably suboptimal and needs rework on its own.

However, my question is: how do I get Docker on Kubernetes to mimic what Linux did on the server, i.e. give the process as many resources as it needs without killing it?

Nicolas Landier
  • Have you set limits in your deployment .yaml? – Will Jenkins Nov 18 '20 at 10:24
  • The description of the problem is not really clear... are you using the same input in both cases? If that is the case, then the python script is probably checking the available ram and trying to manage it... and I would say that the way the script checks the RAM is not working with k8s. If you are not using the same input.... well then try with the same input on both systems and check if it works in one and not in the other... – Riccardo Petraglia Nov 18 '20 at 10:32
  • Yes, I have set limits in the deployment to make sure it would not eat 100% of the host, and that limit is what Kubernetes uses to kill the container. – Nicolas Landier Nov 18 '20 at 10:35
  • As Will pointed out, this seems like a situation where Kubernetes is killing the pod as it reaches the resource limit: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/ Also, Riccardo's point is quite valid. – Rahul Nov 18 '20 at 10:36
  • Yes, the Python script is 100% the same: same input, same systems it talks to... the only difference is EC2 vs Kubernetes. I did indeed guess that the Python process somehow looks up the host's available memory and tries to use it, which is wrong in a Kubernetes environment since it does not take into account what the container is allowed to use. I would expect it to work the other way around: without knowledge of what is available, just asking the OS politely whether it can get more. I'm fairly new to Kubernetes, so I may be misinterpreting things. – Nicolas Landier Nov 18 '20 at 10:38
  • Is there anything in the logs? Any errors like [here](https://stackoverflow.com/questions/49440125/getting-killed-error-in-aws-but-works-in-mac)? Do I understand correctly that the pod itself is working, but it takes more and more memory, up to 40GB, and then it gets killed? – Jakub Nov 19 '20 at 07:40
  • Yes, you're correct. The Kubernetes logs just confirm the container was OOM-killed. In Docker on a MacBook, the process gets a SIGKILL when it reaches the max memory allocated to Docker. – Nicolas Landier Nov 19 '20 at 12:43
  • I have the same issue... it seems that Python is not considering `/sys/fs/cgroup/memory/memory.max_usage_in_bytes` as its max memory, but the host memory instead. When it tries to allocate more memory than it is allowed, the Linux OOM killer handles it. In Java we have `-XX:+UseContainerSupport`; I couldn't find anything similar for Python though. – caarlos0 Nov 19 '20 at 13:46
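A quick way to see the mismatch described in the last comment is to compare what the host reports with what the cgroup allows (a minimal sketch, assuming the cgroup v1 limit file memory.limit_in_bytes; run it inside the container):

import os

# Total memory as the host reports it (what a naive process sees)
host_total = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')
print('host reports:', host_total, 'bytes')

# Memory limit actually imposed on the container by the cgroup (v1 layout)
cgroup_limit = '/sys/fs/cgroup/memory/memory.limit_in_bytes'
if os.path.isfile(cgroup_limit):
    with open(cgroup_limit) as f:
        print('cgroup limit:', int(f.read()), 'bytes')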

2 Answers


I found out that running something like this seems to work:

import resource
import os

# If a cgroup (v1) memory limit file is present, read the limit and apply it
# as the process's address-space limit, both soft and hard.
if os.path.isfile('/sys/fs/cgroup/memory/memory.limit_in_bytes'):
    with open('/sys/fs/cgroup/memory/memory.limit_in_bytes') as limit:
        mem = int(limit.read())
        resource.setrlimit(resource.RLIMIT_AS, (mem, mem))

This reads the memory limit file from cgroups and sets it as both the hard and soft limit for the process's maximum address space (RLIMIT_AS).

You can test it by running something like:

docker run -it --rm -m 1G --cpus 1 python:rc-alpine

And then trying to allocate 1 GB of RAM before and after running the script above.
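For the allocation step, a minimal sketch (the 1 GiB figure is just illustrative, matching the 1G container limit above) could be:

# Attempt to allocate roughly 1 GiB; with RLIMIT_AS set from the cgroup limit,
# this should raise MemoryError rather than the container being OOM-killed.
try:
    data = bytearray(1024 * 1024 * 1024)
    print('allocation succeeded')
except MemoryError:
    print('allocation failed with MemoryError')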

With the script, you'll get a MemoryError; without it, the container will be killed.

caarlos0
  • Thanks! I already tried simply setting RLIMIT_AS, but it ended up killing the container the same way as when the process reached the container memory limit. I'll give what you did a try. – Nicolas Landier Nov 20 '20 at 09:34
  • It may still die, but it will die with a Python `MemoryError` instead of being OOMKilled... – caarlos0 Nov 20 '20 at 17:17

Using the --oom-kill-disable option with a memory limit works for me (12 GB memory) in a Docker container. Perhaps it applies to Kubernetes as well.

docker run -dp 80:8501 --oom-kill-disable -m 12g <image_name> 

Hence the related question: How to mimic "--oom-kill-disable=true" in kuberenetes?

berkorbay