
I have a Java pod that is restarted after a few days.

Looking at `kubectl describe pod ...`, there is just the following:

Last State:     Terminated
  Reason:       Error
  Exit Code:    137

In my experience, this message usually means that there is an OutOfMemoryError somewhere, but looking at the logs I don't see anything useful.

Is there a way to execute a script (or save a few files) just before the inevitable restart? Something that could help me identify the problem.

For example, in case the restart was caused by an OutOfMemoryError, it would be wonderful if I could save the memory dump or the garbage collection logs.

freedev
  • See https://docs.oracle.com/en/java/javase/17/docs/specs/man/java.html and look at the -XX:OnOutOfMemoryError option. – tgdavies Aug 10 '22 at 22:36
  • @tgdavies thanks, I had a look but I was unable to use that option because the pod was restarted before I could do anything. – freedev Aug 10 '22 at 23:08
  • Probably not an OOM exception then. Maybe the JVM is trying to allocate more memory than the pod has available? Try running with a smaller heap size. – tgdavies Aug 10 '22 at 23:27

2 Answers


There are a couple of solutions to do that:

  • you can mount a volume in your application and configure log4j to write the logs to a file on the volume, so the logs will be persistent (see the sketch after this list)
  • the best solution is using a log collector (Fluentd, Logstash) to save the logs in Elasticsearch or an S3 file, or using a managed service like AWS CloudWatch, Datadog, ...
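A minimal sketch of the first option, assuming log4j is configured with a file appender writing to /var/log/app (the names and paths below are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: my-java-app
spec:
  containers:
  - name: app
    image: my-java-app:latest
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app   # point the log4j file appender at this directory
  volumes:
  # emptyDir survives container restarts within the same pod;
  # use a PersistentVolumeClaim if the logs must outlive the pod
  - name: app-logs
    emptyDir: {}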

To solve the OOM problem, you can add a bigger memory request to your application (2-4G), then watch the memory usage with the top command or a cluster monitoring tool (e.g. Prometheus):

apiVersion: v1
kind: Pod
metadata:
  name: ...
spec:
  containers:
  - name: ...
    image: ...
    resources:
      requests:
        memory: "2G"
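
Then, assuming the metrics server is installed in the cluster, you can check the actual per-container usage, for example:

kubectl top pod my-java-app --containers

(my-java-app here is a placeholder pod name.)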
Hussein Awala
  • Argh. `The Deployment ... is invalid: spec.template.spec.restartPolicy: Unsupported value: "Never": supported values: "Always"` – freedev Aug 10 '22 at 23:46
  • Please read https://stackoverflow.com/questions/55169075/restartpolicy-unsupported-value-never-supported-values-always – freedev Aug 11 '22 at 00:07
  • In fact, the deployment doesn't support the `Never` restart policy, but the pod does, so you can run your application as a simple pod (`kubectl get pod ... -o yaml > pod.yml`, then update the value for `restartPolicy`, remove the owner info, and apply it with a new name), or just update your question, maybe someone else has an answer for the deployment – Hussein Awala Aug 12 '22 at 11:18
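
A rough sketch of the steps described in that last comment (the pod name and file are placeholders):

kubectl get pod my-java-app -o yaml > pod.yml
# edit pod.yml: set restartPolicy: Never, remove metadata.ownerReferences
# (and the status section), and give metadata.name a new value
kubectl apply -f pod.yml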

I found the two ways below to investigate an out-of-memory error in Kubernetes. It would be best if you had a logging solution that keeps the logs; otherwise you can read the logs of the previous run with --previous, which I generally use for debugging as long as it is the same pod that is in a crash loop.

Write the thread dump to the stdout of the pod

You can take advantage of the lifecycle hook to take a thread dump and write it to stdout, so you will be able to see it with `kubectl logs -f pod_name -c container_name --previous`:

          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "jcmd 1 Thread.print > /proc/1/fd/1"]          

pod-lifecycle

This will also help if you ship the dump logs to Datadog or Elasticsearch.

Writing to a volume

You will need to update the Java command (or env) and the deployment chart.

      serviceAccountName: {{ include "helm-chart.fullname" . }}
      volumes:
        - name: heap-dumps
          emptyDir: {}
      containers:
        - name: java-container
          volumeMounts:
          - name: heap-dumps
            mountPath: /dumps

Add this env:

ENV JAVA_OPTS="-XX:+CrashOnOutOfMemoryError  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps/oom.bin -Djava.io.tmpdir=/tmp"
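
If you would rather set it in the deployment chart than in the Dockerfile, a sketch (assuming the image's entrypoint actually passes JAVA_OPTS to the java command) would be:

      containers:
        - name: java-container
          env:
            # JAVA_OPTS only takes effect if the entrypoint expands it,
            # e.g. exec java $JAVA_OPTS -jar app.jar
            - name: JAVA_OPTS
              value: "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps/oom.bin -Djava.io.tmpdir=/tmp"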

With that in place, you will be able to see what's going on in the JVM.
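
Since the emptyDir survives container restarts within the same pod, you can then copy the dump out, for example (the pod name is a placeholder; java-container matches the container name above):

kubectl cp my-java-pod:/dumps/oom.bin ./oom.bin -c java-container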

Some more JVM configuration for containers that can help you utilize the advanced options of the JVM running inside a container:

-XX:InitialRAMPercentage=50.0 -XX:MinRAMPercentage=50.0 -XX:MaxRAMPercentage=85.0

The JVM has been modified to be aware that it is running in a Docker container and will extract container-specific configuration information instead of querying the operating system.
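
For example, assuming a container with a 2 GiB memory limit, -XX:MaxRAMPercentage=85.0 caps the heap at roughly 1.7 GiB; you can check what the JVM actually picks up with:

java -XX:MaxRAMPercentage=85.0 -XshowSettings:vm -version

which prints, among other things, the estimated maximum heap size.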

jvm-in-a-container

best-practices-java-memory-arguments-for-containers

Adiii
  • Hi @Adiii, thanks for sharing. It's not clear how I can mount a volume in a kube deployment. Could you please elaborate on this? – freedev Aug 11 '22 at 16:38
  • I already mentioned it in the section `Writing to a volume` – Adiii Aug 11 '22 at 16:52
  • See https://kubernetes.io/docs/concepts/storage/volumes/ for further details, but as I mentioned, for a quick workaround I would go for the thread dump one – Adiii Aug 11 '22 at 16:53
  • Dumps are usually very heavy and hard to work with; although there are tools, writing the thread dump to stdout is pretty straightforward and you can easily check which function triggered the OOM – Adiii Aug 11 '22 at 16:54