I am having difficulty getting a Kubernetes livenessProbe exec command to work with environment variables. My goal is for the liveness probe to monitor memory usage on the pod and also perform an httpGet-style health check.

"If container memory usage exceeds 90% of the resource limits OR the http response code at /health fails then the probe should fail."

The liveness probe is configured as follows:


livenessProbe:
  exec:
    command:
    - sh
    - -c
    - |-
      "used=$(awk '{ print int($1/1.049e+6) }' /sys/fs/cgroup/memory/memory.usage_in_bytes);
      thresh=$(awk '{ print int( $1 / 1.049e+6 * 0.9 ) }' /sys/fs/cgroup/memory/memory.limit_in_bytes);
      health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/health);
      if [[ ${used} -gt ${thresh} || ${health} -ne 200 ]]; then exit 1; fi"
  initialDelaySeconds: 240
  periodSeconds: 60
  failureThreshold: 3
  timeoutSeconds: 10

If I exec into the (ubuntu) pod and run these commands they all work fine and do the job.

But when deployed as a livenessProbe the pod is constantly failing with the following warning:

Events:
  Type     Reason     Age                  From     Message
  ----     ------     ----                 ----     -------
  Warning  Unhealthy  14m (x60 over 159m)  kubelet  (combined from similar events): Liveness probe failed: sh: 4: used=1608;
thresh=2249;
health=200;
if [[  -gt  ||  -ne 200 ]]; then exit 1; fi: not found

It looks as if the initial commands that probe memory and curl the health check endpoint all worked and produced values, but those values were not subsequently substituted into the if statement, so the probe never passes.

Any idea as to why? Or how this could be configured to work properly? I know it's a little bit convoluted. Thanks in advance.

david_beauchamp

3 Answers

It looks like the shell is treating your whole command as the name of a single command to execute.

I would remove the outer quotes:

livenessProbe:
  exec:
    command:
    - sh
    - -c
    - |-
      used=$(awk '{ print int($1/1.049e+6) }' /sys/fs/cgroup/memory/memory.usage_in_bytes);
      thresh=$(awk '{ print int( $1 / 1.049e+6 * 0.9 ) }' /sys/fs/cgroup/memory/memory.limit_in_bytes);
      health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/health);
      if [[ ${used} -gt ${thresh} || ${health} -ne 200 ]]; then exit 1; fi
  initialDelaySeconds: 240
  periodSeconds: 60
  failureThreshold: 3
  timeoutSeconds: 10

The |- block scalar already tells the YAML parser that this is a multi-line string, so the extra double quotes become part of the script itself.
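To see the effect outside Kubernetes, here is a minimal sketch that reproduces it with a much shorter script. The function names run_quoted and run_unquoted are made up for this illustration. In the quoted case the shell treats everything between the double quotes as one word, expands $x before any assignment has run (so it is empty), and then looks for a command with that name.

```shell
#!/bin/sh
# Hypothetical helpers, only meant to illustrate the quoting behaviour.

# Script wrapped in literal double quotes, as in the failing probe:
# the whole string is a single word, so it is used as a command name.
run_quoted() {
  sh -c '"x=1; echo value=$x"'
}

# Same script without the wrapping quotes: two ordinary commands.
run_unquoted() {
  sh -c 'x=1; echo value=$x'
}

run_quoted 2>/dev/null
echo "quoted exit status: $?"

run_unquoted
echo "unquoted exit status: $?"
```

The quoted variant should end with a "not found" error and exit status 127, which matches the shape of the kubelet message in the question.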

Andrew McGuinness

I think the root of your issue is confusion between bash and sh. Both are widely available in containers (although bash is sometimes missing), but bash has more features. Here you use [[, which is a bash extension; a plain POSIX sh may not recognize it, which can cause unexpected behavior.

First, replace sh with bash in your command if bash is present in the container. If it is not, you will have to write the conditional in POSIX sh syntax.
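If bash is not available, the same condition can be written with single brackets joined by ||. The following is a sketch under that assumption; probe_ok is a hypothetical function name, and the three arguments stand in for the used, thresh and health values computed in the question.

```shell
#!/bin/sh
# Hypothetical helper: POSIX sh version of the [[ ... || ... ]] test.
# $1 = memory used, $2 = memory threshold, $3 = HTTP status code.
probe_ok() {
  used=$1
  thresh=$2
  health=$3
  # Fail when memory is above the threshold OR the status is not 200.
  if [ "$used" -gt "$thresh" ] || [ "$health" -ne 200 ]; then
    return 1
  fi
  return 0
}

# Values taken from the event log in the question.
if probe_ok 1608 2249 200; then
  echo "probe would pass"
else
  echo "probe would fail"
fi
```

Inside the probe itself, the only line that would need to change is the final if statement.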

Then your liveness probe can be perfected by leveraging other Kubernetes features:

  • To avoid a big initial delay, use a startup probe. It holds off the other probes until it succeeds once, and it should have a high failureThreshold. This gives flexibility when the container starts faster than expected, and it keeps the delay in one place (no duplicated values) when you add another probe.

  • Use the resources field. It lets you specify memory and CPU limits and requests (read the documentation) for a specific deployment or pod. A failed liveness probe restarts the container, and so does exceeding the memory limit, so setting a limit achieves the same thing more cleanly.

OreOP

It turns out that the answers from both @Andrew McGuinness and @OreOP were crucial to my final working solution, which was:

  livenessProbe:
    exec:
      command:
      - /bin/bash
      - -c
      - |-
        used=$(awk '{ print int($1/1.049e+6) }' /sys/fs/cgroup/memory/memory.usage_in_bytes);
        thresh=$(awk '{ print int( $1 / 1.049e+6 * 0.9 ) }' /sys/fs/cgroup/memory/memory.limit_in_bytes);
        health=$(curl -s -o /dev/null --write-out "%{http_code}" http://localhost:8080/health);
        if [[ ${used} -gt ${thresh} || ${health} -ne 200 ]]; then exit 1; fi
    initialDelaySeconds: 240
    periodSeconds: 60
    failureThreshold: 3
    timeoutSeconds: 10

I crucially needed Andrew's advice about removing the quotes, because I was already instructing the YAML parser that this was a multi-line string. I think that was actually what I was asking. But @OreOP was absolutely correct about my confusion between bash and sh, and about which one accepts a double-bracket [[ conditional ]] statement.

By the way, I completely agree with both that this isn't ultimately the correct solution to the deeper problem at hand, but for various other reasons my team has requested this patch as a temporary measure. The memory.limit_in_bytes in my script actually references the resource limits set in my k8s deployment yaml.
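For anyone who wants to try the memory arithmetic away from a pod, here is a rough sketch of the same 90% check as a function that takes the two file paths as arguments. The name mem_over_threshold is invented for this example, and it divides by 1048576 (one MiB) where the script above uses the rounded constant 1.049e+6.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (status 0) when usage exceeds 90% of the limit.
# $1 = file holding the usage in bytes, $2 = file holding the limit in bytes.
mem_over_threshold() {
  used=$(awk '{ print int($1 / 1048576) }' "$1")
  thresh=$(awk '{ print int($1 / 1048576 * 0.9) }' "$2")
  [ "$used" -gt "$thresh" ]
}

# Example with made-up numbers: 1 GiB limit, about 954 MiB in use.
tmpdir=$(mktemp -d)
printf '1000000000\n' > "$tmpdir/usage"
printf '1073741824\n' > "$tmpdir/limit"

if mem_over_threshold "$tmpdir/usage" "$tmpdir/limit"; then
  echo "over 90% of the limit"
else
  echo "within the limit"
fi
rm -rf "$tmpdir"
```

In the pod the two paths would be the cgroup files used in the probe. On hosts running cgroup v2 those files have different names (memory.current and memory.max), so the paths would need adjusting there.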

david_beauchamp