I keep getting this error when I try to set up liveness and readiness probes for my awx_web container:

Liveness probe failed: Get http://POD_IP:8052/: dial tcp POD_IP:8052: connect: connection refused

Liveness and readiness section of my deployment for the awx_web container:

          ports:
          - name: http
            containerPort: 8052 # the port of the container awx_web
            protocol: TCP
          livenessProbe:
            httpGet:
              path: /
              port: 8052
            initialDelaySeconds: 5
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /
              port: 8052
            initialDelaySeconds: 5
            periodSeconds: 5

If I test whether port 8052 is open, either from another pod in the same namespace as the pod that contains the awx_web container or from a container deployed in the same pod as awx_web, I get this (the port is open):

/ # nc -vz POD_IP 8052
POD_IP  (POD_IP :8052) open

I get the same result (port 8052 is open) if I use netcat (nc) from the worker node where the pod containing the awx_web container is deployed.

For info, I use a NodePort service that redirects traffic to that container (awx_web):

type: NodePort
ports:
- name: http
  port: 80
  targetPort: 8052
  nodePort: 30100

Abderrahmane

3 Answers

I recreated your issue, and it looks like the problem is caused by too small a value of `initialDelaySeconds` for the liveness probe.

It takes more than 5s for the awx container to open port 8052, so you need to wait a bit longer for it to start. I found that setting the delay to 15s is enough for me, but you may need to tweak it.
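
As a rough sketch of that change, the probes from the question could be given a longer initial delay. The 15s value is the one that worked in my test; whether it is enough on your cluster is an assumption you would need to verify:

```yaml
# Sketch only: same probes as in the question, with a longer initial delay.
livenessProbe:
  httpGet:
    path: /
    port: 8052
  initialDelaySeconds: 15  # enough in my test; tune for your environment
  periodSeconds: 5
readinessProbe:
  httpGet:
    path: /
    port: 8052
  initialDelaySeconds: 15  # kept equal to the liveness delay for simplicity
  periodSeconds: 5
```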

Matt
  • I already increased `initialDelaySeconds` to 30s and then to 60s, but I still get the same issue – Abderrahmane Sep 16 '20 at 11:57
  • With the liveness probe set, could you exec into the pod as soon as it starts, run `watch -n1 "ss -lnt"`, and check when port 8052 opens? – Matt Sep 16 '20 at 13:03
  • The container has `State: Running` and `Ready: False`; when I run your command, port 8052 is missing from the list – Abderrahmane Sep 16 '20 at 13:31
  • Is it always missing, or does it appear after some time? Also, please check the logs with `kubectl logs -n `; maybe there are some errors. @Adamsin – Matt Sep 16 '20 at 13:42
  • Do you have any other ports open? In my case that is 8050 and also 8051. And what version of awx are you using (I was testing on 14.1.0)? – Matt Sep 16 '20 at 13:47
  • Regarding the pod logs, there is nothing noticeable. I use AWX version 9.3.0. As for the ports you mentioned, I see them when there are no probes, but I don't see them when the probes are in place – Abderrahmane Sep 16 '20 at 14:05
  • I deployed awx 9.3.0, and it looks like the awx-web container takes a whole 5 min before it opens port 8052 and starts serving traffic. This is why the liveness probe is failing. Check it yourself: remove the probes, exec into the container, run `watch ss -lnt`, and measure the time from pod start until port 8052 opens. – Matt Sep 16 '20 at 14:55
  • You are right, it is exactly as you said, but I don't understand why the container takes so long to start listening on the designated port when I use probes, while it takes only 10-20s without probes. – Abderrahmane Sep 17 '20 at 07:30
  • I can't help you with this. You can create an issue on [awx github repo](https://github.com/ansible/awx/issues) asking this question to developers directly, but I am not sure version 9.x.x is still supported so you may not get the answer. – Matt Sep 17 '20 at 07:37
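
Given the roughly five-minute startup measured in the comments above, one option is a startup probe, which keeps the liveness and readiness probes disabled until the container first responds. This is only a sketch: it assumes the cluster's Kubernetes version supports `startupProbe` (the thread does not state the version), and the `failureThreshold` and `periodSeconds` values are illustrative, chosen to allow up to about 400s (40 × 10s) before the container is restarted.

```yaml
# Sketch only: threshold values are illustrative, not taken from the thread.
startupProbe:
  httpGet:
    path: /
    port: 8052
  periodSeconds: 10
  failureThreshold: 40   # 40 attempts x 10s = up to ~400s to finish starting
livenessProbe:
  httpGet:
    path: /
    port: 8052
  periodSeconds: 5       # only begins once the startup probe has succeeded
readinessProbe:
  httpGet:
    path: /
    port: 8052
  periodSeconds: 5       # only begins once the startup probe has succeeded
```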

In my case, this issue occurred because I had configured the backend application host as localhost. It was resolved when I changed the host value to 0.0.0.0 in my app properties.

After making this change, make sure you deploy the newly built Docker image.
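
For illustration, if the backend happened to be a Spring Boot application configured through `application.yml` (an assumption; the framework is not stated here), the change would look roughly like this. The port value is a placeholder:

```yaml
# Hypothetical application.yml for a Spring Boot backend.
server:
  address: 0.0.0.0   # listen on all interfaces, not just loopback
  port: 8080         # placeholder; use the port your probes target
```

A server bound to localhost only accepts connections arriving on the loopback interface inside the pod's own network namespace, so a probe sent by the kubelet to the pod IP is refused even though the process is running.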

Vivek

Most likely your application could not start up, or it crashed shortly after starting. This may be due to insufficient memory or CPU resources, or because one of the AWX dependencies, such as PostgreSQL or RabbitMQ, is not set up correctly.

Did you check whether your application works correctly without the probes? I recommend doing that first. Examine the pod's status for a while to make sure it is not restarting.

Luffy