
I have a Kubernetes cluster on AWS, set up with kops.

I set up a Deployment that runs an Apache container and a Service for the Deployment (type: LoadBalancer).

When I update the deployment by running kubectl set image ..., as soon as the first pod of the new ReplicaSet becomes ready, the first couple of requests to the service time out.

Things I have tried:

  • I set up a readinessProbe on the pod; it works.
  • I ran curl localhost on a pod; it works.
  • I performed a DNS lookup for the service; it works.
  • If I curl the IP returned by that DNS lookup from inside a pod, the first request times out. This tells me it's not an ELB issue.

It's really frustrating since otherwise our Kubernetes stack is working great, but every time we deploy our application we run the risk of a user timing out on a request.

gzzo
  • So it fails when you use the service IP. What if you curl the pod IP (not localhost)? Is this a universal issue regardless of what you run in the container (e.g. a static nginx page), or only for this Apache image? – Radek 'Goblin' Pieczonka Nov 22 '17 at 08:13

1 Answer


After a lot of debugging, I think I've solved this issue.

TL;DR: Apache has to exit gracefully.

I found a couple of related issues:

Some more things I tried:

  • Increased the KeepAliveTimeout on Apache; didn't help.
  • Ran curl against the pod IP and node IPs; worked normally.
  • Set up a selector-less ExternalName service for a couple of external dependencies, thinking it might have something to do with DNS lookups; didn't help.

The solution:

I set up a preStop lifecycle hook on the pod that runs apachectl -k graceful-stop, so Apache terminates gracefully.

The issue (at least from what I can tell) is that when a deployment takes pods down, they receive a TERM signal, which causes Apache to immediately kill all of its children. This can cause a race condition in which kube-proxy still sends some traffic to pods that have received a TERM signal but have not yet terminated completely.

Also got some help from this blog post on how to set up the hook.

I also recommend increasing terminationGracePeriodSeconds in the PodSpec so Apache has enough time to exit gracefully.
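
For reference, here is a minimal sketch of how the preStop hook and the grace period might fit together in a Deployment. The names, labels, image tag, replica count, probe path, and the 60-second grace period are all assumptions for illustration, not values from my actual setup, and the apiVersion shown is the current apps/v1 rather than whatever your cluster version requires.

```yaml
# Sketch only: names, image, and timings below are assumed, not from the original setup.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: apache                 # hypothetical name
spec:
  replicas: 2                  # assumed
  selector:
    matchLabels:
      app: apache
  template:
    metadata:
      labels:
        app: apache
    spec:
      # Upper bound on how long the pod may take to shut down,
      # including the time spent in the preStop hook (assumed value).
      terminationGracePeriodSeconds: 60
      containers:
        - name: apache
          image: httpd:2.4     # assumed image; apachectl must be on the PATH
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /          # assumed probe path
              port: 80
          lifecycle:
            preStop:
              exec:
                # Runs inside the container before the TERM signal is sent.
                command: ["apachectl", "-k", "graceful-stop"]
```

The kubelet runs the preStop hook first and only sends TERM to the container after the hook returns, and the grace period covers both phases, so it should be set comfortably above the longest request you expect Apache to finish.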

gzzo