
I have a service set up in Kubernetes that seems fairly normal: a deployment, a service, and an HPA. However, it does something I'd like to fix. The sequence of events goes like this:

  1. We change the deployed image, which creates new pods.
  2. The pods become healthy and enter the service through the label selector.
  3. The HPA enters an unhealthy state because it cannot read the new pod metrics.
  4. I get notified through Argo rollouts that the HPA is unhealthy.

I'd like to somehow delay pods entering service until their metrics are ready so we don't get this false alarm on every deploy.

Right now, we solve this by waiting 60 seconds before changing the labels in our blue/green rollout script, but that's pretty unsatisfying!

I think I could also do this by creating a liveness probe that asks for the pod's metrics, but that seems like a lot of hassle for something that should be easy. (For example, it doesn't look like the current namespace is in the environment by default. I guess I could get it with the downward API, but even then I'd have to bundle curl or kubectl in my container images, which I'd prefer not to do.)
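For concreteness, here is a rough sketch of what such a metrics check might look like in Python, using only the standard library (so no curl or kubectl, though it assumes the image ships a Python interpreter). All function names are made up for illustration. It assumes metrics-server exposes the `metrics.k8s.io/v1beta1` API, that the pod's service account is allowed to `get` pods in that API group, and that `HOSTNAME` is still the pod name (the default). The namespace is read from the mounted service account directory rather than the downward API. Note that a readiness probe, not a liveness probe, is the kind that keeps a pod out of a Service's endpoints.

```python
import json
import os
import ssl
import time
import urllib.error
import urllib.request

# Standard mount point for the pod's service account credentials.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def metrics_url(host, port, namespace, pod):
    """Build the metrics.k8s.io URL for a single pod."""
    return (
        f"https://{host}:{port}/apis/metrics.k8s.io/v1beta1"
        f"/namespaces/{namespace}/pods/{pod}"
    )


def has_metrics(status, body):
    """True when the API returned usage data for at least one container."""
    if status != 200:
        return False
    try:
        containers = json.loads(body).get("containers") or []
        return any(c.get("usage") for c in containers)
    except (ValueError, AttributeError):
        # Not JSON, or not the object shape we expected.
        return False


def fetch(url):
    """GET url with the service account token; return (status, body)."""
    with open(os.path.join(SA_DIR, "token")) as f:
        token = f.read().strip()
    ctx = ssl.create_default_context(cafile=os.path.join(SA_DIR, "ca.crt"))
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req, context=ctx, timeout=5) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # e.g. 404 while metrics for the pod have not been scraped yet
        return err.code, err.read()


def wait_for_metrics(check, timeout=60.0, interval=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or timeout seconds elapse."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def main():
    """Exit code for an exec probe: 0 if this pod has metrics, else 1."""
    with open(os.path.join(SA_DIR, "namespace")) as f:
        namespace = f.read().strip()
    url = metrics_url(
        os.environ["KUBERNETES_SERVICE_HOST"],
        os.environ["KUBERNETES_SERVICE_PORT"],
        namespace,
        os.environ["HOSTNAME"],  # assumption: hostname not overridden
    )
    return 0 if has_metrics(*fetch(url)) else 1
```

An exec probe would run something like `raise SystemExit(main())` at the end of the script. Alternatively, `wait_for_metrics` could stand in for the fixed 60-second sleep in the rollout script, given a `check` callable that can reach the API from wherever that script runs.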

Anyway, are other people even seeing this? If so, how are you solving it?


Editing to add information requested in a comment: we use Kubernetes 1.21 on an Amazon EKS cluster.

Brian Hicks
  • Which version of Kubernetes did you use and how did you set up the cluster? Did you use bare metal installation or some cloud provider? – kkopczak Dec 07 '21 at 08:59
  • I've added the info in above! – Brian Hicks Dec 08 '21 at 11:29
  • Sorry for the late response. Thank you for the update. Could you also provide some logs? – kkopczak Dec 14 '21 at 10:37
  • there are a lot of logs in Kubernetes—is there something in particular you're looking for? My question, however, is more "is this even possible or reasonable?" and less "why isn't it working, help!" – Brian Hicks Dec 14 '21 at 15:52
  • Could you attach some YAML files? Also check [this guide](https://stackoverflow.com/help/minimal-reproducible-example) (minimal reproducible example) and please elaborate on your question. That will help others reproduce your problem. – kkopczak Dec 20 '21 at 09:25
