How to kill a multi-container pod if one container fails?

Question

I'm using the Jenkins Kubernetes Plugin, which starts pods in a Kubernetes cluster that serve as Jenkins agents. Each pod contains three containers, which provide the slave logic, a Docker socket, and the gcloud command-line tool.

The usual workflow is that the slave does its job and notifies the master that it has completed; the master then terminates the pod. However, if the slave container crashes due to a lost network connection, it terminates with exit code 255 while the other two containers keep running, and so does the pod. This is a problem because the pods have large CPU requests. Setup is cheap when the slaves run only when they have to, but having multiple machines running for 24 hours or over the weekend causes noticeable financial damage.

I'm aware that running multiple containers in the same pod is not considered good Kubernetes practice, but it is acceptable if I know what I'm doing, and I assume I do. I'm sure it's hard to solve this differently given the way the Jenkins Kubernetes Plugin works.

Can I make the pod terminate if one container fails, without it respawning? A solution with a timeout is acceptable as well, though less preferred.

score 3 · Answer 1 · answered Feb 10 '20 at 14:00

Disclaimer: I have rather limited knowledge of Kubernetes, but given the question:

Maybe you can run a fourth container that exposes one simple "liveness" endpoint. It can run ps -ef, or use any other way to check on the three existing containers, just to make sure they're alive.

This endpoint could return "OK" only if all the containers are running, and "ERROR" if at least one of them is detected as crashed.

Then you could set up a Kubernetes liveness probe so that it stops the pod when that fourth container returns an error.

Of course, if this fourth process itself crashes for any reason (it shouldn't, unless there is a bug), the liveness probe won't get a response and Kubernetes is supposed to stop the pod anyway, which is probably what you really want to achieve.
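
As a rough illustration of the endpoint described above, here is a minimal Python sketch, not a tested setup. It assumes the pod sets shareProcessNamespace: true so the fourth container can see the other containers' processes under /proc. The process names in EXPECTED, the port 8080, and the 200/500 status codes are all hypothetical choices. Also keep in mind that a failing liveness probe makes the kubelet kill that container and then apply the pod's restartPolicy, so whether the whole pod goes away depends on the rest of your configuration.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical process names, one per container to watch; replace with your own.
EXPECTED = ["jenkins-agent", "dockerd", "sleep"]


def running_commands(proc_root="/proc"):
    """Return the command lines of all processes visible under proc_root (Linux only)."""
    commands = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited while we were scanning
        if raw:
            commands.append(raw.replace(b"\0", b" ").decode(errors="replace").strip())
    return commands


def health(expected, commands):
    """Return (status, body): 200/OK only if every expected name appears in some command line."""
    missing = [name for name in expected if not any(name in c for c in commands)]
    if missing:
        return 500, "ERROR: missing " + ", ".join(missing)
    return 200, "OK"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Re-scan the process table on every probe request.
        status, body = health(EXPECTED, running_commands())
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())


def serve(port=8080):
    """Start the endpoint; an httpGet liveness probe could then target this port."""
    HTTPServer(("", port), Handler).serve_forever()
```

Calling serve() in the fourth container starts the endpoint. Matching by substring on the command line is a deliberately crude check; a real setup would likely need something more specific to each container.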

Interesting idea that put me on a good path, thank you. The feature of combining liveness probes (which would avoid the extra pod) is discussed at https://github.com/kubernetes/kubernetes/issues/37218. — Kalle Richter, Feb 10 '20 at 21:49
Nice discussion. Thanks for the link, Karl. I wasn't aware of it... — Mark Bramnik, Feb 10 '20 at 22:14
