I'm using Jenkins Kubernetes Plugin which starts Pods in a Kubernetes Cluster which serve as Jenkins agents. The pods contain 3 containers in order to provide the slave logic, a Docker socket as well as the gcloud
command line tool.
The usual workflow is that the slave does its job and notifies the master that it completed. Then the master terminates the pod. However, if the slave container crashes due to a lost network connection, the container terminates with error code 255, the other two containers keep running and so does the pod. This is a problem because the pods have large CPU requests and setup is cheap with the slave running only when they have to, but having multiple machines running for 24h or over the weekend is a noticable financial damage.
I'm aware that starting multiple containers in the same pod is not fine Kubernetes arts, however ok if I know what I'm doing and I assume I do. I'm sure it's hard to solve this differently given the way the Jenkins Kubernetes Plugin works.
Can I make the pod terminate if one container fails without it respawn? As solution with a timeout is acceptable as well, however less preferred.