
We want to build a simplified job/task processing system based on Kubernetes. We thought about using Knative and its eventing features. However, one requirement is that each task/job runs isolated in its own pod, and that the pod is destroyed afterwards. Every subsequent task/job is then processed by a new pod. Further, the jobs/tasks can be long-running, i.e., multiple hours or even days.

We are wondering if we can use and configure Knative to achieve this. I'm actually a bit sceptical due to the scale-to-zero feature, which would destroy long-running jobs (learned from here: https://stackoverflow.com/a/65881346/7065173). Further, our jobs/tasks don't necessarily listen on an HTTP(S) port. They are basically pre-packaged into a container, and the respective action is executed using Docker CMD.

What do you guys think, is Knative a good baseline for our endeavour? And do you have any tips or suggestions on what baseline to use instead (we also have an eye on Tekton, btw)?

Michael Wurster

2 Answers


If you have tasks you want to run for days, then Knative is probably not a good baseline for the effort. Knative assumes that your application is only active as long as there is at least one HTTP request in flight to your application. As you intuit, leaving an HTTP connection open for days is probably not a good design practice.

For your use case, it seems like Kubernetes Jobs might be the best approach. If you need something to react to the "there is work to spin up a job" signal and create the Job, you could use a Knative Service to talk to Kubernetes to create the Job; I've seen that work successfully in other cases.
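As a rough illustration of the Job-per-task idea, here is a minimal sketch that builds a `batch/v1` Job manifest as a plain Python dict. The function name `build_job_manifest`, the job name, the image reference, and the TTL value are all hypothetical placeholders, not anything from the question. It assumes one pod per task with no retries (`backoffLimit: 0`, `restartPolicy: Never`) and uses `ttlSecondsAfterFinished` so the finished Job and its pod get cleaned up.

```python
import json


def build_job_manifest(name, image, command=None, ttl_seconds=300):
    """Return a batch/v1 Job manifest (as a dict) that runs one task in one pod.

    All argument values are illustrative assumptions; adapt them to your setup.
    """
    container = {"name": "task", "image": image}
    if command:
        # Only override the image's CMD/ENTRYPOINT when explicitly requested.
        container["command"] = list(command)

    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Do not retry: a failed task should not be rerun in a new pod.
            "backoffLimit": 0,
            # Let Kubernetes delete the finished Job (and its pod) after a delay.
            "ttlSecondsAfterFinished": ttl_seconds,
            "template": {
                "spec": {
                    # Never restart the container inside the same pod.
                    "restartPolicy": "Never",
                    "containers": [container],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical job name and image, for illustration only.
    manifest = build_job_manifest("task-0001", "registry.example.com/my-task:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be piped into `kubectl apply -f -`, or the same dict could be submitted through a Kubernetes API client from whatever component receives the "there is work" signal.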

Knative also doesn't provide a hard mechanism for "exit after processing one request"; for users that want this level of isolation, I've suggested putting an exit(1) call in their application after they handle one request, but I agree that it's not an ideal workaround.

E. Anderson
  • We really need an "exit after processing one request" mechanism, but the exit(1) you suggested does not seem to work. The pod is still there, and the next request triggers a restart of the container inside the same pod rather than starting a fresh instance. Any idea? – Spenhouet Sep 19 '22 at 13:51
  • It looks like Kubernetes has "improved" the Pod lifecycle by enabling the kubelet to restart individual containers without restarting the entire Pod in this case. – E. Anderson Sep 21 '22 at 22:08
  • This seems like a reasonable feature request in https://github.com/knative/serving – E. Anderson Sep 21 '22 at 22:09
  • We are considering a feature request but need to be careful with the wording, since similar requests (from others) in the past were turned down with a "why would anyone want this" attitude or a deflective "then just use k8s jobs" response. Basically, we want job execution via HTTP requests, and Knative seemed to be the right tool, but it turns out they implemented everything with k8s Deployments, which really does not work well for a job-like use case. Most likely this feature request would require a separate implementation, and they simply might be unwilling to introduce that. We will try. – Spenhouet Sep 22 '22 at 09:52
  • Btw., while we could not run our pods the way we want, we are now using pod preemption, which does solve our pressing issue of non-exiting pods blocking GPU resources. – Spenhouet Sep 22 '22 at 09:56

Tekton is pretty much purpose-built for this scenario. We use it for exactly this. I agree with the other answer: Knative is not a great fit for this use case (and I do love Knative).

Trey