
I am trying to write a data processing unit in Kubernetes.

Every processing unit has a very similar workflow:

  1. A puller pulls data from object storage into an /input volume mounted in the container
  2. A processor runs the code that processes the data in the /input volume and writes its results to an /output volume
  3. A pusher pushes the data in the /output volume back to object storage

So every Pod or Job must have a data puller container and a data pusher container, which communicate through shared volumes as mentioned here. But how can I make the steps run in a pull -> process -> push sequence?

Right now I can make it work by communicating through the shared volumes: the puller starts working first, and the data processor waits until it finds that a pull-finished.txt file has been created. The pusher then starts working once it finds that a process-finished.txt file has been created. But this may force the data processing container to be built FROM a particular image or to use a specific entrypoint, which is not what I want. Is there a more elegant way to make this work?
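For illustration, the marker-file workaround described above might look roughly like the following Pod. This is only a sketch: the image names and the `pull-data`, `process-data` and `push-data` commands are hypothetical placeholders, and the volumes are assumed to be `emptyDir` volumes.

```yaml
# Sketch of the marker-file workaround. All images and the
# pull-data / process-data / push-data commands are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: process-unit-markers
spec:
  restartPolicy: Never
  volumes:
  - name: input
    emptyDir: {}
  - name: output
    emptyDir: {}
  containers:
  - name: puller
    image: example/puller        # placeholder image
    # Pull the data, then create the marker file for the processor.
    command: ['sh', '-c', 'pull-data /input && touch /input/pull-finished.txt']
    volumeMounts:
    - name: input
      mountPath: /input
  - name: processor
    image: example/processor     # placeholder image
    # Wait for the puller's marker, process, then create the next marker.
    command: ['sh', '-c', 'until [ -f /input/pull-finished.txt ]; do sleep 2; done; process-data /input /output && touch /output/process-finished.txt']
    volumeMounts:
    - name: input
      mountPath: /input
    - name: output
      mountPath: /output
  - name: pusher
    image: example/pusher        # placeholder image
    # Wait for the processor's marker before pushing the results.
    command: ['sh', '-c', 'until [ -f /output/process-finished.txt ]; do sleep 2; done; push-data /output']
    volumeMounts:
    - name: output
      mountPath: /output
```

The wrapper `command` on the processor container is exactly the constraint I would like to avoid: the processing image needs a shell and cannot simply keep its own entrypoint.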

Jonas
aisensiy
  • Hi, AFAIK you could use init containers to do these tasks. Besides, the steps can be synchronized within a Pod. Here is the link: [init-containers](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) – Suresh Vishnoi Jun 11 '18 at 16:55
  • Not exactly the same question, but it has the information you need: https://stackoverflow.com/questions/49568337/kubernetes-processing-an-unlimited-number-of-work-items/49619517#49619517 – Janos Lenart Jun 12 '18 at 10:41

1 Answer


As already mentioned in the comments by Suresh Vishnoi and Janos Lenart, the best approach is to use Jobs to process data from a queue or an input volume, and init containers to run the processing steps sequentially.

Here is a good example of using init containers from the Kubernetes documentation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;']
  - name: init-mydb
    image: busybox
    command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']
```

You can find another good example in the answer provided by Janos Lenart.
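Applied to the pull -> process -> push workflow from the question, a minimal sketch could look like the Job below. It assumes `emptyDir` volumes for /input and /output, and the three image names are hypothetical placeholders. Init containers run one at a time, in the order listed, and each must exit successfully before the next one starts; the regular container starts only after all init containers have completed.

```yaml
# Sketch only: image names are hypothetical placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: process-unit
spec:
  backoffLimit: 0              # do not retry the whole pipeline on failure
  template:
    spec:
      restartPolicy: Never
      volumes:
      - name: input
        emptyDir: {}
      - name: output
        emptyDir: {}
      initContainers:
      # Step 1: runs first and must exit 0 before the processor starts.
      - name: puller
        image: example/puller      # placeholder image
        volumeMounts:
        - name: input
          mountPath: /input
      # Step 2: starts only after the puller has succeeded.
      - name: processor
        image: example/processor   # placeholder image, default entrypoint kept
        volumeMounts:
        - name: input
          mountPath: /input
        - name: output
          mountPath: /output
      containers:
      # Step 3: starts only after all init containers have completed.
      - name: pusher
        image: example/pusher      # placeholder image
        volumeMounts:
        - name: output
          mountPath: /output
```

With this layout the processor image needs no special base image or wrapper entrypoint, because the ordering comes from Kubernetes rather than from marker files. The pusher is placed in `containers` because a Pod spec requires at least one regular container.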

VAS
  • So the containers part is just a notifier to say that all the jobs are done, and I have to use initContainers to make sure all the jobs run in sequence? – aisensiy Jun 13 '18 at 13:43