Data puller and data pusher in pod or job

Question

I am trying to write a data-processing unit in Kubernetes.

Every processing unit has a very similar workflow:

The puller pulls data from object storage into an /input volume mounted in the container.
The processor runs the code that processes the data in /input and writes the results to an /output volume.
The pusher pushes the data in /output back to object storage.

So every pod or job must have a data-puller container and a data-pusher container that share a volume with the processor, as mentioned here. But how can I make the steps run in the sequence pull -> process -> push?

Right now I can make it work by communicating through the shared volume: the puller starts first, and the data processor waits until it finds that a pull-finished.txt file has been created. The pusher then starts working once it finds a process-finished.txt file. But this approach forces the data-processing container to be built FROM a particular image or to use a specific entrypoint, which is not what I want. Is there a more elegant way to make this work?
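A minimal sketch of the marker-file workaround described above, assuming two `emptyDir` volumes shared by three containers in one Pod. The Pod name, the `busybox` images, and the shell commands are hypothetical stand-ins for real object-storage clients and processing code:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: marker-file-pipeline     # hypothetical name
spec:
  restartPolicy: Never
  volumes:
  - name: input
    emptyDir: {}
  - name: output
    emptyDir: {}
  containers:
  - name: puller
    image: busybox               # placeholder for a real object-storage client image
    # Stand-in for the download; the marker is written only after the data.
    command: ['sh', '-c', 'echo "sample data" > /input/data.txt && touch /input/pull-finished.txt']
    volumeMounts:
    - name: input
      mountPath: /input
  - name: processor
    image: busybox               # placeholder; a real processor image would need sh for this wrapper
    # Poll for the puller's marker, process, then write the next marker.
    command: ['sh', '-c', 'until [ -f /input/pull-finished.txt ]; do sleep 2; done; tr a-z A-Z < /input/data.txt > /output/data.txt && touch /output/process-finished.txt']
    volumeMounts:
    - name: input
      mountPath: /input
    - name: output
      mountPath: /output
  - name: pusher
    image: busybox               # placeholder for a real object-storage client image
    # Poll for the processor's marker, then "push" (here just print) the result.
    command: ['sh', '-c', 'until [ -f /output/process-finished.txt ]; do sleep 2; done; cat /output/data.txt']
    volumeMounts:
    - name: output
      mountPath: /output
```

The drawback is visible in the processor entry: its `command` has to be replaced by a shell wrapper, so the image must ship `sh` and its own entrypoint no longer runs unchanged.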

Hi, AFAIK you could use init containers to do these tasks; they run in sequence within a pod. Here is the link: [init-containers](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) — Suresh Vishnoi, Jun 11 '18 at 16:55
Not exactly the same question, but it has the information you need: https://stackoverflow.com/questions/49568337/kubernetes-processing-an-unlimited-number-of-work-items/49619517#49619517 — Janos Lenart, Jun 12 '18 at 10:41

score 0 · Answer 1 · answered Jun 12 '18 at 15:45

As Suresh Vishnoi and Janos Lenart already mentioned in the comments, the best approach is to use Jobs to process data from a queue or an input volume, and init containers to run the processing steps sequentially.
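A minimal sketch of that suggestion applied to the pull -> process -> push workflow, assuming the puller and processor run as init containers and the pusher runs as the Job's only regular container, all sharing `emptyDir` volumes. The Job name, the `busybox` images, and the shell commands are hypothetical stand-ins. Init containers run one at a time, each must exit successfully before the next one starts, and regular containers start only after all init containers have succeeded, so no marker files are needed:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: process-unit             # hypothetical name
spec:
  backoffLimit: 2                # retry the whole Pod at most twice on failure
  template:
    spec:
      restartPolicy: Never       # Jobs require Never or OnFailure
      volumes:
      - name: input
        emptyDir: {}
      - name: output
        emptyDir: {}
      initContainers:
      # Step 1: runs first and must exit 0 before step 2 starts.
      - name: puller
        image: busybox           # placeholder for a real object-storage client
        command: ['sh', '-c', 'echo "sample data" > /input/data.txt']
        volumeMounts:
        - name: input
          mountPath: /input
      # Step 2: starts only after the puller succeeds. With a real processor
      # image, `command` could be omitted so its default entrypoint runs as-is.
      - name: processor
        image: busybox           # placeholder for the processing image
        command: ['sh', '-c', 'tr a-z A-Z < /input/data.txt > /output/data.txt']
        volumeMounts:
        - name: input
          mountPath: /input
        - name: output
          mountPath: /output
      containers:
      # Step 3: starts only after all init containers have succeeded.
      - name: pusher
        image: busybox           # placeholder for a real object-storage client
        command: ['sh', '-c', 'cat /output/data.txt']
        volumeMounts:
        - name: output
          mountPath: /output
```

Putting the pusher in `containers` rather than adding a third init container means the Job completes when the push finishes, so the regular container does real work instead of only signalling completion.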

Here is a good example of using init containers from the Kubernetes documentation:

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;']
  - name: init-mydb
    image: busybox
    command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']

You can find another good example in the answer provided by Janos Lenart.

So the containers part is just a notifier saying that all the work is done? Do I have to use initContainers to make sure all the steps run in sequence? — aisensiy, Jun 13 '18 at 13:43
