
I would like to provide DAGs to all Airflow pods on Kubernetes (web, scheduler, workers) via a persistent volume. I create the claim with:

kubectl create -f pv-claim.yaml

pv-claim.yaml containing:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: airflow-pv-claim
  annotations:
    pv.beta.kubernetes.io/gid: "1000"
    pv.beta.kubernetes.io/uid: "1000"
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
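As a sanity check (a sketch, assuming the namespace used below), you can verify the claim actually binds before installing the chart. Note that many default `standard` provisioners do not support `ReadWriteMany`; an NFS-backed volume or similar may be required for that access mode.

```shell
# Should show STATUS "Bound"; if it stays "Pending", the storage class
# likely cannot provision a ReadWriteMany volume
kubectl get pvc airflow-pv-claim -n my_name -o wide
```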

The deployment command is then:

helm install --namespace my_name --name "airflow" stable/airflow --values ~my_name/airflow/charts/airflow/values.yaml

In the chart stable/airflow, values.yaml also allows for specification of persistence:

persistence:
  enabled: true
  existingClaim: airflow-pv-claim
  accessMode: ReadWriteMany
  size: 1Gi
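For completeness, the same persistence settings can instead be passed on the command line rather than via a values file (a sketch, using the same Helm 2 syntax as the install command above):

```shell
helm install --namespace my_name --name "airflow" stable/airflow \
  --set persistence.enabled=true \
  --set persistence.existingClaim=airflow-pv-claim \
  --set persistence.accessMode=ReadWriteMany \
  --set persistence.size=1Gi
```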

But if I exec into a worker pod and try to write to the DAGs directory:

kubectl exec -it airflow-worker-0 -- /bin/bash
touch dags/hello.txt

I get a permission denied error.

I have tried hacking the airflow chart to set up an initContainer to chown dags/:

command: ["sh", "-c", "chown -R 1000:1000 /dags"]

which works for all pods but the workers (because they are created by flower?), as suggested at https://serverfault.com/a/907160/464205
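For reference, the initContainer hack looks roughly like this under spec.template.spec in the relevant deployment (a sketch; the container name, image, and volume name here are illustrative and must match the pod's actual DAGs volume):

```yaml
initContainers:
  - name: fix-dags-permissions
    image: busybox
    command: ["sh", "-c", "chown -R 1000:1000 /dags"]
    volumeMounts:
      - name: dags-data       # must match the volume name used by the pod
        mountPath: /dags
```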

I have also seen talk of fsGroup etc. - see e.g. Kubernetes NFS persistent volumes permission denied

I am trying to avoid editing the airflow charts (which seems to require hacks to at least two deployments-*.yaml files, plus one other), but perhaps this is unavoidable.

Punchline:

What is the easiest way to provision DAGs through a persistent volume to all airflow pods running on Kubernetes, with the correct permissions?

See also:

Persistent volume atached to k8s pod group

Kubernetes NFS persistent volumes permission denied [not clear to me how to integrate this with the airflow helm charts]

Kubernetes - setting custom permissions/file ownership per volume (and not per pod) [non-detailed, non-airflow-specific]


1 Answer
It turns out that you do, I think, have to edit the airflow charts, by adding the following block to deployments-web.yaml and deployments-scheduler.yaml under spec.template.spec:

kind: Deployment
spec:
  template:
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000

This allows one to get dags into airflow using e.g.

kubectl cp my_dag.py my_namespace/airflow-worker-0:/usr/local/airflow/dags/
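If you would rather not edit the chart templates at all, one alternative (a sketch, untested here; the deployment name airflow-web is an assumption and should be checked with kubectl get deployments) is to patch the rendered deployments after install:

```shell
# Strategic merge patch adding the pod-level securityContext;
# repeat for the scheduler deployment
kubectl patch deployment airflow-web -n my_name --patch '
spec:
  template:
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
'
```

The downside is that a subsequent helm upgrade may revert the patch, so editing the chart remains the more durable option.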
  • That is very strange. How come the init container hack doesn't work but defining the security context does? Would perhaps setting it up to run the init container with workers do the trick? – alex Aug 25 '20 at 12:32
  • Kinda confused why you set the user/group to 1000. When I exec into the scheduler pod and type `id` it says `uid=50000(airflow) gid=50000(airflow) groups=50000(airflow)`. Why didn't you set to 50000? – alex Aug 25 '20 at 14:56