
I'm using Spark 2.4.5 to run a Spark application on Kubernetes through the spark-submit command. The application fails while trying to write outputs, as detailed here, probably because of an incorrect security context. So I tried setting a security context and rerunning the application. I did this by creating a pod template as mentioned here, but I haven't been able to validate whether the pod template is set up properly (I couldn't find proper examples) or whether it is accessible from the driver and executor pods (I couldn't find anything related to the template in the driver or Kubernetes logs). This is the content of the pod template I used to set a security context.

apiVersion: v1
kind: Pod
metadata:
  name: spark-pod-template
spec:
  securityContext:
    runAsUser: 1000

This is the command I used.

 <SPARK_PATH>/bin/spark-submit --master k8s://https://dssparkcluster-dns-fa326f6a.hcp.southcentralus.azmk8s.io:443 \
 --deploy-mode cluster  --name spark-pi3 --conf spark.executor.instances=2 \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
 --conf spark.kubernetes.container.image=docker.io/datamechanics/spark:2.4.5-hadoop-3.1.0-java-8-scala-2.11-python-3.7-dm14 \
 --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.azure-fileshare-pvc.options.claimName=azure-fileshare-pvc \
 --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.azure-fileshare-pvc.mount.path=/opt/spark/work-dir \
 --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.azure-fileshare-pvc.options.claimName=azure-fileshare-pvc \
 --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.azure-fileshare-pvc.mount.path=/opt/spark/work-dir \
 --conf spark.kubernetes.driver.podTemplateFile=/opt/spark/work-dir/spark_pod_template.yml \
 --conf spark.kubernetes.executor.podTemplateFile=/opt/spark/work-dir/spark_pod_template.yml \
 --verbose /opt/spark/work-dir/wordcount2.py

I've placed the pod template file in a persistent volume mounted at /opt/spark/work-dir. The questions I have are:

  1. Is the pod template file accessible from the persistent volume?
  2. Are the file contents in the appropriate format for setting a runAsUser?
  3. Is pod template functionality supported in Spark 2.4.5? The 2.4.5 docs mention that security contexts can be implemented using pod templates, but there is no pod template section like the one in the 3.2.0 docs.

Any help would be greatly appreciated. Thanks.

Maaverik
1 Answer


As you can read at https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template:

specify the spark properties spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile to point to files accessible to the spark-submit process.

So the pod template file must be on the local filesystem accessible to the spark-submit process, not only inside a PVC mounted in the cluster. You can check whether the template was applied by inspecting the generated driver and executor pods.
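For example (a sketch: spark-role=driver is the label Spark normally puts on the driver pod it creates, and spark-pi3-driver is a hypothetical pod name, substitute whatever name kubectl reports for your run):

    # Find the driver pod created for this submission
    kubectl get pods -l spark-role=driver

    # Show the pod-level securityContext that was actually applied
    # (replace spark-pi3-driver with the name reported above)
    kubectl get pod spark-pi3-driver -o jsonpath='{.spec.securityContext}'

    # Or dump the full manifest and search for runAsUser / securityContext
    kubectl get pod spark-pi3-driver -o yaml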

A good pod- and container-level securityContext:

      securityContext:
        fsGroup: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
      containers:
        - name: spark
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            readOnlyRootFilesystem: true
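
For reference, a minimal sketch of a complete pod template that embeds this securityContext, based on the template in the question (the metadata name and container name are illustrative; the file has to live on the machine running spark-submit rather than only in the PVC, and readOnlyRootFilesystem may need to be relaxed if your job writes to the container filesystem):

    apiVersion: v1
    kind: Pod
    metadata:
      name: spark-pod-template
    spec:
      securityContext:
        fsGroup: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 1000
      containers:
        - name: spark
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true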
Thomas Decaux