chmod error while writing outputs with Spark on Kubernetes

Question

I'm working on a POC for getting a Spark cluster set up to use Kubernetes for resource management using AKS (Azure Kubernetes Service). I'm using spark-submit to submit pyspark applications to k8s in cluster mode and I've been successful in getting applications to run fine.

I got Azure file share set up to store application scripts and Persistent Volume and a Persistent Volume Claim pointing to this file share to allow Spark to access the scripts from Kubernetes. This works fine for applications that don't write any output, like the pi.py example given in the spark source code, but writing any kind of outputs fails in this setup. I tried running a script to get wordcounts and the line

wordCounts.saveAsTextFile(f"./output/counts")

causes an exception where wordCounts is an rdd.

Traceback (most recent call last):
File "/opt/spark/work-dir/wordcount2.py", line 14, in <module>
wordCounts.saveAsTextFile(f"./output/counts")
File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1570, in saveAsTextFile
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o65.saveAsTextFile.
: ExitCodeException exitCode=1: chmod: changing permissions of '/opt/spark/work-dir/output/counts': Operation not permitted

The directory "counts" has been created from the spark application just fine, so it seems like it has required permissions, but this subsequent chmod that spark tries to perform internally fails. I haven't been able to figure out the cause and what exact configuration I'm missing in my commands that's causing this. Any help would be greatly appreciated.

The kubectl version I'm using is

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"881d4a5a3c0f4036c714cfb601b377c4c72de543", GitTreeState:"clean", BuildDate:"2021-10-21T05:13:01Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}

The spark version is 2.4.5 and the command I'm using is

<SPARK_PATH>/bin/spark-submit --master k8s://<HOST>:443  \
--deploy-mode cluster  \
--name spark-pi3 \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=docker.io/datamechanics/spark:2.4.5-hadoop-3.1.0-java-8-scala-2.11-python-3.7-dm14  \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.azure-fileshare-pvc.options.claimName=azure-fileshare-pvc \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.azure-fileshare-pvc.mount.path=/opt/spark/work-dir \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.azure-fileshare-pvc.options.claimName=azure-fileshare-pvc \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.azure-fileshare-pvc.mount.path=/opt/spark/work-dir \
--verbose /opt/spark/work-dir/wordcount2.py

The PV and PVC are pretty basic. The PV yml is:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: azure-fileshare-pv
  labels:
    usage: azure-fileshare-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  azureFile:
    secretName: azure-storage-secret
    shareName: dssparktestfs
    readOnly: false
    secretNamespace: spark-operator

The PVC yml is:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: azure-fileshare-pvc
  # Set this annotation to NOT let Kubernetes automatically create
  # a persistent volume for this volume claim.
  annotations:
    volume.beta.kubernetes.io/storage-class: ""
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  selector:
    # To make sure we match the claim with the exact volume, match the label
    matchLabels:
      usage: azure-fileshare-pv

Let me know if more info is needed.

How exactly did you mount your volume? It your python script (user) allowed to run chmod command on that file? Probably you should first run chown on that resource. — Mikołaj Głodziak, Nov 15 '21 at 14:51
The mounting is taken care of by spark in the spark-submit command. The directory 'counts' itself was created from the script, so I don't understand how it wouldn't already own the directory. I feel like if the chown is the problem, there should be a way to fix it through kubernetes, but I can't quite find it. — Maaverik, Nov 16 '21 at 16:18
Could you check the owner of the resource `/opt/spark/work-dir/output/counts`? (run `ls -l` command) — Mikołaj Głodziak, Nov 16 '21 at 17:09

score 2 · Answer 1 · answered Nov 17 '21 at 07:55

The owner and user are root.

It looks like you've mounted your volume as root. Your problem:

chmod: changing permissions of '/opt/spark/work-dir/output/counts': Operation not permitted

is due to the fact that you are trying to change the permissions of a file that you are not the owner of. So you need to change the owner of the file first.

The easiest solution is to chown on the resource you want to access. However, this is often not feasible as it can lead to privilege escalation as well as the image itself may block this possibility. In this case you can create security context.

A security context defines privilege and access control settings for a Pod or Container. Security context settings include, but are not limited to:

Discretionary Access Control: Permission to access an object, like a file, is based on user ID (UID) and group ID (GID).

Security Enhanced Linux (SELinux): Objects are assigned security labels.

Running as privileged or unprivileged.

Linux Capabilities: Give a process some privileges, but not all the privileges of the root user. >

AppArmor: Use program profiles to restrict the capabilities of individual programs.

Seccomp: Filter a process's system calls.

AllowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process. This bool directly controls whether the no_new_privs flag gets set on the container process. AllowPrivilegeEscalation is true always when the container is: 1) run as Privileged OR 2) has CAP_SYS_ADMIN.

readOnlyRootFilesystem: Mounts the container's root filesystem as read-only.

The above bullets are not a complete set of security context settings -- please see SecurityContext for a comprehensive list.

For more information about security mechanisms in Linux, see Overview of Linux Kernel Security Features

You can Configure volume permission and ownership change policy for Pods.

By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted. For large volumes, checking and changing ownership and permissions can take a lot of time, slowing Pod startup. You can use the fsGroupChangePolicy field inside a securityContext to control the way that Kubernetes checks and manages ownership and permissions for a volume.

Here is an example:

securityContext:
  runAsUser: 1000
  runAsGroup: 3000
  fsGroup: 2000
  fsGroupChangePolicy: "OnRootMismatch"

chmod error while writing outputs with Spark on Kubernetes

1 Answers1

Linked