I'm running JupyterHub on EKS and want to leverage EKS IRSA to run Spark workloads on Kubernetes. I have prior experience with Kube2IAM, but I'm now planning to move to IRSA.
This error is not caused by IRSA: the service accounts are attached to the driver and executor pods correctly, and I can access S3 from both pods via the CLI and the SDK. The issue is specific to accessing S3 through Spark on Spark 3.0 / Hadoop 3.2.
```
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoClassDefFoundError: com/amazonaws/services/s3/model/MultiObjectDeleteException
```
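For reference, this is roughly how the Spark session is created from the notebook when the error is thrown. The bucket name, the s3a options, and the credentials provider shown here are illustrative placeholders rather than my exact configuration:

```python
from pyspark.sql import SparkSession

# Illustrative sketch only -- bucket name and credentials provider are placeholders.
# The NoClassDefFoundError is raised as soon as the JavaSparkContext is created.
spark = (
    SparkSession.builder
    .appName("irsa-s3a-test")
    # Use the S3A filesystem from hadoop-aws
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    # IRSA: pick up the web-identity token mounted by EKS (assumed provider setting)
    .config("spark.hadoop.fs.s3a.aws.credentials.provider",
            "com.amazonaws.auth.WebIdentityTokenCredentialsProvider")
    .getOrCreate()
)

# A simple read that exercises the S3A connector (placeholder bucket)
df = spark.read.csv("s3a://my-example-bucket/some/path/")
df.show(5)
```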
I'm using the following versions:
- APACHE_SPARK_VERSION=3.0.1
- HADOOP_VERSION=3.2
- aws-java-sdk-1.11.890
- hadoop-aws-3.2.0
- Python 3.7.3
I also tested with a different SDK version:
- aws-java-sdk-1.11.563.jar
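In case it is relevant, the JARs are made available to the notebook along these lines (the paths are placeholders; only the versions match the list above):

```python
import os

# Illustrative only -- the jar locations are placeholders for wherever the JARs
# actually live in the image; PYSPARK_SUBMIT_ARGS must end with "pyspark-shell".
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--jars /path/to/hadoop-aws-3.2.0.jar,"
    "/path/to/aws-java-sdk-1.11.890.jar "
    "pyspark-shell"
)
```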
If anyone has come across this issue, please help with a solution.
PS: This is not an IAM policy error either; the IAM policies are fine.
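As a sanity check on the IRSA side, something along these lines succeeds from both the driver and executor pods (bucket name is a placeholder), which is why I'm confident the role and policies are not the problem:

```python
import boto3

# Run inside the driver/executor pods -- bucket name is a placeholder.
# Both calls succeed, so the IRSA web-identity credentials are being picked up.
sts = boto3.client("sts")
print(sts.get_caller_identity()["Arn"])  # shows the assumed IRSA role

s3 = boto3.client("s3")
resp = s3.list_objects_v2(Bucket="my-example-bucket", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"])
```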