I'm trying to use PySpark on AWS EMR to read an Excel file that resides in S3. To do this, I downloaded two spark-excel jars, spark-excel_2.11-0.12.4.jar and spark-excel_2.12-0.13.5.jar, and placed them in an S3 bucket.
Scenario 1:
===========
Code (test.py):
df = spark.read.format("com.crealytics.spark.excel").option("useHeader", "true").option("inferschema", "true").load("s3://bucket/abc.xlsx")
Command:
spark-submit --jars s3://Bucket/spark-excel_2.11-0.12.4.jar test.py
Error:
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/collections4/IteratorUtils
Scenario 2:
===========
Code (test.py):
df = spark.read.format("com.crealytics.spark.excel").option("header", "true").option("inferschema", "true").load("s3://bucket/abc.xlsx")
Command:
spark-submit --jars s3://Bucket/spark-excel_2.12-0.13.5.jar test.py
Error:
py4j.protocol.Py4JJavaError: An error occurred while calling o79.load.
: java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)
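For what it's worth, the jar file names encode the Scala build they target (spark-excel_<scala>-<version>.jar), and as far as I understand, a NoSuchMethodError on scala.Product.$init$ usually means a jar built for Scala 2.12 is running on a Scala 2.11 runtime. Below is a minimal sketch for comparing each jar against the cluster's Scala version. The helper names (parse_jar_name, matches_cluster) are hypothetical, and the value "2.11" for the cluster is only an assumption; the real value would come from the output of spark-submit --version.

```python
import re

# Jar names are assumed to follow <artifact>_<scalaBinaryVersion>-<version>.jar,
# e.g. spark-excel_2.11-0.12.4.jar.
JAR_PATTERN = re.compile(
    r"^(?P<artifact>.+)_(?P<scala>\d+\.\d+)-(?P<version>[\w.\-]+)\.jar$"
)


def parse_jar_name(jar_path):
    """Hypothetical helper: return (artifact, scala_version, artifact_version)."""
    # Keep only the file name, so plain names and s3:// paths both work.
    name = jar_path.rsplit("/", 1)[-1]
    match = JAR_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized jar name: {name}")
    return match.group("artifact"), match.group("scala"), match.group("version")


def matches_cluster(jar_path, cluster_scala):
    """Hypothetical helper: True if the jar targets the given Scala binary version."""
    _, scala, _ = parse_jar_name(jar_path)
    return scala == cluster_scala


if __name__ == "__main__":
    # ASSUMPTION: the cluster runs Scala 2.11; check `spark-submit --version`.
    cluster_scala = "2.11"
    for jar in (
        "s3://Bucket/spark-excel_2.11-0.12.4.jar",
        "s3://Bucket/spark-excel_2.12-0.13.5.jar",
    ):
        status = "matches" if matches_cluster(jar, cluster_scala) else "does NOT match"
        print(f"{jar} {status} Scala {cluster_scala}")
```

This only checks the Scala suffix. It does not explain the missing org/apache/commons/collections4 class in scenario 1, which looks like a dependency of the jar that is not on the classpath.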
Can someone please help me fix this issue? I appreciate your help!