I am trying to read from an s3:// path with
spark.read.parquet("s3://<path>")
and I get this error:
Py4JJavaError: An error occurred while calling o31.parquet. : java.io.IOException: No FileSystem for scheme: s3
However, running the following command
hadoop fs -ls <path>
does work.
So I suspect this is a configuration issue between Hadoop and Spark.
How can it be solved?
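For what it's worth, this is a small diagnostic I can run to see which filesystem implementation the session's Hadoop configuration maps the s3 scheme to (just a sketch, assuming the SparkSession is bound to a variable named spark; it uses the internal _jsc handle, so it is for poking around only):
# Check which FileSystem implementation, if any, the session's Hadoop
# configuration registers for the s3 and s3a schemes.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
print("fs.s3.impl  =", hadoop_conf.get("fs.s3.impl"))
print("fs.s3a.impl =", hadoop_conf.get("fs.s3a.impl"))
# An empty fs.s3.impl would be consistent with the
# "No FileSystem for scheme: s3" error above.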
EDIT
After reading the suggested answer, I tried hard-coding the jars into the Spark config:
spark = SparkSession \
    .builder \
    .master("spark://" + master + ":7077") \
    .appName("myname") \
    .config("spark.jars", "/usr/share/aws/aws-java-sdk/aws-java-sdk-1.11.221.jar,/usr/share/aws/aws-java-sdk/hadoop-aws.jar") \
    .config("spark.jars.packages", "com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.2") \
    .getOrCreate()
No success
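As a sanity check, a sketch like the following could confirm whether those settings actually reach the live session (assuming the spark session built above):
# Confirm the jar settings landed in the session's SparkConf;
# "not set" would mean the value never made it into the configuration.
conf = spark.sparkContext.getConf()
print(conf.get("spark.jars", "not set"))
print(conf.get("spark.jars.packages", "not set"))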