I am getting the following error:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.io.IOException: No FileSystem for scheme: s3n ...
This happens when I try to retrieve data from S3. My spark-defaults.conf
has the following line:
spark.jars /Users/lrezende/Desktop/hadoop-aws-2.9.0.jar
The JAR file is on my Desktop.
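To rule out a bad path or a JAR that lacks the filesystem class, a check along these lines could be run first. This is only a sketch: the helper name `jar_contains_class` is hypothetical, and the class name `org.apache.hadoop.fs.s3native.NativeS3FileSystem` is an assumption about where the s3n implementation lives in this JAR.

```python
import os
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the JAR at jar_path has an entry for class_name.

    Hypothetical helper, not part of Spark or Hadoop.
    """
    # A JAR is a ZIP archive; classes are stored as path/to/Name.class.
    entry = class_name.replace(".", "/") + ".class"
    # A missing file or a non-ZIP file cannot contain the class.
    if not os.path.isfile(jar_path) or not zipfile.is_zipfile(jar_path):
        return False
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__":
    # Path taken from spark-defaults.conf; the class name is an assumption.
    print(jar_contains_class(
        "/Users/lrezende/Desktop/hadoop-aws-2.9.0.jar",
        "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    ))
```

A result of False would point to a wrong path or a JAR without that class, while True would suggest the JAR itself is fine and the problem is in how Spark loads or maps it.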
My code is:
from pyspark.sql import SparkSession
if spark:
    spark.stop()
spark = SparkSession\
    .builder\
    .master("<master-address>")\
    .appName("Test")\
    .getOrCreate()
spark.sparkContext.setLogLevel('ERROR')
lines = spark.sparkContext.textFile("s3n://bucket/something/2017/*")
lines.collect()
When I run lines.collect(), I get the error.
Could someone help me fix it?