
I am trying to save a machine-learning model to S3 from my Spark standalone cluster, but I get this error:

```
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:232)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2631)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2650)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
    at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:68)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:529)
    at ALS$.main(ALS.scala:32)
    at ALS.main(ALS.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NoClassDefFoundError: com/amazonaws/event/ProgressListener
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
    at java.lang.Class.getConstructor0(Class.java:3075)
    at java.lang.Class.newInstance(Class.java:412)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
    ... 23 more
Caused by: java.lang.ClassNotFoundException: com.amazonaws.event.ProgressListener
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 28 more
```
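For context, the program is essentially the following (a minimal sketch, not my exact code; the dataset path, bucket name, and ALS parameters are placeholders):

```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.mllib.recommendation.{ALS => MLlibALS}

object ALS {
  def main(args: Array[String]): Unit = {
    // Per the trace above, the provider already fails here: the
    // SparkContext's event-logging listener resolves a Hadoop FileSystem
    val sc = new SparkContext(new SparkConf().setAppName("ALS"))

    val ratings = sc.textFile("s3a://my-bucket/ratings.csv").map { line =>
      val Array(user, item, rating) = line.split(',')
      Rating(user.toInt, item.toInt, rating.toDouble)
    }

    val model = MLlibALS.train(ratings, 10, 10)   // rank = 10, iterations = 10
    model.save(sc, "s3a://my-bucket/models/als")  // the S3 write I am after
  }
}
```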

I have added the hadoop-aws and aws-sdk JARs to extraClassPath in spark-defaults.conf.
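Concretely, the entries look like this (a sketch; the install path and versions are placeholders from my setup):

```
spark.driver.extraClassPath     /opt/spark/extraJars/hadoop-aws-2.7.3.jar:/opt/spark/extraJars/aws-java-sdk-1.7.4.jar
spark.executor.extraClassPath   /opt/spark/extraJars/hadoop-aws-2.7.3.jar:/opt/spark/extraJars/aws-java-sdk-1.7.4.jar
```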

What I have tried so far: I run spark-submit with a fat JAR built by sbt assembly (I have also added those dependencies in the sbt build, as sketched below). My AWS credentials are exported in the master's environment.
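The sbt side is roughly this (a sketch; the Spark and Hadoop versions are assumptions and must match the cluster):

```
// build.sbt (versions are placeholders; hadoop-aws 2.7.x was built against aws-java-sdk 1.7.4)
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-mllib"  % "2.2.0" % "provided",
  "org.apache.hadoop" %  "hadoop-aws"   % "2.7.3",
  "com.amazonaws"     %  "aws-java-sdk" % "1.7.4"
)
```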

Any idea where I should look to fix this?

Thanks !

Farah

2 Answers


That's an AWS SDK class, so you are going to need to make sure your classpath has *the exact set* of aws-java SDK JARs your hadoop-aws JAR was built against.

mvnrepository lists those dependencies.
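For example (illustrative; check the compile dependencies of your exact hadoop-aws release there): hadoop-aws 2.7.x was built against aws-java-sdk 1.7.4, while recent hadoop-aws 3.x releases are built against a single aws-java-sdk-bundle JAR. Mixing a hadoop-aws JAR with an SDK from a different line is exactly what produces a `NoClassDefFoundError` like the one above.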

I have a project whose whole aim in life is to work out WTF is wrong with blobstore connector bindings, cloudstore. You can use that in spark-shell or real spark queries to help diagnose things.
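A possible invocation (a sketch; check the cloudstore README for the exact JAR name and entry point of your release):

```
hadoop jar cloudstore-1.0.jar storediag s3a://mybucket/
```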

stevel
  • Thank you, I will try your project. I have added `Hadoop-aws` and `aws-java` in `spark.driver.extraClassPath`, is this what you are talking about? – Farah May 16 '18 at 12:45
  • Your tool showed me an error : `java.lang.NoClassDefFoundError: org/apache/hadoop/fs/s3a/S3AUtils` – Farah May 16 '18 at 14:10
  • S3AUtils? That should be in the hadoop-aws JAR. That means the classpath the tool is getting doesn't contain it. For Spark, put things in `$SPARK_HOME/jars`, alongside the other hadoop-* JARs it has – stevel May 18 '18 at 15:15
  • I believe an example of compatible jars is covered in this question/answer: https://stackoverflow.com/questions/58415928/spark-s3-error-java-lang-classnotfoundexception-class-org-apache-hadoop-f – Natalie Olivo Nov 22 '19 at 03:54
  • Natalie - not really. That answer looks at the s3n connector, which was already deprecated in Hadoop 2.7 and depends on the jets3t library; the S3A one uses the AWS SDK – stevel Dec 17 '19 at 17:08

I faced many problems (extending for IAM credentials, etc.). I solved this issue by downloading a Hadoop version that matches my Spark version and copying the hadoop-common JAR into Spark's jars folder. The hadoop-common JAR must be the same version as the hadoop-aws JAR; in my case, hadoop-aws-3.2.2.jar and hadoop-common-3.2.2.jar.
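Something like this (a sketch; the JAR locations are placeholders, and the matching aws-java-sdk-bundle version for your hadoop-aws release should be looked up on mvnrepository):

```
# copy version-matched JARs into Spark's jars folder (paths/versions are placeholders)
cp hadoop-aws-3.2.2.jar    $SPARK_HOME/jars/
cp hadoop-common-3.2.2.jar $SPARK_HOME/jars/
# the aws-java-sdk-bundle JAR that hadoop-aws 3.2.2 was built against must also be present
```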