
I am using the below tools:

  1. Spark 2.4.3
  2. Scala 2.11.12
  3. OS : Windows 10

This is my SBT code to import the libraries:

    libraryDependencies ++= Seq(
        "javassist" % "javassist" % "3.12.1.GA",
        "com.typesafe" % "config" % "1.3.4",
        "org.apache.spark" %% "spark-core" % sparkVersion,
        "org.apache.spark" %% "spark-sql" % sparkVersion,
        "com.datastax.spark" %% "spark-cassandra-connector" % "2.4.1",
        "com.twitter" % "jsr166e" % "1.1.0",
        "com.amazonaws" % "aws-java-sdk" % "1.11.592",
        "org.apache.hadoop" % "hadoop-aws" % "2.7.3",
        "org.apache.spark" %% "spark-catalyst" % sparkVersion
    )

My Scala code is as below:

    val rdd = sparkSession.sparkContext.parallelize(
      Seq(
        ("first", Array(2.0, 1.0, 2.1, 5.4)),
        ("test", Array(1.5, 0.5, 0.9, 3.7)),
        ("choose", Array(8.0, 2.9, 9.1, 2.5))
      )
    )
    val dfWithoutSchema = sparkSession.createDataFrame(rdd)
    sparkSession.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "XXXXXX")
    sparkSession.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "XXXXXXX")
    sparkSession.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

    dfWithoutSchema.write
      .mode("overwrite")
      .parquet("s3a://test-daily-extracts/sample2")

When I compile through SBT I get no errors, but when I run the code I get the error

   java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider

and my stack trace is as below

    at java.lang.Class.forName0(Native Method)
            at java.lang.Class.forName(Class.java:348)
            at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2134)
            at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2099)
            at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
            at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
            at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
            at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
            at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
            at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
            at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
            at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
            at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:113)
            at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:88)
            at org.apache.parquet.hadoop.ParquetOutputCommitter.<init>(ParquetOutputCommitter.java:43)
            at org.apache.parquet.hadoop.ParquetOutputFormat.getOutputCommitter(ParquetOutputFormat.java:442)
            at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupCommitter(HadoopMapReduceCommitProtocol.scala:100)
            at org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol.setupCommitter(SQLHadoopMapReduceCommitProtocol.scala:40)
            at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupTask(HadoopMapReduceCommitProtocol.scala:217)
            at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:229)
            at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
            at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
            at org.apache.spark.scheduler.Task.run(Task.scala:121)
            at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
            at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    Caused by: java.lang.ClassNotFoundException: com.amazonaws.auth.AWSCredentialsProvider
            at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
            at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
            ... 30 more
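
For reference, here is a quick driver-side check I could run before the write (a minimal sketch; the class name is the one from the error above) to confirm whether the class even resolves on the driver:

    import scala.util.Try
    // true if the AWS credentials provider class is visible on the driver classpath
    println(Try(Class.forName("com.amazonaws.auth.AWSCredentialsProvider")).isSuccess)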

Thanks in advance for any help.

EDIT: 2019-07-17

I updated my SBT code as below.

    libraryDependencies ++= Seq(
        "javassist" % "javassist" % "3.12.1.GA",
        "com.typesafe" % "config" % "1.3.4",
        "org.apache.spark" %% "spark-core" % sparkVersion,
        "org.apache.spark" %% "spark-sql" % sparkVersion,
        "com.datastax.spark" %% "spark-cassandra-connector" % "2.4.1",
        "com.twitter" % "jsr166e" % "1.1.0",
        "com.amazonaws" % "aws-java-sdk" % "1.7.4",
        "net.java.dev.jets3t" % "jets3t" % "0.9.4",
        "org.apache.hadoop" % "hadoop-aws" % "2.7.3",
        "org.apache.hadoop" % "hadoop-client" % "2.7.3",
        "org.apache.hadoop" % "hadoop-hdfs" % "2.7.3",
        "org.apache.spark" %% "spark-catalyst" % sparkVersion
    )

I added the below code to the driver program:

    // Print every jar URL visible on the driver classpath, walking up the classloader chain
    urlsinclasspath(getClass.getClassLoader).foreach(println)

    def urlsinclasspath(cl: ClassLoader): Array[java.net.URL] = cl match {
      case null => Array()
      case u: java.net.URLClassLoader => u.getURLs() ++ urlsinclasspath(cl.getParent)
      case _ => urlsinclasspath(cl.getParent)
    }

I can now see that aws-java-sdk-1.7.4 is loading at runtime, and it has the AWSCredentialsProvider class in it. But I am still getting the below error. My complete trace is below:

    19/07/17 17:02:25 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, XX.XX.XX.XX, executor 0): java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider
                            at java.lang.Class.forName0(Native Method)
                            at java.lang.Class.forName(Class.java:348)
                            at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2134)
                            at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2099)
                            at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
                            at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
                            at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
                            at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
                            at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
                            at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
                            at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
                            at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
                            at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:113)
                            at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:88)
                            at org.apache.parquet.hadoop.ParquetOutputCommitter.<init>(ParquetOutputCommitter.java:43)
                            at org.apache.parquet.hadoop.ParquetOutputFormat.getOutputCommitter(ParquetOutputFormat.java:442)
                            at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupCommitter(HadoopMapReduceCommitProtocol.scala:100)
                            at org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol.setupCommitter(SQLHadoopMapReduceCommitProtocol.scala:40)
                            at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupTask(HadoopMapReduceCommitProtocol.scala:217)
                            at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:229)
                            at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
                            at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
                            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
                            at org.apache.spark.scheduler.Task.run(Task.scala:121)
                            at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
                            at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
                            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
                            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                            at java.lang.Thread.run(Thread.java:748)
                    Caused by: java.lang.ClassNotFoundException: com.amazonaws.auth.AWSCredentialsProvider
                            at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
                            at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
                            at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
                            at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
                            ... 30 more
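
Since the failed task above is reported from executor 0 rather than the driver, I understand the driver-side classpath listing may not tell the whole story. Here is a minimal sketch (using the class name from the error) to probe whether the class is visible inside an executor task:

    // Runs Class.forName inside a Spark task, i.e. on an executor JVM
    val probe = sparkSession.sparkContext.parallelize(Seq(1)).map { _ =>
      try { Class.forName("com.amazonaws.auth.AWSCredentialsProvider"); "found" }
      catch { case _: ClassNotFoundException => "missing" }
    }.collect()
    println(probe.mkString(","))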

2 Answers


This dependency has the com/amazonaws/auth/AWSCredentialsProvider class which you are missing:

libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.592"

I would suggest you go with an uber jar, i.e. use SBT to package all jars with their dependencies as one jar so that nothing is missed or left out.

How to make an uber jar is described here.
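
For illustration, a minimal sbt-assembly setup could look like the below (a sketch under assumptions: the plugin version is only an example, and the merge strategy simply discards duplicate META-INF entries):

    // project/plugins.sbt
    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

    // build.sbt -- resolve the duplicate files a fat jar would otherwise trip over
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case x                             => MergeStrategy.first
    }

Then run sbt assembly and submit the resulting jar, so the AWS and Hadoop classes travel with your application.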

Also add this code to your driver ... understand what jars are coming in to your classpath.

    // Print every jar URL visible on the driver classpath, walking up the classloader chain
    urlsinclasspath(getClass.getClassLoader).foreach(println)

    def urlsinclasspath(cl: ClassLoader): Array[java.net.URL] = cl match {
      case null => Array()
      case u: java.net.URLClassLoader => u.getURLs() ++ urlsinclasspath(cl.getParent)
      case _ => urlsinclasspath(cl.getParent)
    }
— Ram Ghadiyaram

After a lot of research on Google, I found that there was no error in my code. Even though all the jars were loading, my Spark installation was missing hadoop.dll in C:\winutils\bin and C:\Windows\System32. I downloaded hadoop.dll from https://github.com/steveloughran/winutils/tree/master/hadoop-2.7.1/bin and placed it in both directories. It worked fine. I am not sure why the error was so misleading.
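
As a side note, one way to make sure Hadoop finds that directory (a minimal sketch; C:\winutils is the path used above) is to set hadoop.home.dir before creating the SparkSession:

    // Point Hadoop's native-library lookup at the winutils install directory
    System.setProperty("hadoop.home.dir", "C:\\winutils")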

Thanks all for your help.

— Chakradhar