
This is an old issue, and I have solved it by following the answer in this post: How can I access S3/S3n from a local Hadoop 2.6 installation?

The answer from Kamil Sindi works for me: add the packages via the spark-shell --packages option:

spark-shell --packages com.amazonaws:aws-java-sdk:1.11.967,org.apache.hadoop:hadoop-aws:3.2.0

When I run the commands below, it works:

scala> sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
scala> sc.textFile("s3://test/testdata.txt").foreach(println)
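
For reference, the fs.s3.impl setting above is what maps the s3:// scheme onto the S3A implementation. A fuller session usually also has to supply credentials; the sketch below uses the standard Hadoop S3A configuration keys, with placeholder credential values and the same test bucket as above:

scala> // Map the s3:// scheme onto S3A, as above
scala> sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
scala> // Standard S3A credential keys; the values here are placeholders
scala> sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
scala> sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")
scala> // Read through the remapped scheme
scala> sc.textFile("s3://test/testdata.txt").foreach(println)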

But when I add the jars directly, as below:

spark-shell --jars /tmp/hadoop-aws-3.2.0.jar , /tmp/aws-java-sdk-1.11.967.jar

the following error is thrown:

java.lang.NoClassDefFoundError: com/amazonaws/AmazonServiceException
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)

Can anyone tell me why adding the jars doesn't work? How can I solve this issue by adding jars, as other answers suggested?


1 Answer


Solved. Please use the bundle jar, which is located in

$HADOOP_HOME/share/hadoop/tools/lib/

spark-shell --jars /data/workspace/files/hadoop-aws-3.2.0.jar,/data/workspace/files/aws-java-sdk-bundle-1.11.563.jar
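
This also answers the "why": --packages resolves transitive dependencies through Ivy, while --jars adds only the exact jars you list, so the com.amazonaws classes that hadoop-aws needs were never on the classpath; the aws-java-sdk-bundle jar is a single shaded jar that carries all of them. With those two jars on the classpath, the same check from the question should now succeed (the bucket and file are the placeholders from the original test):

scala> sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
scala> sc.textFile("s3://test/testdata.txt").foreach(println)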
