
I am trying to connect to an S3 environment from Spark installed on a local Mac machine, using the following command:

```shell
./bin/spark-shell --packages com.amazonaws:aws-java-sdk-pom:1.11.271,org.apache.hadoop:hadoop-aws:3.1.1,org.apache.hadoop:hadoop-hdfs:2.7.1
```

This starts the Scala shell and downloads all the libraries.
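For context, the `--packages` line above mixes artifacts from two different Hadoop release lines (`hadoop-aws` 3.1.1 and `hadoop-hdfs` 2.7.1). A variant pinned to a single line would look like the sketch below; the 2.7.3 version is an assumption chosen to match a Hadoop 2.7-based Spark build, not something verified here:

```shell
# Sketch (assumption): pin hadoop-aws to the same release line as the
# Hadoop version the Spark distribution was built against (here, 2.7.x).
./bin/spark-shell --packages com.amazonaws:aws-java-sdk-pom:1.11.271,org.apache.hadoop:hadoop-aws:2.7.3
```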

Then I execute the following commands in the Spark shell:

```scala
val accessKeyId = System.getenv("AWS_ACCESS_KEY_ID")
val secretAccessKey = System.getenv("AWS_SECRET_ACCESS_KEY")

val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", accessKeyId)
hadoopConf.set("fs.s3.awsSecretAccessKey", secretAccessKey)
hadoopConf.set("fs.s3n.awsAccessKeyId", accessKeyId)
hadoopConf.set("fs.s3n.awsSecretAccessKey", secretAccessKey)

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.json("s3a://path/1551467354353.c948f177e1fb.dev.0fd8f5fd-22d4-4523-b6bc-b68c181b4906.gz")
```
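For reference, the s3a connector also has its own credential keys, `fs.s3a.access.key` and `fs.s3a.secret.key` (the `fs.s3.*`/`fs.s3n.*` keys above belong to the older connectors). A minimal sketch of the same credential setup using those keys, assuming the same `sc` from the shell session, would be:

```scala
// Sketch: set the s3a-specific credential keys on the same Hadoop configuration.
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3a.access.key", System.getenv("AWS_ACCESS_KEY_ID"))
hadoopConf.set("fs.s3a.secret.key", System.getenv("AWS_SECRET_ACCESS_KEY"))
```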

But I get `NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities` whether I use the `s3a` or the `s3` scheme.
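A quick diagnostic sketch for this kind of error, run inside the same spark-shell session, is to check whether the class is on the shell's classpath at all and, if so, which jar it was loaded from:

```scala
// Diagnostic sketch: locate (or confirm the absence of) the class the
// NoClassDefFoundError complains about.
try {
  val cls = Class.forName("org.apache.hadoop.fs.StreamCapabilities")
  println(s"Found in: ${cls.getProtectionDomain.getCodeSource.getLocation}")
} catch {
  case _: ClassNotFoundException =>
    println("StreamCapabilities is not on the classpath")
}
```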

Any idea what I could be missing here?

Comments:
  • [NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities while reading s3 Data with spark](https://stackoverflow.com/questions/52310416/noclassdeffounderror-org-apache-hadoop-fs-streamcapabilities-while-reading-s3-d) – kichik Mar 12 '19 at 22:47
  • Possible duplicate of [NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities while reading s3 Data with spark](https://stackoverflow.com/questions/52310416/noclassdeffounderror-org-apache-hadoop-fs-streamcapabilities-while-reading-s3-d) – stevel Mar 13 '19 at 11:28
