
I have enabled checkpointing for my Spark Streaming application using the getOrCreate method. The checkpoint directory points to an S3 bucket. The problem I have is a credential issue when accessing S3:

Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).

I have already set the environment variables (AWS_SECRET_KEY and AWS_ACCESS_KEY). My fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties are also specified in application.conf, so I don't know why it still fails.
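
For reference, the setup looks roughly like this; the bucket name, app name, batch interval, and factory function are illustrative, not the exact application code:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative checkpoint location on S3.
val checkpointDir = "s3://my-bucket/checkpoints"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("MyStreamingApp")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // ... DStream setup goes here ...
  ssc
}

// Recovers from the checkpoint if one exists, otherwise creates a new context.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()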

1 Answer

The environment variables (AWS_SECRET_KEY and AWS_ACCESS_KEY) no longer work after Spark 1.3.

Please refer to this question for the new approach:
How to read input from S3 in a Spark Streaming EC2 cluster application

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
val sc = new SparkContext(conf)
// Set the S3 credentials on the underlying Hadoop configuration.
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", myAccessKey)
hadoopConf.set("fs.s3.awsSecretAccessKey", mySecretKey)