
The problem is:

I have a Spark application that can't write data to S3. Reading works fine.

Spark configuration:

        SparkConf conf = new SparkConf();
        ...
        conf.set("spark.hadoop.fs.s3a.endpoint", getCredentialConfig().getS3Endpoint());
        // Enable AWS Signature V4 auth (works locally); note that this system
        // property only affects the driver JVM, see the sketch below.
        System.setProperty("com.amazonaws.services.s3.enableV4", "true");

        conf.set("spark.hadoop.fs.s3a.impl", org.apache.hadoop.fs.s3a.S3AFileSystem.class.getName());
        conf.set("spark.hadoop.fs.s3a.access.key", getCredentialConfig().getS3Key());
        conf.set("spark.hadoop.fs.s3a.secret.key", getCredentialConfig().getS3Secret());

        conf.set("spark.hadoop.fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
        conf.set("spark.hadoop.fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
        ...
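
One V4 detail worth checking: System.setProperty above only affects the driver JVM, so executors may still sign requests with the old scheme. A sketch of how to pass the flag to the executor JVMs as well (the driver equivalent has to go on the spark-submit command line, since the driver JVM is already running when this code executes):

        // Executors are separate JVMs; the driver's System.setProperty(...)
        // never reaches them, so pass the flag as an executor JVM option.
        conf.set("spark.executor.extraJavaOptions", "-Dcom.amazonaws.services.s3.enableV4=true");
        // For the driver, pass it at launch instead, e.g.:
        //   spark-submit --driver-java-options "-Dcom.amazonaws.services.s3.enableV4=true" ...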

The write call is:

String fileName = "s3a://" + getCredentialConfig().getS3Bucket() + "/s3-outputs/test/";
getSparkSession()
     .createDataset(list, Encoders.INT())
     .write()
     .format("com.databricks.spark.csv") // redundant: csv(...) below resets the format to the built-in "csv"
     .mode("overwrite")
     .csv(fileName);
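
Side note: in Spark 2.x, DataFrameWriter.csv(path) is shorthand for format("csv").save(path), so the Databricks format line above has no effect on the output. An equivalent, simplified write (same session and dataset assumed):

getSparkSession()
     .createDataset(list, Encoders.INT())
     .write()
     .mode("overwrite")
     .csv(fileName); // csv(path) == format("csv").save(path)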

The relevant debug output is:

10:35:01.914 [main] DEBUG org.apache.hadoop.fs.s3a.S3AFileSystem - Not Found: s3a://mybucket/s3-outputs/test/_temporary-39c4ebc3-61bd-47e0-9ac6-d047af1965f3
10:35:01.914 [main] DEBUG org.apache.hadoop.fs.s3a.S3AFileSystem - Couldn't delete s3a://mybucket/s3-outputs/test/_temporary-39c4ebc3-61bd-47e0-9ac6-d047af1965f3 - does not exist

This means that Spark can't find the temporary (_temporary-*) output folders on the destination file system.
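
For context: Spark's default FileOutputCommitter first writes task output under a _temporary directory and then renames it into place, which is fragile on S3 because rename there is a copy-and-delete. A commonly suggested mitigation on Hadoop 2.7.x, offered here only as a hedged sketch (not verified to fix this exact failure), is the v2 commit algorithm:

        // Commit algorithm v2 (MAPREDUCE-4815) has tasks move their output
        // directly to the destination, skipping the job-level rename pass
        // from _temporary.
        conf.set("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2");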

Current Hadoop version: 2.7.3

Java 8

On Hadoop 2.8.1 everything works fine, but AWS EMR doesn't support Hadoop 2.8.* at the moment.
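
Because the behavior differs between Hadoop versions, it can be worth confirming which hadoop-common actually ends up on the application classpath (mixed jars are a frequent cause of version-specific S3A failures). A small check using the standard VersionInfo API:

        import org.apache.hadoop.util.VersionInfo;

        // Prints the Hadoop version actually loaded at runtime, which may
        // differ from the version the cluster or build file advertises.
        System.out.println("Hadoop version: " + VersionInfo.getVersion());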

Comments:
  • Any specific reason to use the "s3a" file system instead of just "s3"? – Shekhar Sep 15 '17 at 08:36
  • "The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256." Frankfurt doesn't support s3. – yazabara Sep 15 '17 at 08:47
  • EMR doesn't support s3a:// at all; use their s3:// and look at the EMR docs for how to switch to v4 auth, explicitly declaring the Frankfurt endpoint as part of this. – stevel Sep 18 '17 at 12:22
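
Following up on the last comment: for a V4-only region such as Frankfurt, the S3A endpoint must be declared explicitly. A sketch of that setting (eu-central-1 as the bucket region is an assumption here):

        // Frankfurt (eu-central-1) accepts only V4-signed requests, so the
        // region-specific endpoint has to be set explicitly for S3A.
        conf.set("spark.hadoop.fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com");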
