How to read Amazon S3 bucket files from IntelliJ IDEA on a local machine using Scala/Spark?
2 Answers
IntelliJ is not the important part; the important thing is the Hadoop configuration. You can load a DataFrame from S3 as long as your Hadoop configuration carries the AWS credentials. You can set them in core-site.xml or through the set method of spark.hadoopConfiguration, like this:
sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "")
sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey","")
Be careful about which S3 connector you are using. There are several connectors: s3, s3a, and s3n. If your connector is s3, set the fs.s3.* properties; if it is s3n, set fs.s3n.*; and if it is s3a, set fs.s3a.* (for s3a the key names are fs.s3a.access.key and fs.s3a.secret.key).
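For example, here is a minimal sketch of an end-to-end read with the s3a connector. The bucket name and file path are placeholders, it assumes the matching hadoop-aws and AWS SDK jars are on the classpath, and the credentials are taken from environment variables:

import org.apache.spark.sql.SparkSession

// Build a local SparkSession; "local[*]" keeps everything on the machine running IntelliJ.
val spark = SparkSession.builder()
  .appName("S3ReadExample")
  .master("local[*]")
  .getOrCreate()

// Hand the AWS credentials to the s3a connector (read here from environment variables).
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

// Read a CSV file from the bucket into a DataFrame (bucket and path are placeholders).
val df = spark.read
  .option("header", "true")
  .csv("s3a://my-bucket/path/to/file.csv")
df.show()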

Juhong Jung
Take a look at the following Java example: https://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectUsingJava.html
In Scala, you can do something like this:
import java.io.{BufferedReader, InputStreamReader}
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.services.s3.{AmazonS3, AmazonS3ClientBuilder}

val accessKey = ???  // fill in your AWS access key
val secretKey = ???  // fill in your AWS secret key
val awsCredentials: BasicAWSCredentials = new BasicAWSCredentials(accessKey, secretKey)

// Build an S3 client that authenticates with the static credentials above.
val s3: AmazonS3 = AmazonS3ClientBuilder.standard()
  .withCredentials(new AWSStaticCredentialsProvider(awsCredentials))
  .build()

val bucketName = "myS3bucket"
val keyName = "path/to/file"

// Fetch the object and read its content stream into a string.
val s3Obj = s3.getObject(bucketName, keyName)
val in = s3Obj.getObjectContent
val reader = new BufferedReader(new InputStreamReader(in))
val data = Stream.continually(reader.read()).takeWhile(_ != -1).map(_.toChar).mkString
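One caveat: the object's content stream holds an HTTP connection open until it is consumed or closed, so close the reader (reader.close()) as soon as you are done with it.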

emran
Sorry, I just noticed you mentioned reading in Apache Spark. This is how you do it without Spark. – emran Oct 21 '18 at 15:57