How to read Amazon S3 bucket files from IntelliJ IDEA on a local machine using Scala/Spark?
2 Answers
IntelliJ is not the important part; the important thing is the Hadoop configuration. You can load a DataFrame from S3 as long as your Hadoop configuration carries the AWS credentials. You can set them in core-site.xml or through the set method of spark.hadoopConfiguration, like this:
sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "")
sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey","")
Be careful about which S3 connector you are using. There are several connectors: s3, s3a, and s3n. If your connector is s3, set the fs.s3.* properties; if it is s3n, set fs.s3n.*; and if it is s3a, set fs.s3a.* (for s3a the key names are fs.s3a.access.key and fs.s3a.secret.key).
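For example, here is a minimal sketch of an end-to-end read with the s3a connector. The bucket name and file path are placeholders, it assumes the matching hadoop-aws and AWS SDK jars are on the classpath, and the credentials are taken from environment variables:

import org.apache.spark.sql.SparkSession

// Build a local SparkSession; "local[*]" keeps everything on the machine running IntelliJ.
val spark = SparkSession.builder()
  .appName("S3ReadExample")
  .master("local[*]")
  .getOrCreate()

// Hand the AWS credentials to the s3a connector (read here from environment variables).
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

// Read a CSV file from the bucket into a DataFrame (bucket and path are placeholders).
val df = spark.read
  .option("header", "true")
  .csv("s3a://my-bucket/path/to/file.csv")
df.show()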

Juhong Jung
Take a look at the following Java example: https://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectUsingJava.html
In Scala, you can do something like this:
import java.io.{BufferedReader, InputStreamReader}
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.services.s3.{AmazonS3, AmazonS3ClientBuilder}

val accessKey = ???  // fill in your AWS access key
val secretKey = ???  // fill in your AWS secret key
val awsCredentials: BasicAWSCredentials = new BasicAWSCredentials(accessKey, secretKey)

// Build an S3 client that authenticates with the static credentials above.
val s3: AmazonS3 = AmazonS3ClientBuilder.standard()
  .withCredentials(new AWSStaticCredentialsProvider(awsCredentials))
  .build()

val bucketName = "myS3bucket"
val keyName = "path/to/file"

// Fetch the object and read its content stream into a string.
val s3Obj = s3.getObject(bucketName, keyName)
val in = s3Obj.getObjectContent
val reader = new BufferedReader(new InputStreamReader(in))
val data = Stream.continually(reader.read()).takeWhile(_ != -1).map(_.toChar).mkString
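One caveat: the object's content stream holds an HTTP connection open until it is consumed or closed, so close the reader (reader.close()) as soon as you are done with it.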

emran
Sorry, I just noticed you mentioned reading in Apache Spark. This is how you do it without Spark. – emran Oct 21 '18 at 15:57