8

I have a file in Amazon S3 in bucket ABCD. I have three objects ("folderA/folderB/folderC/abcd.csv") which are folders, and in the final folder I have a .csv file (abcd.csv). I have used some logic to convert it to JSON and load it back into another file, a .txt file in the same folder ("folderA/folderB/folderC/abcd.txt"). I had to download the file locally in order to do that. How would I read the file directly from S3 and write the result back to the text file? The code I have used to write to a file in S3 is below; what I still need is a way to read a file from S3.

 byte[] jsonBytes = json.getBytes(StandardCharsets.UTF_16);
 InputStream inputStream = new ByteArrayInputStream(jsonBytes);
 ObjectMetadata metadata = new ObjectMetadata();
 metadata.setContentLength(jsonBytes.length);
 PutObjectRequest request = new PutObjectRequest(bucketPut, filePut, inputStream, metadata);
 s3.putObject(request);
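
What I am hoping for is something along these lines (only a rough sketch: `bucketGet` and `fileGet` are placeholder names for the source object, and UTF-8 is an assumed encoding for the CSV):

 // Rough sketch: read the CSV object's content directly into a String,
 // without a local temp file. Uses the same AmazonS3 client ("s3") as above;
 // "bucketGet" and "fileGet" are placeholders, UTF-8 is an assumption.
 S3Object csvObject = s3.getObject(new GetObjectRequest(bucketGet, fileGet));
 StringBuilder csv = new StringBuilder();
 try (BufferedReader reader = new BufferedReader(
         new InputStreamReader(csvObject.getObjectContent(), StandardCharsets.UTF_8))) {
     String line;
     while ((line = reader.readLine()) != null) {
         csv.append(line).append('\n');
     }
 }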
Jonik
ZZzzZZzz
  • You seem a bit confused about S3 objects: you can have an object with a key like `"folderA/folderB/folderC/abcd.csv"` but that's just *one* object. S3 objects are always files, not folders. (Even though for example in S3 web UI `folderA` etc do show up as folders if you've created objects with such keys.) – Jonik Dec 24 '15 at 10:44
  • This is a duplicate of [How to write an S3 object to a file?](http://stackoverflow.com/questions/7679924/how-to-write-an-s3-object-to-a-file) – Jonik Dec 24 '15 at 10:46

2 Answers

12

First, get the object's InputStream:

S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, key));
InputStream objectData = object.getObjectContent();

Pass the InputStream, the file name, and the target path to the method below to save the stream to a local file.

public void saveFile(String fileName, String path, InputStream objectData) throws Exception {
    OutputStream out = null;
    try {
        // Create the target directory if it does not exist yet
        File newDirectory = new File(path);
        if (!newDirectory.exists()) {
            newDirectory.mkdirs();
        }

        File downloadedFile = new File(path, fileName);
        out = new FileOutputStream(downloadedFile);

        // Copy the stream in chunks; do not rely on available(), which only
        // reports how many bytes can be read without blocking, not the total size.
        byte[] buffer = new byte[8192];
        int bytesRead;
        while ((bytesRead = objectData.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
        }
    } catch (IOException io) {
        io.printStackTrace();
    } finally {
        try {
            if (out != null) {
                out.close();
            }
            objectData.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

After you download the object, read the file, convert it to JSON, and write the JSON to a .txt file; then you can upload that .txt file to the desired bucket in S3.
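
Roughly, that whole flow could look like the sketch below (a sketch only: `convertCsvToJson` stands in for whatever conversion logic you already have, `/tmp/s3work` is an arbitrary local path, and `java.nio.file.Files`/`Paths` are used for the local read):

// Sketch of the download -> convert -> upload flow described above.
// saveFile(...) is the method from this answer; convertCsvToJson(...) is a
// placeholder for your existing CSV-to-JSON logic.
saveFile("abcd.csv", "/tmp/s3work", objectData);

String csv = new String(Files.readAllBytes(Paths.get("/tmp/s3work", "abcd.csv")),
        StandardCharsets.UTF_8);
String json = convertCsvToJson(csv);

byte[] jsonBytes = json.getBytes(StandardCharsets.UTF_8);
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(jsonBytes.length);
s3Client.putObject(new PutObjectRequest(bucketName, "folderA/folderB/folderC/abcd.txt",
        new ByteArrayInputStream(jsonBytes), metadata));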

ashokramcse
  • I actually used this to read but could not change it to JSON directly and load it to a .txt file in S3. Thanks for the answer. This helps – ZZzzZZzz Dec 06 '14 at 18:53
  • The IO code in `saveFile()` is way more complicated than necessary. As shown in [this answer](http://stackoverflow.com/a/34445210/56285), basically `Files.copy(objectData, new File("/my/path/file.jpg").toPath());` suffices. – Jonik Dec 24 '15 at 10:56
  • Using `available()` is not recommended, in fact it is quite discouraged. See the javadoc for an explanation: [link](https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#available()). Basically you can't count on an implementation returning the total number of bytes in the stream. – Cassio Pereira Oct 24 '17 at 12:50
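
As the comments above note, `java.nio` can replace most of the manual stream handling; a minimal sketch of that variant (the local path is a placeholder):

// Simpler variant using java.nio, as suggested in the comments above:
// copies the S3 stream straight to a local file, no manual buffering.
Path target = Paths.get("/my/path/abcd.csv"); // placeholder local path
Files.createDirectories(target.getParent());
Files.copy(objectData, target, StandardCopyOption.REPLACE_EXISTING);
objectData.close();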
0

You can also use other Java libraries to read files from S3 without downloading them to disk first. Check the code below; I hope it is helpful. This example reads PDF files.

import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.AmazonS3Exception;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.S3ObjectSummary;
//..
// in your main class
private static AWSCredentials credentials = null;
private static AmazonS3 amazonS3Client = null;

public static void intializeAmazonObjects() {
    credentials = new BasicAWSCredentials(ACCESS_KEY, SECRET_ACCESS_KEY);
    amazonS3Client = new AmazonS3Client(credentials);
}

public void mainMethod() throws IOException, AmazonS3Exception {
    // connect to AWS
    intializeAmazonObjects();

    ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucketName);
    ListObjectsV2Result listObjectsResult;
    do {
        listObjectsResult = amazonS3Client.listObjectsV2(req);
        for (S3ObjectSummary objectSummary : listObjectsResult.getObjectSummaries()) {
            System.out.printf(" - %s (size: %d)\n", objectSummary.getKey(), objectSummary.getSize());

            String key = objectSummary.getKey();

            // only try to read PDF files
            if (!key.contains(".pdf")) {
                continue;
            }

            // Read the source file as text
            String pdfFileInText = readAwsFile(objectSummary.getBucketName(), key);
            if (pdfFileInText.isEmpty())
                continue;
        } // end of current batch

        // If there are more than maxKeys (1,000 by default) keys in the bucket,
        // get a continuation token and list the next objects.
        String token = listObjectsResult.getNextContinuationToken();
        System.out.println("Next Continuation Token: " + token);
        req.setContinuationToken(token);
    } while (listObjectsResult.isTruncated());
}

public String readAwsFile(String bucketName, String keyName) {
    String pdfFileInText = "";
    try {
        S3Object object = amazonS3Client.getObject(new GetObjectRequest(bucketName, keyName));
        InputStream objectData = object.getObjectContent();

        PDDocument document = PDDocument.load(objectData);
        try {
            if (!document.isEncrypted()) {
                PDFTextStripper tStripper = new PDFTextStripper();
                pdfFileInText = tStripper.getText(document);
            }
        } finally {
            // release the PDF and the underlying S3 stream
            document.close();
            objectData.close();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return pdfFileInText;
}
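
For the original question's use case, the text returned by readAwsFile (or any converted content) can then be written back to S3 without touching the local disk; a rough sketch, reusing amazonS3Client (the .txt key derivation and UTF-8 encoding are assumptions, and the ObjectMetadata / PutObjectRequest / ByteArrayInputStream / StandardCharsets imports would also be needed):

// Sketch (e.g., inside the loop in mainMethod, after readAwsFile returns):
// upload the extracted text as a .txt object, reusing amazonS3Client.
String textKey = key.replace(".pdf", ".txt");
byte[] textBytes = pdfFileInText.getBytes(StandardCharsets.UTF_8);
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(textBytes.length);
amazonS3Client.putObject(new PutObjectRequest(bucketName, textKey,
        new ByteArrayInputStream(textBytes), metadata));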
Oguz