Save a pdf in AWS S3 location using PDFBox - Java

Question

I am using PDFBox to create a PDF, and I want to save it in S3. I can already create the PDF with PDFBox and upload it to the S3 location. What I would like to do is save the PDF directly to S3 with something like PDDocument.save(S3location), without first saving it locally and then uploading it. Is there a way to do that?

Aside from a (possibly) slightly cleaner code - what are you aiming to achieve? I would expect the latency to S3 to kill the performance of any incremental write, so it would almost definitely be faster to cache the file locally before uploading it. — Itai, Aug 06 '18 at 07:29

score 0 · Answer 1 · answered Sep 16 '21 at 05:21

Since you do not want to store the file locally, you'll need some kind of input stream. To store an object in S3, you need an InputStream and its content length, as mentioned in the AWS docs:

RequestBody requestBody = RequestBody.fromInputStream(fileInputStream, fileSize);
PutObjectRequest putOb = PutObjectRequest.builder()
                    .bucket(bucketName)
                    .key(objectKey)
                    .metadata(metadata)
                    .build();
PutObjectResponse response = s3.putObject(putOb, requestBody);

You'll need to do the following:

1. Save the PDDocument to an output stream.
2. Convert the output stream to an input stream using PipedOutputStream/PipedInputStream, as explained in this Stack Overflow answer.
3. Use this input stream and the content length to upload to S3.

//create new ByteArrayOutputStream
ByteArrayOutputStream originalOutputStream = new ByteArrayOutputStream();
//save your PDDocument to that stream
pdDocument.save(originalOutputStream);
//Determine the size of the stream as you will need this to store in S3
long size = originalOutputStream.size();


//convert this output stream to input stream
PipedOutputStream out = new PipedOutputStream();
PipedInputStream in = new PipedInputStream(out);
new Thread(() -> {
    // try-with-resources closes the PipedOutputStream once this thread is
    // done writing, which signals end-of-stream to the reading side
    try (PipedOutputStream pipe = out) {
        // write the original OutputStream to the PipedOutputStream
        // note that in order for the below method to work, you need
        // to ensure that the data has finished writing to the
        // ByteArrayOutputStream
        originalOutputStream.writeTo(pipe);
    } catch (IOException e) {
        log.error(e.toString());
    }
}).start();

//Use the input stream and size to store the stream in s3
RequestBody requestBody = RequestBody.fromInputStream(in, size);
PutObjectRequest putOb = PutObjectRequest.builder()
                    .bucket(bucketName)
                    .key(objectKey)
                    .metadata(metadata)
                    .build();
PutObjectResponse response = s3.putObject(putOb, requestBody);
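
Since the whole PDF is already buffered in the ByteArrayOutputStream by the time the pipe is created, the pipe and the extra thread may not be strictly necessary: the buffered bytes can be wrapped in a ByteArrayInputStream, and the array length gives the content length. Below is a minimal sketch of that idea using only the JDK. StreamWriter and toBytes are hypothetical names introduced for illustration, and the payload is a placeholder rather than a real PDF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class InMemoryPdfBuffer {

    /** Hypothetical stand-in for anything that writes itself to a stream. */
    @FunctionalInterface
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Lets the writer fill an in-memory buffer and returns the raw bytes. */
    static byte[] toBytes(StreamWriter writer) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            writer.writeTo(buffer);
        } catch (IOException e) {
            // The buffer itself does not fail, but the writer might
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Placeholder payload (assumption); a real document would be written here instead
        byte[] pdfBytes = toBytes(out ->
                out.write("%PDF-1.4 placeholder".getBytes(StandardCharsets.US_ASCII)));

        // The two values the upload needs: a stream and its exact length
        long contentLength = pdfBytes.length;
        InputStream in = new ByteArrayInputStream(pdfBytes);

        // These would then be passed to RequestBody.fromInputStream(in, contentLength)
        System.out.println("bytes ready for upload: " + contentLength);
    }
}
```

With PDFBox this would presumably be called as toBytes(pdDocument::save). The AWS SDK for Java 2.x also has RequestBody.fromBytes(byte[]), which takes the array directly and avoids the stream entirely. Either way the whole PDF sits in memory, so this suits documents of moderate size.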
