How to upload a Java OutputStream to AWS S3

Question

I create PDF docs in memory as OutputStreams. These should be uploaded to S3. My problem is that it's not possible to create a PutObjectRequest from an OutputStream directly (according to this thread in the AWS dev forum). I use aws-java-sdk-s3 v1.10.8 in a Dropwizard app.

The two workarounds I can see so far are:

1. Copy the OutputStream to an InputStream and accept that twice the amount of RAM is used.
2. Pipe the OutputStream to an InputStream and accept the overhead of an extra thread (see this answer).

If I don't find a better solution, I'll go with #1, because it looks as if I can afford the extra memory more easily than the extra threads/CPU in my setup.
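
For reference, the two workarounds could be sketched roughly as follows. This is only a minimal illustration using java.io and java.util.concurrent. The names `StreamWorkarounds`, `copyToInputStream`, `pipeThrough` and `drain` are made up for this example, and `drain` merely stands in for whatever consumes the InputStream (for example the S3 upload).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StreamWorkarounds {

    // Workaround 1: toByteArray() allocates a second array, so the payload
    // is held in memory twice until the original stream is released.
    static InputStream copyToInputStream(ByteArrayOutputStream source) {
        return new ByteArrayInputStream(source.toByteArray());
    }

    // Workaround 2: a helper thread writes into the pipe while the calling
    // thread reads from it; only the pipe's small buffer sits in between.
    static byte[] pipeThrough(ByteArrayOutputStream source) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try (PipedInputStream in = new PipedInputStream(8192)) {
            PipedOutputStream out = new PipedOutputStream(in);
            Future<Void> writer = pool.submit(() -> {
                try (OutputStream o = out) { // closing signals end of stream
                    source.writeTo(o);
                }
                return null;
            });
            byte[] received = drain(in); // stand-in for the real consumer
            writer.get();                // rethrows a failure from the writer thread
            return received;
        } finally {
            pool.shutdown();
        }
    }

    // Reads the stream to its end and returns everything that was read.
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            sink.write(chunk, 0, n);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream pdf = new ByteArrayOutputStream();
        byte[] dummy = "%PDF-1.4 dummy content".getBytes(StandardCharsets.US_ASCII);
        pdf.write(dummy, 0, dummy.length);
        System.out.println("copied bytes: " + drain(copyToInputStream(pdf)).length);
        System.out.println("piped bytes:  " + pipeThrough(pdf).length);
    }
}
```

The piped variant pays off mainly when the producer can write incrementally. With a ByteArrayOutputStream the data is already fully in memory, so the pipe saves the second copy but not the first.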

Is there any other, possibly more efficient way to achieve this that I have overlooked so far?

Edit: My OutputStreams are ByteArrayOutputStreams

"I create PDF docs in memory as OutputStreams" - ?? an `OutputStream` does not store data (possibly except for `ByteArrayOutputStream`, but then you'd say you created it in memory as a *byte array*) — user253751, Aug 04 '15 at 09:35
I have a similar question - http://stackoverflow.com/questions/40268320/how-to-store-object-on-s3-using-outputstream . Were you able to find a solution for this? If not, how did you go about doing #1 in your case? — Omnipresent, Oct 26 '16 at 17:23
@Omnipresent, you can find what I did in my own answer below. — EagleBeak, Oct 28 '16 at 08:44
See https://stackoverflow.com/a/64508183/1704634 for a solution which allows you to stream directly to S3 without being forced to store the entire stream in a byte-array. Automatically uses multi-part transfer if the stream gets too large. — blagerweij, Oct 23 '20 at 22:53

score 11 · Accepted Answer · edited Dec 04 '15 at 09:50


I solved this by subclassing ByteArrayOutputStream as ConvertibleOutputStream:

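import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class ConvertibleOutputStream extends ByteArrayOutputStream {
    // Creates an InputStream over the internal buffer without copying it,
    // so no extra memory is used for the conversion.
    public InputStream toInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}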
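
A hedged usage sketch of how this might be wired up. The class is repeated (package-private) so the example compiles on its own, and `ConvertibleOutputStreamDemo` is a hypothetical name. The S3 call is only indicated in a comment, assuming the aws-java-sdk-s3 1.x `PutObjectRequest` constructor that takes an `InputStream` plus `ObjectMetadata`; `s3`, `bucket` and `key` are placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ConvertibleOutputStream extends ByteArrayOutputStream {
    // Wraps the internal buffer directly; only the first `count` bytes are valid data.
    synchronized InputStream toInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}

class ConvertibleOutputStreamDemo {
    public static void main(String[] args) throws IOException {
        ConvertibleOutputStream out = new ConvertibleOutputStream();
        byte[] dummy = "%PDF-1.4 dummy content".getBytes(StandardCharsets.US_ASCII);
        out.write(dummy, 0, dummy.length);

        // The length is known up front, which lets the uploader avoid buffering.
        long contentLength = out.size();
        InputStream in = out.toInputStream();

        // With aws-java-sdk-s3 1.x (assumed, not compiled here) the upload would be roughly:
        //   ObjectMetadata metadata = new ObjectMetadata();
        //   metadata.setContentLength(contentLength);
        //   s3.putObject(new PutObjectRequest(bucket, key, in, metadata));

        System.out.println("content length: " + contentLength);
        System.out.println("readable bytes: " + in.available());
    }
}
```

Because the InputStream shares the buffer with the OutputStream, it is safest not to call `reset()` on the stream or reuse it for another document until the upload has finished.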

answered Aug 04 '15 at 12:17 by EagleBeak, last edited by checklist

This needs to be changed to `return new ByteArrayInputStream(buf, 0, count);`, otherwise unallocated data in `buf` may be regarded as actual data in the InputStream. – Alex Hall Sep 28 '15 at 16:02

score 2 · Answer 2 · answered Aug 04 '15 at 09:35


What's the actual type of your OutputStream? Since it's an abstract class, there's no saying where the data actually goes (or if it even goes anywhere).

But let's assume that you're talking about a ByteArrayOutputStream since it at least keeps the data in memory (unlike many many others).

If you create a ByteArrayInputStream out of its buffer, there's no duplicated memory. That's the whole idea of streaming.

answered Aug 04 '15 at 09:35 by Kayaman

OK, and how would you suggest I should access the buffer? Would you recommend creating a subclass and providing a public getter for the protected field `buf` from the `ByteArrayOutputStream`? – EagleBeak Aug 04 '15 at 09:55
Eh, I didn't realize that BAOS makes a copy of the buffer with `toByteArray`. Yeah, you should go for the subclass route. – Kayaman Aug 04 '15 at 09:59
Exactly, hence the subclass idea. – EagleBeak Aug 04 '15 at 10:01
There's also several libraries that have a similar class already (`ByteArrayBuffer` seems to be a common name for them) which will give an `InputStream` directly. Jackson at least has one. – Kayaman Aug 04 '15 at 10:05
Thanks for your input! I added my own answer to make the subclass solution more transparent. – EagleBeak Aug 04 '15 at 12:18

score 0 · Answer 3 · answered Sep 19 '21 at 01:57


Another workaround is to use the presigned URL feature of S3. Since a presigned URL allows you to upload files to S3 with an HTTP PUT or POST, you can write your data straight to the output stream of an HttpURLConnection. Amazon provides sample code for this.
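
A rough sketch of what the PUT variant might look like with plain java.net, assuming the presigned URL has already been generated elsewhere and is passed in. `PresignedUrlUpload`, `prepare` and `upload` are hypothetical names. The sketch also assumes that the body length should be declared up front (as far as I know S3 expects a Content-Length for a plain PUT) and that any Content-Type sent matches what was used when the URL was signed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class PresignedUrlUpload {

    // Configures the connection for a PUT with a known body length.
    // Nothing is sent until the output stream is opened.
    static HttpURLConnection prepare(URL presignedUrl, String contentType, int length)
            throws IOException {
        HttpURLConnection connection = (HttpURLConnection) presignedUrl.openConnection();
        connection.setDoOutput(true);
        connection.setRequestMethod("PUT");
        connection.setRequestProperty("Content-Type", contentType);
        // Declaring the length stops HttpURLConnection from buffering the whole body.
        connection.setFixedLengthStreamingMode(length);
        return connection;
    }

    // Writes the in-memory document to the request body and returns the HTTP status.
    static int upload(URL presignedUrl, String contentType, ByteArrayOutputStream document)
            throws IOException {
        HttpURLConnection connection = prepare(presignedUrl, contentType, document.size());
        try (OutputStream body = connection.getOutputStream()) {
            document.writeTo(body); // copies straight from the internal buffer
        }
        int status = connection.getResponseCode();
        connection.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: PresignedUrlUpload <presigned-put-url>");
            return;
        }
        ByteArrayOutputStream pdf = new ByteArrayOutputStream();
        byte[] dummy = "%PDF-1.4 dummy content".getBytes(StandardCharsets.US_ASCII);
        pdf.write(dummy, 0, dummy.length);
        int status = upload(URI.create(args[0]).toURL(), "application/pdf", pdf);
        System.out.println("HTTP status: " + status);
    }
}
```

`ByteArrayOutputStream.writeTo` hands the internal buffer to the target stream without creating an intermediate copy, so this route also avoids the doubled memory of `toByteArray()`.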

answered Sep 19 '21 at 01:57 by Victor Ma
