
I create PDF docs in memory as OutputStreams. These should be uploaded to S3. My problem is that it's not possible to create a PutObjectRequest from an OutputStream directly (according to this thread in the AWS dev forum). I use aws-java-sdk-s3 v1.10.8 in a Dropwizard app.

The two workarounds I can see so far are:

  1. Copy the OutputStream to an InputStream and accept that twice the amount of RAM is used.
  2. Pipe the OutputStream to an InputStream and accept the overhead of an extra thread (see this answer)
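A minimal sketch of the two workarounds, using only the JDK. `PdfWriter`, `UploadWorkarounds`, `viaCopy`, `viaPipe` and `readAll` are hypothetical names. `readAll` just drains the stream and stands in for the S3 upload, which is not shown. The copy variant holds the document twice (the internal buffer plus the `toByteArray()` copy). The pipe variant holds only the pipe's 64 KB buffer but needs a producer thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the PDF generator: anything that renders into an OutputStream.
interface PdfWriter {
    void writeTo(OutputStream out) throws IOException;
}

class UploadWorkarounds {

    // Workaround 1: render into memory, then copy. toByteArray() returns a copy
    // of the internal buffer, so the document briefly exists twice in RAM.
    static InputStream viaCopy(PdfWriter writer) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writer.writeTo(out);
        return new ByteArrayInputStream(out.toByteArray());
    }

    // Workaround 2: render on a background thread into a pipe; the caller reads
    // the other end (e.g. hands it to the S3 client). Only the pipe's buffer is
    // held in memory, at the cost of an extra thread.
    static InputStream viaPipe(PdfWriter writer) throws IOException {
        PipedInputStream in = new PipedInputStream(64 * 1024);
        PipedOutputStream out = new PipedOutputStream(in);
        Thread producer = new Thread(() -> {
            // Closing the pipe signals end-of-stream to the reader.
            try (OutputStream o = out) {
                writer.writeTo(o);
            } catch (IOException e) {
                // Sketch only: the reader just sees a truncated stream.
                throw new UncheckedIOException(e);
            }
        }, "pdf-producer");
        producer.start();
        return in;
    }

    // Drains a stream fully; stands in for the S3 upload in this sketch.
    static byte[] readAll(InputStream in) throws IOException {
        try (InputStream i = in) {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = i.read(chunk)) != -1) {
                sink.write(chunk, 0, n);
            }
            return sink.toByteArray();
        }
    }
}
```

In a real implementation, an exception in the producer thread would have to be propagated to the uploading side. Also note that, as far as I know, the SDK v1 client buffers a stream in memory when no content length is set in `ObjectMetadata`, which would cancel out the memory advantage of the pipe when the length isn't known up front.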

If I don't find a better solution I'll go with #1, because it looks as if I could afford the extra memory more easily than threads/CPU in my setup.

Is there any other, possibly more efficient way to achieve this that I have overlooked so far?

Edit: My OutputStreams are ByteArrayOutputStreams.

Asked by EagleBeak (edited by Community)
  • "I create PDF docs in memory as OutputStreams" - ?? an `OutputStream` does not store data (possibly except for `ByteArrayOutputStream`, but then you'd say you created it in memory as a *byte array*) – user253751 Aug 04 '15 at 09:35
  • I use ByteArrayOutputStream. Sorry for the confusion. – EagleBeak Aug 04 '15 at 09:48
  • I have a similar question - http://stackoverflow.com/questions/40268320/how-to-store-object-on-s3-using-outputstream . Were you able to find a solution for this? If not, how did you go about doing #1 in your case? – Omnipresent Oct 26 '16 at 17:23
  • @Omnipresent, you can find what I did in my own answer below. – EagleBeak Oct 28 '16 at 08:44
  • See https://stackoverflow.com/a/64508183/1704634 for a solution which allows you to stream directly to S3 without being forced to store the entire stream in a byte-array. Automatically uses multi-part transfer if the stream gets too large. – blagerweij Oct 23 '20 at 22:53

3 Answers


I solved this by subclassing ByteArrayOutputStream as ConvertibleOutputStream:

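import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public class ConvertibleOutputStream extends ByteArrayOutputStream {
    // Creates an InputStream over the internal buffer without copying it, so no extra
    // memory is used. Only the first `count` bytes of `buf` are valid data.
    public InputStream toInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}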
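One caveat with this subclass: the returned `InputStream` shares `buf` with the output stream, so the output stream must not be reused (for example via `reset()`) until the upload has consumed the data. The sketch below repeats the class so it compiles on its own; `SharedBufferDemo`, `drain` and `readAfterReuse` are hypothetical names. For the upload itself, the idea would be to pass `toInputStream()` together with an `ObjectMetadata` whose content length is set to `size()`, assuming the SDK v1 `putObject(bucket, key, InputStream, ObjectMetadata)` overload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Same subclass as in the answer, repeated so this snippet compiles on its own.
class ConvertibleOutputStream extends ByteArrayOutputStream {
    // Wraps the internal buffer (no copy); only the first `count` bytes are data.
    public InputStream toInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}

class SharedBufferDemo {
    // Hypothetical helper: reads a single chunk (up to 64 bytes) and decodes it as ASCII.
    static String drain(InputStream in) throws IOException {
        byte[] tmp = new byte[64];
        int n = in.read(tmp);
        return n < 0 ? "" : new String(tmp, 0, n, StandardCharsets.US_ASCII);
    }

    // Reuses the output stream after toInputStream() but before the data is read.
    static String readAfterReuse() throws IOException {
        ConvertibleOutputStream out = new ConvertibleOutputStream();
        out.write("first".getBytes(StandardCharsets.US_ASCII));
        InputStream in = out.toInputStream(); // shares out's buffer
        out.reset();                          // count = 0, but buf is kept
        out.write("XYZ".getBytes(StandardCharsets.US_ASCII)); // overwrites buf[0..2]
        return drain(in);                     // the reader now sees corrupted data
    }
}
```

`readAfterReuse()` returns `"XYZst"`: the reader still sees five bytes, but the first three were overwritten by the reused stream.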
Answered by EagleBeak (edited by checklist)
  • This needs to be changed to `return new ByteArrayInputStream(buf, 0, count);`, otherwise unallocated data in `buf` may be regarded as actual data in the InputStream. – Alex Hall Sep 28 '15 at 16:02

What's the actual type of your OutputStream? Since it's an abstract class, there's no saying where the data actually goes (or if it even goes anywhere).

But let's assume that you're talking about a ByteArrayOutputStream since it at least keeps the data in memory (unlike many many others).

If you create a ByteArrayInputStream out of its buffer, there's no duplicated memory. That's the whole idea of streaming.
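The distinction discussed in the comments can be checked directly: `toByteArray()` returns a copy of the internal buffer, while `ByteArrayInputStream` reads from whatever array it is given without copying it. `CopySemantics` is a hypothetical name; with standard JDK behavior both methods return `true`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class CopySemantics {
    // True if toByteArray() hands back a fresh copy rather than the internal buffer.
    static boolean toByteArrayCopies() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(42);
        byte[] first = out.toByteArray();
        first[0] = 0;                       // mutate the returned array...
        return out.toByteArray()[0] == 42;  // ...the stream's own data is untouched
    }

    // True if ByteArrayInputStream reads straight from the array it was given.
    static boolean inputStreamWrapsWithoutCopy() {
        byte[] data = {1, 2, 3};
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        data[0] = 99;                       // mutate after wrapping...
        return in.read() == 99;             // ...and the stream sees the change
    }
}
```

So the memory saving only holds if the `ByteArrayInputStream` is built over the protected `buf` field, not over the result of `toByteArray()`.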

Answered by Kayaman
  • OK, and how would you suggest I should access the buffer? Would you recommend creating a subclass and providing a public getter for the protected field `buf` from the `ByteArrayOutputStream`? – EagleBeak Aug 04 '15 at 09:55
  • Eh, I didn't realize that BAOS makes a copy of the buffer with `toByteArray`. Yeah, you should go for the subclass route. – Kayaman Aug 04 '15 at 09:59
  • Exactly, hence the subclass idea. – EagleBeak Aug 04 '15 at 10:01
  • There are also several libraries that have a similar class already (`ByteArrayBuffer` seems to be a common name for them) which will give an `InputStream` directly. Jackson at least has one. – Kayaman Aug 04 '15 at 10:05
  • Thanks for your input! I added my own answer to make the subclass solution more transparent. – EagleBeak Aug 04 '15 at 12:18

Another workaround is to use the presigned URL feature of S3. Since a presigned URL lets you upload a file to S3 with an HTTP PUT or POST, you can write your output stream directly to an HttpURLConnection. See the sample code from Amazon.
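A sketch of the presigned-URL route, assuming the URL has already been generated for an HTTP PUT (with SDK v1 this would typically be `AmazonS3.generatePresignedUrl(bucket, key, expiration, HttpMethod.PUT)`; that step is not shown). `PresignedPut.upload` is a hypothetical helper. `ByteArrayOutputStream.writeTo` sends the internal buffer without an intermediate copy, and fixed-length streaming mode makes `HttpURLConnection` send a `Content-Length` header up front instead of buffering the whole body in memory first, which is what it does by default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

class PresignedPut {
    // PUTs the contents of a ByteArrayOutputStream to a presigned URL; returns the HTTP status.
    static int upload(URL presignedUrl, ByteArrayOutputStream pdf, String contentType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) presignedUrl.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("PUT");
        // If the URL was signed for a specific Content-Type, this header has to match it.
        conn.setRequestProperty("Content-Type", contentType);
        // Known length: stream the body directly instead of letting the connection buffer it.
        conn.setFixedLengthStreamingMode(pdf.size());
        try (OutputStream body = conn.getOutputStream()) {
            pdf.writeTo(body); // writes the internal buffer, no toByteArray() copy
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }
}
```

S3 responds with 200 on a successful PUT; anything else would indicate, for example, an expired URL or a signature mismatch.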

Answered by Victor Ma