
I am using piped output streams to convert an OutputStream into an InputStream, because the AWS Java SDK does not allow putting objects on S3 using OutputStreams.

I'm using the code below; however, it will intermittently just hang. This code is in a web application. Currently there is no load on the application; I am just trying it out on my personal computer.

ByteArrayOutputStream os = new ByteArrayOutputStream();
PipedInputStream inpipe = new PipedInputStream();
final PipedOutputStream out = new PipedOutputStream(inpipe);
try {
   String xmpXml = "<dc:description>somedesc</dc:description>"
   JpegXmpRewriter rewriter = new JpegXmpRewriter();
   rewriter.updateXmpXml(isNew1,os, xmpXml); 
      new Thread(new Runnable() {
          public void run () {
              try {
                  // write the original OutputStream to the PipedOutputStream
                  println "starting writeto"
                  os.writeTo(out);
                  out.close();
                  println "ending writeto"
              } catch (IOException e) {
                  System.out.println("Some exception)
              }
          }
      }).start();
      ObjectMetadata metadata1 = new ObjectMetadata();
      metadata1.setContentLength(os.size());
      client.putObject(new PutObjectRequest("test-bucket", "167_sample.jpg", inpipe, metadata1));
    }
 catch (Exception e) { 
      System.out.println("Some exception")
 }
 finally {
    isNew1.close()
    os.close()
 }
Omnipresent
  • For reference, this technique (which avoids having to copy the contents of the BAOS yet again) is discussed here: http://stackoverflow.com/a/23874232/14955 – Thilo Oct 27 '16 at 14:14
  • do you have a thread dump (kill -QUIT)? – Thilo Oct 27 '16 at 14:18
  • I know that solution. That is where I got my snippet from, and that is exactly what I am doing in my question, isn't it? – Omnipresent Oct 27 '16 at 14:19
  • @Thilo No. I do not have a thread dump. If it would help, how would I get it? – Omnipresent Oct 27 '16 at 14:19
  • I am running this code in a web application. I run the application with `grails run-app` when it hangs I am just forced to hit `ctrl+c`. Are you suggesting killing the process that runs `grails run-app` with `kill -QUIT` from terminal? – Omnipresent Oct 27 '16 at 14:22
  • If the data is not too huge (you already have it in memory once after all), consider using `toByteArray` and give that to AWS. Or a temporary file. – Thilo Oct 27 '16 at 14:23
  • I will try `toByteArray`. The whole point of me doing this is to avoid writing a file to disk. – Omnipresent Oct 27 '16 at 14:24
  • Doesn't your usage of BAOS defeat the advantages of piped streams? What you should in fact do is call `rewriter.updateXmpXml(isNew1,out, xmpXml);` from the thread you created. Note that I wrote `out`, not `os`. – Marko Topolnik Oct 27 '16 at 14:26 (a sketch of this approach follows the comment thread)
  • @MarkoTopolnik that gives me a `pipe closed` error. Also, in the answer @Thilo referenced it says to use BAOS – Omnipresent Oct 27 '16 at 14:45
  • Moving the `rewriter` into the Runnable as the direct source for the pipe sounds like a good idea. Avoids reading the whole thing into memory. OTOH, error handling becomes more difficult (you'll already have started the Amazon upload before knowing if the XML bit worked). Also, you don't know the result size anymore (maybe Amazon can figure it out, but that probably involves some more buffering on their end). – Thilo Oct 27 '16 at 14:49
  • That matches neither my own experience nor the specification. It must work if done right. – Marko Topolnik Oct 27 '16 at 14:50
  • @Thilo FWIW I found this blog post http://tech.coterie.com/2012/10/streaming-schema-based-xml-to-s3-with.html which led me to use the grails-executor plugin. After modifying the code according to that, I am not noticing the hang in the application. I have tried 5 times. I will try more. I've asked the plugin maintainers what the difference is between using a thread and their code: https://github.com/basejump/grails-executor/issues/15 – Omnipresent Oct 27 '16 at 14:59
  • Still working after 15 tries. I will keep the question open, however, since there isn't a solution yet and I would like to really understand why the executor approach works but the plain thread approach does not. – Omnipresent Oct 27 '16 at 15:07
  • If you want to be sure your thread didn't fail, you should catch Throwable, not just IOException. – Marko Topolnik Oct 27 '16 at 15:19
  • I agree with @Thilo that collecting into the BAOS has the advantages of known size and known success before the upload starts. However, once you have the data in RAM, there is really no point anymore in getting involved with the complexity of starting another thread and passing data from thread to thread, just to solve a silly API limitation. – Marko Topolnik Oct 27 '16 at 15:41
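
For reference, a minimal sketch of the variant discussed in the comments above: the `rewriter` call moves into the spawned thread and writes straight to the `PipedOutputStream`, with `Throwable` caught so a writer failure is not silently swallowed. `isNew1`, `xmpXml`, `rewriter`, and `client` are assumed to be the same (effectively final) objects as in the question, and, as noted above, the content length is no longer known up front:

final PipedInputStream inpipe = new PipedInputStream();
final PipedOutputStream out = new PipedOutputStream(inpipe);

new Thread(new Runnable() {
    public void run() {
        try {
            // write straight to the pipe; no intermediate ByteArrayOutputStream
            rewriter.updateXmpXml(isNew1, out, xmpXml);
        } catch (Throwable t) {
            // catch Throwable so a failure in the writer thread is not swallowed
            t.printStackTrace();
        } finally {
            try { out.close(); } catch (IOException ignored) { }
        }
    }
}).start();

ObjectMetadata metadata1 = new ObjectMetadata();
// the content length is no longer known up front in this variant
client.putObject(new PutObjectRequest("test-bucket", "167_sample.jpg", inpipe, metadata1));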

1 Answer


Instead of bothering with the complexities of starting another thread, instantiating two concurrent classes, and then passing data from thread to thread, all to solve nothing but a minor limitation in the provided JDK API, you should just create a simple specialization of the ByteArrayOutputStream:

class BetterByteArrayOutputStream extends ByteArrayOutputStream {
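    // buf and count are protected fields inherited from ByteArrayOutputStream,
    // so the subclass can wrap the existing buffer in a ByteArrayInputStream directly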
    public ByteArrayInputStream toInputStream() {
        return new ByteArrayInputStream(buf, 0, count);
    }
}

This converts it to an input stream with no copying.
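
Used against the code from the question, that might look roughly like this (a sketch reusing `isNew1`, `rewriter`, `xmpXml`, and `client` as they appear there; the piped streams and the extra thread disappear entirely):

BetterByteArrayOutputStream os = new BetterByteArrayOutputStream();
JpegXmpRewriter rewriter = new JpegXmpRewriter();
rewriter.updateXmpXml(isNew1, os, xmpXml);   // rewrite the XMP into the in-memory buffer

ObjectMetadata metadata1 = new ObjectMetadata();
metadata1.setContentLength(os.size());       // size is known because the data is fully buffered

// wrap the existing buffer; no second copy of the bytes is made
client.putObject(new PutObjectRequest("test-bucket", "167_sample.jpg",
        os.toInputStream(), metadata1));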

Marko Topolnik
  • This looks promising. Can you please explain it a bit more? What is `buf` in the return statement? How would I call this after `rewriter.updateXmpXml(isNew1,os, xmpXml);` ? – Omnipresent Oct 27 '16 at 16:20
  • [Javadoc on `buf`](https://docs.oracle.com/javase/7/docs/api/java/io/ByteArrayOutputStream.html#buf). `client.putObject(new PutObjectRequest("test-bucket", "167_sample.jpg", os.toInputStream(), metadata1));` – Marko Topolnik Oct 27 '16 at 16:22
  • This seems to be working. Would this work well on large files? Is this keeping all data in memory? – Omnipresent Oct 27 '16 at 16:34
  • That's a strange question given that your initial solution already keeps everything in RAM. This just reuses the same data. – Marko Topolnik Oct 27 '16 at 16:38
  • Right. I forgot about that. Using piped streams gave me the illusion that I was not. If this doesn't work well for large files, I will go back to piped streams but remove the BAOS, as you recommended in your comment. For some reason that was not working for me. I'll post another question about it. Thanks – Omnipresent Oct 27 '16 at 16:58