I am using the AWS S3 client to download big files from S3 (around 600 MB). Partway through the download, it fails with errors like Socket closed, Premature end of Content-Length delimited message body, or Data received in non-data state: 6. The error message changes from one failure to the next. After a bit of research, it seems that such issues occur when the AmazonS3 client gets garbage collected before the input stream is completely read and written: https://forums.aws.amazon.com/thread.jspa?messageID=438171#

Here is what the code looks like:

public void retrieve(String bucket, String key, String localFile){
    AmazonS3 s3Client = createNewS3Client();
    S3Object object = s3Client.getObject(bucket, key);

    InputStream inputStream = object.getObjectContent();
    OutputStream outputStream = new FileOutputStream(localFile);

    //read bytes from inputstream and write to outputstream until EOF
    writeBytes(inputStream, outputStream); 

    inputStream.close();
    outputStream.close();
}

So my question is: can s3Client in the above method be garbage collected if writeBytes takes a long time to finish, before it completes and returns? There is no reference to s3Client inside the writeBytes method.

RandomQuestion

2 Answers

No, it cannot. You have a reachable reference to it on the stack.

Sotirios Delimanolis
  • That's what I thought, but I can't seem to find any other possible explanation for the method throwing the exception. – RandomQuestion Mar 26 '14 at 21:48
  • I wouldn't be so sure... the code could have been optimized. As there's no reference to s3Client after you start consuming the stream, I believe there's no reason why it couldn't be GC'ed. Just add a line at the end of the method like `s3Client.toString()` to force the JVM to keep the ref there and see if that helps. – Renato Mar 26 '14 at 21:50
  • @Renato The JVM specification states that GC algorithms are up to the implementation, so _maybe_. But I doubt it. In implementations I've used, as long as an object is referenced from the local variable table of a method that is still on the stack (even if that variable is out of scope), the referenced object will not be GC'ed. – Sotirios Delimanolis Mar 26 '14 at 21:59
  • I don't understand why you doubt it. – Renato Mar 26 '14 at 22:07
  • @Renato See [here](http://stackoverflow.com/questions/21437699/outofmemoryerror-when-seemingly-unrelated-code-block-commented-out) for an example. Even though the object and the variable referencing it are out of scope, the object is not GC'ed. Now the `LocalVariableTable` is part of the JVM specification, but the GC is not. I'm saying that JVM implementation (HotSpot) did not GC the object. – Sotirios Delimanolis Mar 26 '14 at 22:09
  • That's very interesting... but seems to support the possibility that s3Client above may be GC'ed. Hope that @Jitendra will be able to tell if keeping a forced ref to s3Client fixed the issue. – Renato Mar 26 '14 at 22:23
  • It didn't fix the issue. :( Still receiving same error. – RandomQuestion Mar 27 '14 at 00:16
  • Does `AmazonS3` have a `finalize()` method? Can you put a breakpoint in it and debug to see if it gets called? Can this simply be a network error? – Sotirios Delimanolis Mar 27 '14 at 00:24
  • When I run it locally on my desktop, it seems to be working fine. I see this issue in production. So debugging is going to be challenging. I have to see about `AmazonS3` finalize method. – RandomQuestion Mar 27 '14 at 00:30
  • There might be a possibility of a network error, but I am getting this error consistently. – RandomQuestion Mar 27 '14 at 00:32
  • I am being paranoid, but shouldn't just adding `log.debug("s3client " + s3Client);` at the end of the method also keep the reference, without having to call some method on `s3Client`? – RandomQuestion Mar 27 '14 at 00:46
  • @Jitendra I don't think you need it. According to Renato you would, but the second that call is done, according to them, the object would again be eligible for GC, which doesn't really help anyway. – Sotirios Delimanolis Mar 27 '14 at 00:47
  • @SotiriosDelimanolis: The GC is one of the very last things which could cause an exception (by definition it can collect only garbage and what can be used later is again by definition no garbage). Why don't you simply show us the exception? – maaartinus Mar 27 '14 at 01:31
  • In the Amazon forum you linked, everyone seems to have resolved this issue by making s3Client a field in their class (which rules out it being GC'ed while your class is in scope). Have you done that? – Renato Mar 27 '14 at 16:59

Adding my answer here with what actually solved the problem.

What didn't work: To avoid garbage collection, I added a log statement at the end of the method that used the s3Client object.

As people suggested in the comments, making s3Client a field in the class was one option, but I was not in a position to make that change. It might have fixed the issue, though.

What fixed the issue: Using the API getObject(GetObjectRequest getObjectRequest, File destinationFile). It takes care of the stream handling: it reads the content from the stream and writes it to a file.
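
With the file-based overload, the call in retrieve would look roughly like `s3Client.getObject(new GetObjectRequest(bucket, key), new File(localFile))`, assuming the same bucket, key and localFile arguments as in the question. If the streaming version has to stay, the copy step can at least close both streams reliably and detect a short read. Below is a minimal sketch of that idea using only the JDK. The class name `StreamCopy`, the method `copyToFile` and the `expectedLength` parameter are hypothetical names for illustration, not part of the AWS SDK, and this is not what the SDK does internally.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of the AWS SDK.
class StreamCopy {

    /**
     * Copies everything from in to target and returns the number of bytes written.
     * expectedLength is the size the caller expects (for S3 this would be the
     * object's content length); pass a negative value to skip the check.
     */
    static long copyToFile(InputStream in, Path target, long expectedLength) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        // try-with-resources closes both streams even if read() or write() throws
        try (InputStream source = in; OutputStream out = Files.newOutputStream(target)) {
            int read;
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        // Fail loudly instead of leaving a silently truncated file behind
        if (expectedLength >= 0 && total != expectedLength) {
            throw new IOException(
                "Truncated download: expected " + expectedLength + " bytes, got " + total);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the S3 object content: an in-memory stream of known size
        byte[] payload = new byte[100_000];
        Path target = Files.createTempFile("download", ".bin");
        long copied = copyToFile(new ByteArrayInputStream(payload), target, payload.length);
        System.out.println("Copied " + copied + " bytes to " + target);
        Files.delete(target);
    }
}
```

In the question's code, a helper like this would replace writeBytes and the two close() calls, so a failure in the middle of the copy no longer leaks the streams. It does not address the connection being dropped, which is why the file-based getObject overload remains the simpler fix.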

RandomQuestion