
I recently had to upgrade to aws-java-sdk 1.11.108. I have a Java program that downloads S3 objects (8 to 10 GB in size) to an EC2 box and processes them as a stream. This program had been working for over 2 years without any problems, but after updating to the latest version of aws-java-sdk, my file download aborts midway with the following WARN message in the logs (no exception):

WARN:com.amazonaws.services.s3.internal.S3AbortableInputStream - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.

S3Object s3Obj = s3client.getObject(new GetObjectRequest(bucketName, s3FileName));
Reader reader = new BufferedReader(new InputStreamReader(new GZIPInputStream(s3Obj.getObjectContent())));

I would appreciate it if somebody could tell me why the stream is aborting silently without throwing any exception, and what the best way is to make it work.

Thanks


2 Answers


Be sure to close() the input stream only once.
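As a sketch of why a single close() suffices: closing the outermost reader cascades down the wrapper chain, so the underlying stream is closed exactly once. This self-contained example uses an in-memory gzipped buffer in place of the S3 object body (CloseOnceDemo and CloseCountingStream are hypothetical names, not AWS SDK types):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.*;

public class CloseOnceDemo {

    // Counts close() calls so we can verify the stream is closed exactly once.
    static class CloseCountingStream extends FilterInputStream {
        int closeCount = 0;
        CloseCountingStream(InputStream in) { super(in); }
        @Override public void close() throws IOException {
            closeCount++;
            super.close();
        }
    }

    static int demo() throws IOException {
        // Build a small gzipped payload in memory (stand-in for the S3 object body).
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(new GZIPOutputStream(buf), StandardCharsets.UTF_8)) {
            w.write("line from s3\n");
        }
        CloseCountingStream body = new CloseCountingStream(new ByteArrayInputStream(buf.toByteArray()));

        // Closing the outermost reader closes the whole chain; an extra
        // body.close() here would be the second close the answer warns about.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(body), StandardCharsets.UTF_8))) {
            reader.readLine();
        }
        return body.closeCount;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("close() calls: " + demo());
    }
}
```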

  • It looks like S3ObjectInputStream somehow automatically closes itself at EOF, so doing an explicit close() after EOF causes this error message. – user998303 Dec 12 '19 at 09:08

ZIP archives have a (redundant) central directory structure at the end, so you can list the contents of the archive without scanning through the whole thing. Java's ZipInputStream never actually consumes this from the underlying stream; getNextEntry() returns null as soon as it finds the start of the central directory. You might try adding while (in.read() >= 0); in your overridden close() method to read through to the end of the underlying stream.
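A minimal sketch of that drain-on-close idea, using in-memory streams instead of an S3 object body (DrainOnCloseDemo, DrainingInputStream, and the helper method are hypothetical names for illustration):

```java
import java.io.*;

public class DrainOnCloseDemo {

    // Reads the underlying stream to EOF before closing it, so a wrapper
    // like the SDK's S3AbortableInputStream sees a fully consumed stream
    // instead of aborting the HTTP connection.
    static class DrainingInputStream extends FilterInputStream {
        DrainingInputStream(InputStream in) { super(in); }
        @Override public void close() throws IOException {
            byte[] skip = new byte[8192];
            while (in.read(skip) >= 0) {
                // discard remaining bytes until EOF
            }
            super.close();
        }
    }

    // Returns how many bytes of a 100-byte source are still unread after close().
    static int remainingAfterClose(boolean drain) throws IOException {
        ByteArrayInputStream raw = new ByteArrayInputStream(new byte[100]);
        InputStream in = drain ? new DrainingInputStream(raw) : raw;
        in.read(new byte[10]); // consume only part of the stream, as ZipInputStream does
        in.close();
        return raw.available(); // ByteArrayInputStream.close() is a no-op, so this stays valid
    }

    public static void main(String[] args) throws IOException {
        System.out.println("without drain: " + remainingAfterClose(false) + " bytes left");
        System.out.println("with drain:    " + remainingAfterClose(true) + " bytes left");
    }
}
```

Without the wrapper, 90 of the 100 bytes are left unread at close time; with it, the stream is drained to EOF before the close propagates down.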

https://github.com/aws/aws-sdk-java/issues/1111
