
As part of my web service, I have a picture repository which retrieves an image from Amazon S3 (a datastore) and then returns it. This is the method that does this:

File getPicture(String path) throws IOException {
    File file = File.createTempFile(path, ".png");
    S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, path));
    IOUtils.copy(object.getObjectContent(), new FileOutputStream(file));
    return file;
}

The problem is that it takes far too long to get a response from the service: a 3 MB image took 7.5 seconds to download. I notice that if I comment out the IOUtils.copy() line, the response time is significantly faster, so it must be that particular method causing the delay.

I've seen this method used in almost all modern examples of converting an S3Object to a file but I seem to be a unique case. Am I missing a trick here?

Appreciate any help!

eyes enberg
  • I can't help with your actual problem, but you might want to wrap the streams in try-with-resources statements. As per the IOUtils docs, it doesn't close the streams you pass in - https://commons.apache.org/proper/commons-io/javadocs/api-2.5/org/apache/commons/io/IOUtils.html - so who is responsible for closing the FileOutputStream? – Jakg Nov 23 '18 at 00:35
  • Thanks - I've already tried closing the output stream but it didn't help – eyes enberg Nov 23 '18 at 00:40
  • *so it must be that particular method that's causing this delay* or the fact that you already downloaded the file before made the second download faster, due to proxies and caches. – JB Nizet Nov 23 '18 at 00:46
  • I downloaded the file multiple times for each scenario (as well as multiple files). – eyes enberg Nov 23 '18 at 01:27
    Please indicate **which** IOUtils you are using. At the very least, include the package name. There are dozens of libraries with a class called `IOUtils` - for example, the AWS Java SDK, but also the popular apache-commons library. – Erwin Bolwidt Nov 23 '18 at 02:20

1 Answer


From the AWS documentation:

public S3Object getObject(GetObjectRequest getObjectRequest)

The returned Amazon S3 object contains a direct stream of data from the HTTP connection. The underlying HTTP connection cannot be reused until the user finishes reading the data and closes the stream.

public S3ObjectInputStream getObjectContent()

Note: The method is a simple getter and does not actually create a stream. If you retrieve an S3Object, you should close this input stream as soon as possible, because the object contents aren't buffered in memory and stream directly from Amazon S3.


If you remove the IOUtils.copy line, the method returns quickly because you never actually consume the stream: the download happens only as the data is read, not when getObject() returns. A large file will take time to download, and there isn't much you can do about that unless you can get a better connection to the AWS services.
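To see this directly, you can time the copy step on its own. The sketch below is an illustration, not the AWS SDK: copyAndCount is a hypothetical helper standing in for IOUtils.copy, and a ByteArrayInputStream stands in for the S3 object content (which is also just an InputStream), so no bytes come from S3 here. In the real method, the elapsed time around this loop is where the 7.5 seconds goes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyTiming {
    // Hypothetical helper mirroring what IOUtils.copy does: pulls the stream
    // through an 8 KiB buffer and returns the byte count. With an S3 stream,
    // the actual network download happens inside this loop.
    static long copyAndCount(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[3 * 1024 * 1024]; // stand-in for a 3 MB image
        long start = System.nanoTime();
        long copied;
        // try-with-resources closes both streams, as the comments suggest
        try (InputStream in = new ByteArrayInputStream(data);
             OutputStream out = new ByteArrayOutputStream()) {
            copied = copyAndCount(in, out);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(copied + " bytes copied in " + elapsedMs + " ms");
    }
}
```

Swapping the ByteArrayInputStream for object.getObjectContent() and the output for a FileOutputStream turns this into a timed version of the original method.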

flakes
  • That's so strange, the response time was fine for months until a few days ago - and I haven't touched this code for a very long time. – eyes enberg Nov 23 '18 at 01:28
  • @eyesenberg There's a lot of external factors that can be slowing this down. Network speed, disk/storage R/W speed, factors on AWS side, etc. Maybe try reading all of the S3 object into just a byte array. That will at least tell you if the issue lies within the `IOUtils` / disk speeds. – flakes Nov 23 '18 at 01:31
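The diagnostic suggested in the comment above can be sketched as two separately timed phases: read the whole object into a byte array first (network only), then write the array to disk (disk only). This is an illustrative sketch, not the AWS SDK; readFully is a hypothetical helper, and a ByteArrayInputStream stands in for object.getObjectContent().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class PhaseTiming {
    // Hypothetical helper: drains the stream into memory. With a real
    // S3ObjectInputStream, this phase measures the network download alone.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        for (int n; (n = in.read(chunk)) != -1; ) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for object.getObjectContent(); swap in the real stream to test S3.
        try (InputStream in = new ByteArrayInputStream(new byte[1024])) {
            long t0 = System.nanoTime();
            byte[] bytes = readFully(in);                       // phase 1: "network"
            long readMs = (System.nanoTime() - t0) / 1_000_000;

            Path tmp = Files.createTempFile("picture", ".png");
            long t1 = System.nanoTime();
            Files.write(tmp, bytes);                            // phase 2: disk only
            long writeMs = (System.nanoTime() - t1) / 1_000_000;

            System.out.println("read " + readMs + " ms, write " + writeMs + " ms");
            Files.delete(tmp);
        }
    }
}
```

If phase 1 dominates, the bottleneck is the connection to S3; if phase 2 dominates, it is local disk I/O.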