I need to download potentially large files from a server. To avoid potential OutOfMemoryErrors, I want to use InputStreams so the entire file is never held in memory at once.
I decided to run a (crude) benchmark to determine whether the approach given in the following SO answer would keep the memory footprint small:
https://stackoverflow.com/a/38664475/2434579
Using this example, I wrote this code to see how much memory is being used at any given time:
public void memoryTest() {
    // 20 MB file: "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
    // Small file: "https://upload.wikimedia.org/wikipedia/en/7/7e/Patrick_Star.png"
    RequestCallback requestCallback = request -> request.getHeaders()
            .setAccept(Arrays.asList(MediaType.APPLICATION_OCTET_STREAM, MediaType.ALL));

    ResponseExtractor<Void> responseExtractor = response -> {
        // try-with-resources so both streams are closed even if reading fails
        try (InputStream is = response.getBody();
             // No-op sink: discards every byte so only the read path is measured
             OutputStream outstream = new OutputStream() {
                 @Override
                 public void write(int b) throws IOException {
                 }
             }) {
            int size = 32528; // Buffer size reported by an oracle.sql.BLOB instance
            byte[] buffer = new byte[size];
            int length;
            System.out.println("Before writing - MB: "
                    + (double) (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / (1024 * 1024)
                    + " / " + Runtime.getRuntime().totalMemory() / (1024 * 1024));
            while ((length = is.read(buffer)) != -1) {
                outstream.write(buffer, 0, length);
            }
            System.out.println("After writing - MB: "
                    + (double) (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / (1024 * 1024)
                    + " / " + Runtime.getRuntime().totalMemory() / (1024 * 1024));
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    };

    restTemplate.execute(URI.create("https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"),
            HttpMethod.GET, requestCallback, responseExtractor);
}
When I downloaded the 20 MB file, this is what my println statements said:
Before writing - MB: 134.33865356445312 / 627
After writing - MB: 214.2599334716797 / 627
When I downloaded the small file (less than 1 MB), this is what my println statements said:
Before writing - MB: 126.80110931396484 / 627
After writing - MB: 128.01902770996094 / 627
I see that when I downloaded the 20 MB file, the memory usage increased by around 80 MB, while the increase was minimal (about 2 MB) for the smaller file. I know that streaming is considered a best practice, but given these numbers I'm not confident this approach actually solves my problem.
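One caveat I realize about my own benchmark (my observation, not something from the linked answer): `totalMemory() - freeMemory()` counts garbage that hasn't been collected yet, so the "after" reading may overstate live memory. A minimal sketch of a GC-hinted sample, with hypothetical names (`HeapSample`, `usedMb`):

```java
// Sketch: sample used heap, generate ~20 MB of short-lived garbage,
// then hint a GC before sampling again so the two readings are comparable.
public class HeapSample {

    // Used heap in MB, same formula as the println statements above.
    static double usedMb() {
        Runtime rt = Runtime.getRuntime();
        return (double) (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        double before = usedMb();
        byte[][] junk = new byte[20][];
        for (int i = 0; i < 20; i++) {
            junk[i] = new byte[1024 * 1024]; // allocate ~20 MB of garbage
        }
        junk = null;   // drop all references so the arrays are collectable
        System.gc();   // hint only; the JVM is free to ignore it
        double after = usedMb();
        System.out.println("before=" + before + " MB, after=" + after + " MB");
    }
}
```

Without the `System.gc()` hint, the "after" number would likely include most of the 20 MB of dead buffers.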
My questions are as follows:
- Why does the usage increase by around 80 MB when I stream the 20 MB file through an InputStream?
- If 100 users hit this endpoint and are all downloading large files (not necessarily the same one), is an OutOfMemoryError likely? In other words, does this code scale significantly better than simply loading the whole file into memory?
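To make the contrast in the second question concrete, here is my own sketch (not code from the linked answer; `StreamVsBuffer`, `readAll`, and `drain` are names I made up) of the two heap-cost profiles, independent of Spring:

```java
import java.io.IOException;
import java.io.InputStream;

// Contrast: reading the whole body into one byte[] costs O(file size) heap
// per concurrent request, while copying through a fixed buffer costs only
// O(buffer size) no matter how large the file is.
public class StreamVsBuffer {

    // Whole-body approach: heap usage grows with the file.
    static byte[] readAll(InputStream in) throws IOException {
        return in.readAllBytes(); // Java 9+: materializes the entire stream in memory
    }

    // Streaming approach: heap usage is just the 32 KB buffer.
    static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[32 * 1024];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n; // process/forward each chunk here instead of keeping it
        }
        return total;
    }
}
```

With 100 concurrent 20 MB downloads, the first shape needs on the order of 2 GB of heap just for the bodies; the second needs about 3 MB of buffers in total.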