I need to download potentially large files from a server. To avoid an OutOfMemoryError, I want to stream the response through an InputStream instead of loading the entire file into memory at once.

I ran a (crude) benchmark to determine whether the approach from the following SO answer actually keeps the memory footprint small:

https://stackoverflow.com/a/38664475/2434579

Based on that example, I wrote this code to see how much heap is in use before and after the download:

  import java.io.InputStream;
  import java.io.OutputStream;
  import java.net.URI;
  import java.util.Arrays;
  import org.springframework.http.HttpMethod;
  import org.springframework.http.MediaType;
  import org.springframework.web.client.RequestCallback;
  import org.springframework.web.client.ResponseExtractor;

  // restTemplate is a RestTemplate field of the enclosing class
  public void memoryTest() {
    // 20MB file: "https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"
    // Small file: "https://upload.wikimedia.org/wikipedia/en/7/7e/Patrick_Star.png"
    RequestCallback requestCallback = request -> request.getHeaders()
        .setAccept(Arrays.asList(MediaType.APPLICATION_OCTET_STREAM, MediaType.ALL));
    ResponseExtractor<Void> responseExtractor = response -> {
      // try-with-resources closes both streams even if the copy fails part-way
      try (InputStream is = response.getBody();
           OutputStream outstream = new OutputStream() {
             @Override
             public void write(int b) {
               // Discard every byte: only the read side is being measured
             }
           }) {
        int size = 32528;  // This value is the buffer size I get when I get a buffer size from an oracle.sql.BLOB instance
        byte[] buffer = new byte[size];
        int length;
        // Used heap (total - free) / total heap, both in MB
        System.out.println("Before writing - MB: " + (double) (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / (1024 * 1024) + " / " + Runtime.getRuntime().totalMemory() / (1024 * 1024));
        // Copy the body one buffer at a time until end of stream
        while ((length = is.read(buffer)) != -1) {
          outstream.write(buffer, 0, length);
        }
        System.out.println("After writing - MB: " + (double) (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / (1024 * 1024) + " / " + Runtime.getRuntime().totalMemory() / (1024 * 1024));
      } catch (Exception e) {
        e.printStackTrace();
      }
      return null;
    };
    restTemplate.execute(URI.create("https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf"), HttpMethod.GET, requestCallback, responseExtractor);
  }

When I downloaded the 20 MB file, this is what my println statements said:

Before writing - MB: 134.33865356445312 / 627
After writing - MB: 214.2599334716797 / 627

When I downloaded the small file (less than 1 MB), this is what my println statements said:

Before writing - MB: 126.80110931396484 / 627
After writing - MB: 128.01902770996094 / 627

When I downloaded the 20 MB file, the used heap increased by around 80 MB. For the smaller file the increase was minimal (about 1 MB). I know that streaming is a best practice, but from this data I'm not confident that this solution actually solves my problem.

My questions are as follows:

  1. Why does the usage increase by 80 MB when I read a 20 MB file through an InputStream?
  2. If 100 users hit this endpoint and are all downloading large files (not necessarily the same one), is an OutOfMemoryError likely to occur? In other words, does this code scale significantly better than simply loading the whole file into memory?
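
To make question 2 concrete, here is a minimal sketch of the copy loop in isolation, together with the arithmetic for 100 concurrent requests. The class name `StreamCopySketch`, the helper names `copy` and `discard`, and the in-memory stand-in for the response body are hypothetical and only for illustration. The figures count only the copy buffer and the payload, not whatever the HTTP client or TLS layer allocates internally.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopySketch {

  // Same buffer size as in the code above.
  static final int BUFFER_SIZE = 32528;

  // Copies everything from in to out through one fixed-size buffer and
  // returns the number of bytes copied. This method never holds more than
  // BUFFER_SIZE bytes of the payload at a time.
  static long copy(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[BUFFER_SIZE];
    long total = 0;
    int length;
    while ((length = in.read(buffer)) != -1) {
      out.write(buffer, 0, length);
      total += length;
    }
    return total;
  }

  // An OutputStream that discards its input (stand-in for the real target).
  static OutputStream discard() {
    return new OutputStream() {
      @Override public void write(int b) { }
      @Override public void write(byte[] b, int off, int len) { }
    };
  }

  public static void main(String[] args) throws IOException {
    // Assumption: an in-memory array stands in for the HTTP response body.
    byte[] fakeBody = new byte[5 * 1024 * 1024];
    long copied = copy(new ByteArrayInputStream(fakeBody), discard());
    System.out.println("Copied bytes: " + copied);

    int users = 100;
    long streamingBuffers = (long) users * BUFFER_SIZE;   // one buffer per request
    long wholeFiles = (long) users * 20L * 1024 * 1024;   // one 20 MB file per request
    System.out.println("Copy buffers for " + users + " streaming requests (bytes): " + streamingBuffers);
    System.out.println(users + " fully loaded 20 MB files (bytes): " + wholeFiles);
  }
}
```

Under these assumptions, the copy buffers for 100 streaming requests add up to roughly 3.1 MB, while holding 100 complete 20 MB files would need roughly 2 GB.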
idungotnosn
  • It's hard to comment. You're using `outstream.write(buffer, 0, length)` and even if you are overriding the `write` method - you don't know if the memory is actually allocated in the default `OutputStream` implementation. I would at least try to use some kind of `BufferedOutputStream` implementation to limit the memory usage. – Zilvinas Sep 18 '17 at 23:32
  • It's just garbage (probably a lot of it due to the https protocol). When calling `System.gc()` before each print statement I get: `Before writing - MB: 10.842765808105469 / 309` and `After writing - MB: 9.356369018554688 / 309` – teppic Sep 18 '17 at 23:34
  • In fact, you could remove all code related to `outputStream` entirely. All you're validating is read performance, right? – Zilvinas Sep 18 '17 at 23:34
  • @Zilvinas - That is correct. I am only measuring read performance in this example. I will end up using an OutputStream in my project code. – idungotnosn Sep 19 '17 at 14:36
  • @teppic - I had a hunch that much of the memory would have been garbage collected but I didn't realize I could just run System.gc() to discover that. – idungotnosn Sep 19 '17 at 14:36
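
The check teppic describes can be sketched as below. This is an illustration under stated assumptions, not the exact code that was run: the class name `HeapUsageSketch` and its helper methods are hypothetical, and an in-memory stream stands in for the HTTP response body. Note that `System.gc()` is only a request to the JVM, so the figures after it are indicative rather than exact.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class HeapUsageSketch {

  // Used heap in MB: currently allocated heap minus its free part.
  // This includes unreachable objects that have not been collected yet.
  static double usedMb() {
    Runtime rt = Runtime.getRuntime();
    return (double) (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
  }

  // Same figure, taken after asking the JVM to collect garbage first.
  // System.gc() is a hint; the JVM is free to ignore it.
  static double usedMbAfterGc() {
    System.gc();
    return usedMb();
  }

  public static void main(String[] args) throws IOException {
    System.out.println("Before reading - MB: " + usedMbAfterGc());

    // Assumption: a 1 MB in-memory stream stands in for the response body.
    try (InputStream is = new ByteArrayInputStream(new byte[1024 * 1024])) {
      byte[] buffer = new byte[32528];
      while (is.read(buffer) != -1) {
        // Discard the data; only the read side matters here.
      }
    }

    System.out.println("After reading, no GC - MB: " + usedMb());
    System.out.println("After reading, GC requested - MB: " + usedMbAfterGc());
  }
}
```

Comparing the last two figures gives a rough idea of how much of the measured increase was garbage awaiting collection rather than data still in use.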

0 Answers