I'm trying to find a more efficient method of reading a file from a remote URL and saving it into a byte array. Here is what I currently have:

private byte[] fetchRemoteFile(String location) throws Exception {
  URL url = new URL(location);
  InputStream is = null;
  byte[] bytes = null;
  try {
    is = url.openStream();
    bytes = IOUtils.toByteArray(is);
  } catch (IOException e) {
    //handle errors
  }
  finally {
    if (is != null) is.close();
  }
  return bytes;
}

As you can see, I currently pass the URL into the method, where it uses an InputStream object to read in the bytes of the file. This method uses Apache Commons IOUtils. However, this method call tends to take a relatively long time to run. When retrieving hundreds, thousands, or hundreds of thousands of files one right after another, it gets quite slow. Is there a way I could improve this method so that it runs more efficiently? I have considered multithreading but I would like to save that as a last resort.

DerStrom8
  • Without multithreading, you're limited to _one right after another_. – Sotirios Delimanolis Nov 18 '14 at 18:57
  • There's nothing wrong with the code (aside from the general limitation that the byte[] must obviously fit in the heap, and the inherent 2GB array limit, but I assume you don't mind either). The perceived "slowness" probably comes from the URLs being HTTP URLs, which require a new network connection to retrieve each file (the overhead is notable if there are many small files). Aside from issuing multiple requests in parallel (that is, multithreading) *or* working *directly with HTTP 1.1* and keeping the connection open, there isn't much potential to speed this up. – Durandal Nov 18 '14 at 19:06
  • @Sotirios Yes, and I'm wondering if there's a way to make the above code more efficient so that even running one right after another, it goes faster than it does now. I don't know if there's really anything I can do other than multithreading, but that's why I asked – DerStrom8 Nov 18 '14 at 19:06
  • First find out where the bottleneck is. Is it the network, the server, or something else? Only when you know that can you think about optimizations. For example, if the network connection is slow, you will not benefit from multithreading. – Henry Nov 18 '14 at 19:07
  • Thanks guys, that's what I was leaning towards. I am still working on putting together some performance tests to determine which parts are the slowest, but just wanted to see if there was an obvious improvement I could make to what I already have. Much appreciated! – DerStrom8 Nov 18 '14 at 19:09

1 Answer

Your way of doing it looks perfectly fine.

But since you say:

"However, this method call tends to take a relatively long time to run"

you may be running into one of the following problems:

  • Network or connection issues

  • Are you sure you are downloading each file in a separate thread?

If you do use multithreading for the downloads, make sure the VM arguments -XmsYYYYM and -XmxYYYYM are configured properly, because otherwise you may find that your processor is not using all of its cores. I ran into this problem some time ago.
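For illustration, here is a minimal sketch of what downloading on several threads could look like, using only the JDK. The class name `ParallelFetcher`, the helper names `fetch` and `fetchAll`, and the pool size of 8 are all made up for this example and are not from the question. `main` also prints the core count and maximum heap the JVM reports, which is a quick way to check what your `-Xmx` setting actually resulted in (`-Xms` sets the initial heap size, `-Xmx` the maximum).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper class; names and pool size are assumptions for this sketch.
class ParallelFetcher {

    // Same job as fetchRemoteFile in the question, without Commons IO.
    static byte[] fetch(String location) throws IOException {
        try (InputStream in = new URL(location).openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Downloads every location with at most poolSize requests in flight.
    // Results come back in the same order as the input list.
    static List<byte[]> fetchAll(List<String> locations, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (final String location : locations) {
                Callable<byte[]> task = () -> fetch(location);
                pending.add(pool.submit(task));
            }
            List<byte[]> results = new ArrayList<>();
            for (Future<byte[]> future : pending) {
                // get() blocks until done; a failed download surfaces
                // here as an ExecutionException.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Runtime rt = Runtime.getRuntime();
        System.out.println("cores visible to JVM: " + rt.availableProcessors());
        System.out.println("max heap (MB): " + rt.maxMemory() / (1024 * 1024));

        // URLs are taken from the command line in this sketch.
        List<byte[]> files = fetchAll(Arrays.asList(args), 8);
        System.out.println("downloaded " + files.size() + " file(s)");
    }
}
```

Because downloads are mostly waiting on the network, the pool size does not have to match the number of cores. Whether more threads help at all depends on where the bottleneck is, so it is worth measuring with a few different sizes.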

Maksym