I'm trying to write a bulk downloader for images. Getting the InputStream from a URLConnection is easy enough, but downloading all the files takes a while. Multithreading certainly speeds it up, but having a lot of threads downloading files could use a lot of memory. Here's what I found:

Let in be the InputStream, file the target File, and fos a FileOutputStream to file.

The simple way

fos.write(in.readAllBytes());

Reads the whole file and writes the returned byte[]. Probably usable for getting the website source, but no good for bigger files such as images.

Writing chunks

 byte[] buffer = new byte[bufsize];
 int read;
 while ((read = in.read(buffer, 0, bufsize)) >= 0) {
     fos.write(buffer, 0, read);
 }

Seems better to me.

in.transferTo(fos)

in.transferTo(fos);

Writes chunks internally, as seen above.

Files.copy()

Files.copy(in, file.toPath(),  StandardCopyOption.REPLACE_EXISTING);

Appears to use native implementations.

Which one of these should I use to minimize memory usage when done dozens of times in parallel?

This is a small project for fun; external libraries are overkill for it IMO. Also, I can't use ImageIO, since that can't handle webms, some pngs/jpgs, and animated gifs.

EDIT:
This question was based on the assumption that concurrent writing is possible. However, it doesn't seem like that is the case. I'll probably get the image links concurrently and then download them one after another. Thanks for the answers anyways!
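A minimal sketch of that plan, collecting the links on a thread pool and then downloading them sequentially. The scrapeLinks() helper, the page list, and the target directory are placeholders I made up for illustration:

import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BulkDownloader {

    public static void main(String[] args) throws Exception {
        List<String> pages = List.of(); // page URLs to scrape (placeholder)

        // Collect the image links concurrently...
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<List<String>>> futures = new ArrayList<>();
        for (String page : pages) {
            futures.add(pool.submit(() -> scrapeLinks(page)));
        }
        List<String> imageLinks = new ArrayList<>();
        for (Future<List<String>> f : futures) {
            imageLinks.addAll(f.get());
        }
        pool.shutdown();

        // ...then download the images one after another.
        Path targetDir = Paths.get("downloads");
        Files.createDirectories(targetDir);
        for (String link : imageLinks) {
            String name = link.substring(link.lastIndexOf('/') + 1);
            try (InputStream in = new URL(link).openStream()) {
                Files.copy(in, targetDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    // Placeholder: extract the image URLs from one page (implementation not shown).
    static List<String> scrapeLinks(String pageUrl) {
        return List.of();
    }
}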

mindoverflow
  • *Which one of these should I use to minimize memory usage when done dozens of times in parallel?* None. Don't copy dozens of times in parallel. That will require more memory than performing the copy serially, and more time too as the writing operations block. – Elliott Frisch Mar 12 '20 at 14:12
  • Bad wording on my end. It's copying data from an image online to a file using an InputStream. Also, every image will be downloaded to a separate file, so blocking shouldn't be a problem. – mindoverflow Mar 12 '20 at 14:47
  • And are those separate files also on separate file systems on separate drives? So, write to file 1 and file 2 simultaneously does not block? I'm sorry to say that is not correct. – Elliott Frisch Mar 12 '20 at 14:52
  • Ok, I didn't know that – mindoverflow Mar 12 '20 at 15:23

2 Answers


The short answer is: from the memory usage perspective, the best solution is the version that reads and writes the data in chunks.

The buffer size should basically be chosen taking into account the number of simultaneous downloads, the available memory, the download speed, and the efficiency of the target drive in terms of data transfer rate and IOPS.
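As a minimal sketch of that approach, here is a chunked download with an explicit buffer size; the 8 KiB value and the method name are arbitrary choices of mine, not a recommendation from this answer:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class ChunkedDownload {

    // Tuning knob: only this many bytes are buffered in memory per download.
    private static final int BUF_SIZE = 8 * 1024;

    static void download(String url, File target) throws IOException {
        try (InputStream in = new URL(url).openConnection().getInputStream();
             FileOutputStream fos = new FileOutputStream(target)) {
            byte[] buffer = new byte[BUF_SIZE];
            int read;
            // Memory usage stays at BUF_SIZE per download, regardless of file size.
            while ((read = in.read(buffer, 0, BUF_SIZE)) >= 0) {
                fos.write(buffer, 0, read);
            }
        }
    }
}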

The long answer is that downloading files concurrently doesn't necessarily mean the overall download will be faster. Whether simultaneous downloads actually speed up the total download time mostly depends on:

  • the number of hosts you're downloading from
  • the speed of the internet connection of the host you're downloading from, limited by the speed of that host's network adapter
  • the speed of your own internet connection, limited by the speed of your network adapter
  • the IOPS of the storage on the host you're downloading from
  • the IOPS of the storage you're downloading to
  • the transfer rate of the storage on the host you're downloading from
  • the transfer rate of the storage you're downloading to
  • the performance of the local and remote hosts; for instance, an older or low-cost Android device could be limited by its CPU speed

For instance, if the source host has a single HDD and a single connection already gives the full connection speed, then using multiple connections is pointless; it would only make the download slower by adding the overhead of switching between the transferred files.

It could also be that the source host limits the speed of a single connection, in which case multiple connections could speed things up.

An HDD typically delivers around 80 IOPS and a transfer rate of about 80 MB/s, and either of these can limit the download/upload speed. So in practice you can't read or write more than about 80 files per second on such a disk, nor exceed the transfer limit of roughly 80 MB/s; of course, this depends heavily on the disk model.

An SSD typically has tens of thousands of IOPS and a transfer rate above 400 MB/s, so the limits are much higher, but for really fast internet connections they still matter.

Krzysztof Cichocki

I found a time-based (i.e. performance) comparison on the internet here: journaldev.com/861/java-copy-file

However, if you are focused on memory, you could try to measure the memory consumption yourself using something like the code proposed by @pasha701 here:

Runtime runtime = Runtime.getRuntime();
long usedMemoryBefore = runtime.totalMemory() - runtime.freeMemory();
System.out.println("Used memory before: " + usedMemoryBefore);
// copy file method here
long usedMemoryAfter = runtime.totalMemory() - runtime.freeMemory();
System.out.println("Memory increased: " + (usedMemoryAfter - usedMemoryBefore));

Note that the returned values are in bytes; divide by 1000000 to get values in MB.
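For example, the "copy file method here" placeholder could be replaced with one of the variants from the question; here is a sketch using Files.copy with a made-up URL and file name:

import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class MeasureCopy {
    public static void main(String[] args) throws Exception {
        Runtime runtime = Runtime.getRuntime();
        long usedMemoryBefore = runtime.totalMemory() - runtime.freeMemory();

        // Placeholder URL and target file, for illustration only.
        try (InputStream in = new URL("https://example.com/image.png").openStream()) {
            Files.copy(in, Paths.get("image.png"), StandardCopyOption.REPLACE_EXISTING);
        }

        long usedMemoryAfter = runtime.totalMemory() - runtime.freeMemory();
        System.out.println("Memory increased: " + (usedMemoryAfter - usedMemoryBefore) + " bytes");
    }
}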

rakwaht