How can I more efficiently download large files over HTTP?

Question

I'm trying to download large files (<1GB) in Kotlin. Since I already knew OkHttp, I'm using it and pretty much just followed the answer from this question, except that I'm using Kotlin instead of Java, so the syntax is slightly different.

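import okhttp3.OkHttpClient
import okhttp3.Request
import java.io.BufferedInputStream
import java.io.FileOutputStream

val client = OkHttpClient()
val request = Request.Builder().url(urlString).build()
val response = client.newCall(request).execute()

// `is` is a keyword in Kotlin, so the stream needs a different name
val bodyStream = response.body().byteStream()

val input = BufferedInputStream(bodyStream)
val output = FileOutputStream(file)

val data = ByteArray(1024)
var total = 0L  // running byte count; must be `var` to be updated
var count = input.read(data)
// Check for -1 (end of stream) before writing, so -1 is never passed to write()
while (count != -1) {
    total += count
    output.write(data, 0, count)
    count = input.read(data)
}

output.flush()
output.close()
input.close()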

That works in that it downloads the file without using too much memory, but it seems needlessly inefficient because it constantly tries to write more data without knowing whether any new data has arrived. My own tests seem to confirm this: on a very resource-limited VM it uses more CPU while getting a lower download speed than a comparable script in Python, and of course than wget.

What I'm wondering is whether there is a way to register a callback that gets called when x bytes are available or when the end of the file is reached, so I don't have to constantly try to get more data without knowing whether there is any.

Edit: If it's not possible with OkHttp, I don't have a problem using something else; it's just the HTTP library I'm used to.

It does know more data has arrived because `count` is equal to the amount of data received, and it keeps looping until `count` is `-1`. Or am I misunderstanding something? – Mark, Oct 23 '18 at 07:49
Well, it doesn't know before calling `input.read(data)`, and that function can return 0 or other really small numbers. I'd want to write new data to the disk only once I have, for example, 1024 bytes ready. – usbpc102, Oct 23 '18 at 07:50
`while ((count = input.read(data)) == 0) { Thread.sleep(50L); }` _(Never used it though.)_ – Joop Eggen, Oct 23 '18 at 07:58
Well, that would at least reduce the unnecessary CPU cycles, but it doesn't exactly do what I want either. It will of course stop my code from writing "nothing" to the disk, but it would still write small amounts of data to disk (if, say, only 200 bytes arrive). Thanks, I'll consider it if nothing else gets suggested :) – usbpc102, Oct 23 '18 at 08:02

Marko Topolnik · Accepted Answer · 2018-10-23T13:38:45.447

As of version 11, Java has a built-in HttpClient which implements "asynchronous streams of data with non-blocking back pressure", and that's what you need if you want your code to run only when there's data to process.

If you can afford to upgrade to Java 11, you'll be able to solve your problem out of the box, using the HttpResponse.BodyHandlers.ofFile body handler. You won't have to implement any data transfer logic on your own.

Kotlin example:

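import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.nio.file.Paths

fun main(args: Array<String>) {
    val client = HttpClient.newHttpClient()

    val request = HttpRequest.newBuilder()
            .uri(URI.create("https://www.google.com"))
            .GET()
            .build()

    println("Starting download...")
    // Blocks until the whole response body has been written to google.html
    client.send(request, HttpResponse.BodyHandlers.ofFile(Paths.get("google.html")))
    println("Done with download.")
}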
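
For the callback style the question asks about, the same client also offers sendAsync, which returns a CompletableFuture instead of blocking. A minimal sketch, assuming a hypothetical helper named downloadToFileAsync (the name and the default client parameter are illustrative, not part of any library), and without any HTTP status check:

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.nio.file.Path
import java.util.concurrent.CompletableFuture

// Hypothetical helper (illustrative name): starts the download and returns
// immediately; the future completes once the body has been written to `target`.
fun downloadToFileAsync(
    url: String,
    target: Path,
    client: HttpClient = HttpClient.newHttpClient()
): CompletableFuture<Path> {
    val request = HttpRequest.newBuilder()
        .uri(URI.create(url))
        .GET()
        .build()

    // BodyHandlers.ofFile streams the response body into the file;
    // the response's body() is then the path that was written.
    return client.sendAsync(request, HttpResponse.BodyHandlers.ofFile(target))
        .thenApply { response -> response.body() }
}
```

Something like `downloadToFileAsync(url, path).thenAccept { println("Saved to $it") }` would register a callback that runs when the file is complete, and `.join()` waits for it if the caller needs to block.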

edited Oct 23 '18 at 13:38

answered Oct 23 '18 at 08:21

Marko Topolnik

Thanks, this looks promising. I'll have to check whether Kotlin works well with Java 11. – usbpc102 Oct 23 '18 at 08:24
You'll find that Kotlin has no issues interoperating with Java 11. https://www.reddit.com/r/Kotlin/comments/90cqno/any_news_on_java_11_support/ – Marko Topolnik Oct 23 '18 at 08:30
Looking through the documentation of the Java 11 HttpClient, it looks to be exactly what I want. I'll implement it later today and then update your answer with a specific code example, assuming I don't hit unexpected problems. Thanks a lot! – usbpc102 Oct 23 '18 at 09:52

Joop Eggen · Answer 2 · 2018-10-23T08:15:54.927

One could do away with the BufferedInputStream. Or, as its default buffer size in Oracle's Java is 8192, use a ByteArray larger than the current 1024 bytes, say 4096.
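
A minimal sketch of the copy loop without the BufferedInputStream, assuming a hypothetical helper named copyWithBuffer and an illustrative default buffer of 8192 bytes. Note that `read` blocks until at least one byte is available or the stream ends, so the loop waits for data instead of spinning:

```kotlin
import java.io.InputStream
import java.io.OutputStream

// Hypothetical helper (illustrative name): copies everything from `input` to
// `output` through one reusable buffer and returns the number of bytes copied.
fun copyWithBuffer(input: InputStream, output: OutputStream, bufferSize: Int = 8192): Long {
    val buffer = ByteArray(bufferSize)
    var total = 0L
    while (true) {
        // Blocks until some data is available; returns -1 at end of stream
        val count = input.read(buffer)
        if (count == -1) break
        output.write(buffer, 0, count)
        total += count
    }
    return total
}
```

Kotlin's standard library offers `input.copyTo(output, bufferSize)`, which does essentially the same thing.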

However best would be to either use java.nio or try Files.copy:

Files.copy(is, file.toPath());

This removes about 12 lines of code.
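
In Kotlin the same idea could look like the sketch below. The name saveStream is hypothetical, and REPLACE_EXISTING is an assumption added here because Files.copy otherwise throws if the target file already exists:

```kotlin
import java.io.File
import java.io.InputStream
import java.nio.file.Files
import java.nio.file.StandardCopyOption

// Hypothetical helper (illustrative name): writes the whole stream to `file`,
// closes the stream afterwards, and returns the number of bytes written.
fun saveStream(input: InputStream, file: File): Long =
    input.use { stream ->
        Files.copy(stream, file.toPath(), StandardCopyOption.REPLACE_EXISTING)
    }
```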

Another way is to send the request with the header Accept-Encoding: gzip to ask for gzip compression, so the transmission takes less time. Then, when the response header Content-Encoding: gzip is present, wrap is in a new GZIPInputStream(is). Or, if feasible, store the file compressed with an additional .gz extension: mybiography.md as mybiography.md.gz.
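
The conditional wrapping could be sketched as follows, assuming a hypothetical helper named decodeBody that receives the raw body stream and the value of the Content-Encoding header. As far as I know, OkHttp already requests and decodes gzip transparently when Accept-Encoding is not set by hand, so this only matters when the header is set manually:

```kotlin
import java.io.InputStream
import java.util.zip.GZIPInputStream

// Hypothetical helper (illustrative name): `contentEncoding` is the value of the
// response's Content-Encoding header, or null if the header is absent.
fun decodeBody(raw: InputStream, contentEncoding: String?): InputStream =
    if (contentEncoding.equals("gzip", ignoreCase = true)) {
        GZIPInputStream(raw)  // decompresses while reading
    } else {
        raw                   // body was sent uncompressed
    }
```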

edited Oct 23 '18 at 08:15

answered Oct 23 '18 at 08:09

Joop Eggen

I've tried waiting for more data to become available in the BufferedInputStream using the `.available()` method it provides, but even after waiting for 30 seconds no more data arrived (I assume OkHttp just doesn't download more data before you read some?). But I'll try the `Files.copy` thing :) – usbpc102 Oct 23 '18 at 08:16