
I am very new to backend development. I want to build a simple, robust application that accepts a zip file URL as a query parameter, downloads the zip file from that URL, extracts it, and returns the bin file inside it. Note: the zip file size can range from 5 MB to 150 MB. I have tried doing this in the following manner.

package la.sample

import io.ktor.application.Application
import io.ktor.application.call
import io.ktor.client.HttpClient
import io.ktor.client.request.get
import io.ktor.http.HttpStatusCode
import io.ktor.response.respond
import io.ktor.response.respondFile
import io.ktor.routing.get
import io.ktor.routing.routing
import java.io.*


fun Application.startServer() {
    routing {
        get("/get-bin") {

            // Get the AWS URL from the query parameters
            val awsUrl = call.request.queryParameters["url"] ?: "Error"

            // Download the zip file from the AWS URL
            val client = HttpClient()
            val bytes = client.get<ByteArray>(awsUrl)

            //Create a temp file on the server & write the zip file bytes into it.
            val file = File(".", "data.zip") 
            file.writeBytes(bytes) 
            
            //Call a method to unzip the file
            unzipAndReturnBinFile()?.let { 
                call.respondFile(it) //respond with bin file
            } ?: kotlin.run{
                call.respond(HttpStatusCode.InternalServerError)
            }
        }
    }
}


fun unzipAndReturnBinFile(): File? {

    var exitVal = 0

    // Shell out to unzip the downloaded archive
    Runtime.getRuntime().exec("unzip data.zip -d data").let {
        exitVal += it.waitFor()
    }

    // Check that the command executed successfully
    if (exitVal == 0) {

        var binFile: File? = null

        // Check whether the extracted files contain a .bin file
        File("data").listFiles()?.forEach {
            if (it.name.contains(".bin")) {
                binFile = it
            }
        }

        // Return the bin file, or null otherwise
        return binFile
    } else {
        throw Exception("Command shell execution failed.")
    }
}

The above code works fine on my local machine, irrespective of the zip file size. But when it is deployed to AWS, it breaks with a java.lang.OutOfMemoryError if the zip or the bin file is larger than 100 MB. I would be very thankful if someone could suggest a proper way of handling large file operations on the backend, with the ability to handle hundreds of such concurrent calls. Thank you.

The Java heap size of my remote machine is around 1 GB.

Saurabh Padwekar
  • `OutOfMemoryError` doesn't happen only with memory leaks; it can genuinely mean that the machine or JVM has run out of memory. What settings do you have on the JVM, and how much available memory does the machine have? – AlexT Dec 02 '20 at 10:05
  • @Alex.T There is around 1 GB of heap size allocated for java programs. – Saurabh Padwekar Dec 02 '20 at 14:34
  • The exec command runs in a different process, so apart from the `Process` variable it shouldn't use any heap memory. Can you attach a stack trace? – Naor Tedgi Dec 07 '20 at 07:24

2 Answers


Your problem is not in the unzipping procedure.

`Runtime.exec` runs the command in a separate process; in the parent JVM it only needs a minimal amount of heap, essentially just the `Process` handle.

The lines causing the `OutOfMemoryError` are these:

val bytes = client.get<ByteArray>(awsUrl)
val file = File(".", "data.zip") 
file.writeBytes(bytes) 

It only takes about six concurrent requests of 150 MB each to exhaust your entire heap.

Instead of waiting for the file to fully download before saving it to disk, use a stream: write each chunk of data to disk as soon as it arrives. That way the full size of the downloaded file is never held in RAM at once.
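As a minimal sketch of that idea, using plain JDK I/O rather than the Ktor client (`downloadToFile` is a hypothetical helper name, not part of the question's code):

```kotlin
import java.io.File
import java.net.URL
import java.nio.file.Files
import java.nio.file.StandardCopyOption

// Hypothetical helper: stream the remote file straight to disk.
// Files.copy moves the data through a small internal buffer, so
// heap usage stays constant regardless of the download size.
fun downloadToFile(url: String, target: File) {
    URL(url).openStream().use { input ->
        Files.copy(input, target.toPath(), StandardCopyOption.REPLACE_EXISTING)
    }
}
```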

Use Apache commons-io, for example:

FileUtils.copyURLToFile(URL, File)

Or, if you would like more control over the procedure, try Ben Noland's answer:

https://stackoverflow.com/a/921408/4267015

Naor Tedgi
  • Damn! Exactly, those lines were causing the problem. Based on your answer I changed my code to receive the multipart file and write the chunks to another file as soon as I get them, instead of storing them in memory. It worked like a charm. Thanks a lot. – Saurabh Padwekar Dec 08 '20 at 13:48

Based on @Naor's comment, I have updated the code to accept a multipart file and write every small chunk (part) to another file as soon as I receive it, without holding the entire payload in memory. That solved the issue. Below is the updated code snippet.

        val file = File(".", Constant.FILE_PATH)
        call.receiveMultipart().apply {
            forEachPart {
                if (it is PartData.FileItem) {
                    it.streamProvider().use { input ->
                        file.outputStream().buffered().use { output ->
                            input.copyToSuspend(output)
                        }
                    }
                }
                it.dispose() // dispose is a function reference and must be invoked
            }
        }
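The same streaming principle can replace the `Runtime.exec("unzip ...")` call from the question: `java.util.zip.ZipInputStream` extracts each entry through a small buffer, so the archive is never fully loaded into memory. A sketch under that assumption (`unzipStreaming` is a hypothetical name; entry names are used unsanitized here, so guard against zip-slip in real code):

```kotlin
import java.io.File
import java.util.zip.ZipInputStream

// Hypothetical helper: extract a zip archive entry-by-entry.
// Each entry is streamed via copyTo's small buffer, so heap usage
// is independent of the archive size. Note: entry names are used
// as-is; validate them against zip-slip in production code.
fun unzipStreaming(zip: File, destDir: File) {
    destDir.mkdirs()
    ZipInputStream(zip.inputStream().buffered()).use { zin ->
        generateSequence { zin.nextEntry }.forEach { entry ->
            val out = File(destDir, entry.name)
            if (entry.isDirectory) {
                out.mkdirs()
            } else {
                out.parentFile?.mkdirs()
                out.outputStream().use { zin.copyTo(it) }
            }
        }
    }
}
```

This also removes the dependency on the external `unzip` binary being installed on the server.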
Saurabh Padwekar