
I am currently debugging a performance issue (see here) that I was able to reduce to a code snippet using multiple async{} calls in a coroutine. I am delaying by 50 milliseconds, but the printed duration (on Windows 10, AMD 5800X3D) is always 60 ms or higher:

import kotlinx.coroutines.*
import java.util.concurrent.TimeUnit
import kotlin.system.measureNanoTime

fun main(args: Array<String>) {
    while (true) {
        runBlocking {
            Example().test()
        }
    }
}

class Example {
    suspend fun test(): Test = withContext(Dispatchers.IO) {
        var result: Test? = null
        val time = measureNanoTime {
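            // three concurrent 50 ms delays; awaiting all of them should take ~50 ms total, not 150 ms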
            val a = async { doDelay() }
            val b = async { doDelay() }
            val c = async { doDelay() }
            result = Test("test")
            a.await()
            b.await()
            c.await()
        }
        println("TIMING: ${TimeUnit.NANOSECONDS.toMillis(time)}")
        result!!
    }

    suspend fun doDelay() = coroutineScope {
        delay(50)
        "dummy"
    }

    data class Test(val id: String)
}

Searching for coroutines with regard to "high latency" or "long delay" yields no results in this area, and the profiling data is hard to analyze as it's spread across multiple threads.

I am looking for an explanation of how 3 async{} calls that delay for 50 ms actually take 60 ms or more to complete. I don't mind 1-2 ms of overhead, but 20% (10 ms) is just way too much. Reducing the delay to 10 ms reduces the overhead to 5 ms, which is worse percentage-wise but lower in absolute terms.

Additional notes:

  • the delay calls are used to emulate IO processes
  • this is not about Android but server-side programming on a normal JVM
  • this is a doubly reduced example; I have double-checked this implementation in a simple REST API and verified it in Google Chrome's network tab as well - it matches the printed >= 60 ms
  • Dispatchers.IO is chosen for simplicity; in reality it's another thread pool similar to Dispatchers.IO - but all multithreaded dispatchers seem to "suffer", as outlined
  • I am using nanoTime to avoid the low resolution of Windows' System.currentTimeMillis(), and some googling confirmed that delay is backed by high-resolution timers
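
To separate the async machinery from the precision of delay itself, an even smaller sketch can time a single delay(50) with nothing else around it (illustrative code, not part of the original reproducer):

import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import java.util.concurrent.TimeUnit
import kotlin.system.measureNanoTime

// Times a single delay(50) with no async/withContext involved, to check whether
// any overshoot comes from the coroutine timer itself rather than from async.
fun main() {
    repeat(20) {
        val time = runBlocking {
            measureNanoTime {
                delay(50)
            }
        }
        println("single delay: ${TimeUnit.NANOSECONDS.toMillis(time)} ms")
    }
}
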
  • It returns 50-52ms on my machine, so I can't reproduce your results. Anyway, I would assume this is not a delay associated with joining multiple asynchronous tasks. My guess would be that this is just a lack of precision when scheduling a postponed task. `delay()` does not guarantee it will wait exactly 50ms. It may wait more than that, and a few milliseconds sounds like a reasonable imprecision. Try to measure just a single `delay()` to see if you get similar results. You can also run tests for Java's `ScheduledExecutorService`. – broot May 14 '22 at 07:19
  • A committer to Kotlin answered a GitHub discussion by referencing `System.nanoTime()` as the backend for `delay` (https://github.com/Kotlin/kotlinx.coroutines/issues/943#issuecomment-455148395) - but 3-5ms of delay per `async` is quite high for my machine - are you also running Windows? – roookeee May 14 '22 at 08:50
  • Even a single delay yields a timing of 60 ms, which is weird considering that `delay` should be backed by `nanoTime` - I know Windows has resolution issues with `currentTimeMillis` :/. Kind of annoying as it's just meant to simulate I/O while debugging other overhead – roookeee May 14 '22 at 08:54
  • Honestly, I don't understand the answer by qwwdfsad in this topic. I don't know what `System.nanoTime` has to do with it; I always thought this is really OS-dependent, because we basically ask the operating system to be woken after some time and it doesn't have to be very precise. Other answers on SO seem to confirm this, e.g.: https://stackoverflow.com/questions/20241426/java-scheduledexecutorservice-bad-precision, https://stackoverflow.com/questions/13129538/high-resolution-timer-in-java . Maybe I don't know/understand something here. I use Linux. – broot May 14 '22 at 09:24
  • Seems like this is the wrong avenue to investigate my main issue, thank you. I know that the timer resolution is ~10-17ms on Windows and ~1ms on Linux but shied away from that reasoning because I found hints that `delay` is based on `System.nanoTime` / high-resolution timers which "even" work on Windows. Thank you for the help! – roookeee May 14 '22 at 09:51
  • Did you ever try to measure the time `delay(50)` takes, without all the other surrounding stuff (coroutines, etc.)? – Holger May 16 '22 at 09:27
  • I didn't bother, as `delay(50)` is just used to emulate I/O. Furthermore, `delay` can't be used outside of coroutines; it's the coroutine way of `Thread.sleep`ing. – roookeee May 16 '22 at 17:26
  • If I put `delay(50)` I get an average of 62ms on 100k runs. If I put `delay(500)` I get an average of 505ms on 1k runs, and 1008ms with `delay(1000)`. I guess it's not precise at all. Factoring in all the answers above, I'd also speculate that coroutines have a specific interval at which they check whether they can be resumed, so it's probably a game of matching your delay to a multiple of that interval. – rtsketo May 17 '22 at 17:33
  • This question isn't about `delay`, it is just used to simulate I/O. Furthermore, I am quite certain that the implementation of `delay` is inaccurate (in terms of the accuracy of the delay); coroutines themselves do not suffer from delayed scheduling. Removing all the delays from the code (which still uses coroutines), a dummy server responds in <= 1ms, which reinforces this observation. The use of `delay` in my example renders the whole question useless, as the timing variance is coming from `delay`, which is not what I am interested in - I found the solution in my other (linked) question – roookeee May 17 '22 at 17:49
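
Below is a minimal sketch of the comparison broot suggests above: timing kotlinx.coroutines' delay(50) against a plain ScheduledExecutorService with the same 50 ms target. Thread.sleep(50) is added here as an OS-timer baseline; the measurement loop and output format are illustrative, not part of the original code:

import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import kotlin.system.measureNanoTime

fun main() {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    repeat(10) {
        // 1) coroutine delay(50)
        val coroutineNs = runBlocking { measureNanoTime { delay(50) } }

        // 2) ScheduledExecutorService with the same 50 ms target; get() blocks until the task ran
        val executorNs = measureNanoTime {
            scheduler.schedule(Runnable { }, 50, TimeUnit.MILLISECONDS).get()
        }

        // 3) plain Thread.sleep(50) as an OS-timer baseline
        val sleepNs = measureNanoTime { Thread.sleep(50) }

        println(
            "delay: ${TimeUnit.NANOSECONDS.toMillis(coroutineNs)} ms, " +
            "scheduler: ${TimeUnit.NANOSECONDS.toMillis(executorNs)} ms, " +
            "sleep: ${TimeUnit.NANOSECONDS.toMillis(sleepNs)} ms"
        )
    }
    scheduler.shutdown()
}

If all three overshoot by a similar amount on Windows but not on Linux, the usual suspect is the OS timer interrupt period (roughly 15.6 ms by default on Windows, matching the ~10-17 ms resolution mentioned in the comments) rather than anything coroutine-specific.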

0 Answers