Are coroutines faster than threads in Kotlin, and why? How can I measure the time spent on 'context switching'?

Question

I am comparing the speed of threads and coroutines.

I found something interesting.

When the number of threads or coroutines is very small, threads are faster. However, as the number grows, coroutines become much faster.

Here's the code I tested.

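import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import org.junit.Assert.assertEquals
import org.junit.Test
import kotlin.system.measureTimeMillis

class ExampleUnitTest {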
    val reps = 1000000
    val sumSize = 999

    @Test
    fun addition_isCorrect() {
        assertEquals(4, 2 + 2)
    }

    @Test
    fun runInThread() {
        var sum = 0
        val threadList = ArrayList<Thread>()

        println("[start] Active Thread = ${Thread.activeCount()}")
        val time = measureTimeMillis {
            repeat(reps) {
                val mThread = Thread {
//                    println("start: ${Thread.currentThread().name}")
//                    Thread.sleep(1000L)
//                    println("end: ${Thread.currentThread().name}")
                }
                mThread.start()
                threadList += mThread
            }

            println("[end] Active Thread= ${Thread.activeCount()}")

            threadList.forEach {
                it.join()
            }
        }
        println("Time: $time ms\n")
    }

    @Test
    fun runInCoroutine() {
        var sum = 0
        val jobList = ArrayList<Job>()

        runBlocking {
            println("[start] Active Thread = ${Thread.activeCount()}")
            val time = measureTimeMillis {
                repeat(reps) {
                    val job = launch(Dispatchers.Default) {
//                        println("start: ${Thread.currentThread().name}")
//                        delay(1000L)
//                        println("end: ${Thread.currentThread().name}")
                    }
                    jobList += job
                }

                println("[end] Active Thread= ${Thread.activeCount()}")

                jobList.forEach {
                    it.join()
                }
            }
            println("Time: $time ms\n")
        }
    }
}

Run	Reps	Thread time (ms)	Coroutine time (ms)
1	10	1	63
2	100	8	65
3	1000	55	90
4	10000	426	175
5	100000	4089	395
6	1000000	43868	3165

In the end, it turns out that using coroutines is faster than using a lot of threads.

However, I don't think 'context switching' alone can take that much time, since the task is empty and the context-switching work looks tiny. Can context switching really make that big a difference?

See e.g. https://stackoverflow.com/questions/48106252/why-threads-are-showing-better-performance-than-coroutines and https://stackoverflow.com/questions/58254985/is-it-better-to-use-a-thread-or-coroutine-in-kotlin and https://stackoverflow.com/questions/43021816/difference-between-thread-and-coroutine-in-kotlin — Michael, May 11 '21 at 15:34
Coroutines use Threads under the hood, but they are used from shared pools. So instead of creating one thread per job, it creates up to 64 (or some other number based partially on number of CPU cores on the host computer) so it doesn't use more than necessary. — Tenfour04, May 11 '21 at 18:38
@Tenfour04, my question is what causes the huge difference in speed between them. I don't think context switching takes that much time (enough to more than double the total). It should take very little time. — c-an, May 12 '21 at 02:06
A thread is an expensive object to instantiate. Not sure why at low reps it is faster with Threads, but this isn’t a very scientific benchmark, since it doesn’t account for warmup time and you’re possibly running the tests in succession. I suggest using a benchmarking library to get accurate results. — Tenfour04, May 12 '21 at 02:14
Your results for a low count are just meaningless. They don't measure whether "thread" or "coroutine" is faster, but the initialization time of the respective library components. Unless you narrow down your conclusion specifically to the latency of a single run, any generalization is invalid. — Marko Topolnik, May 12 '21 at 10:23
But what did you intend to measure in the first place? The time to start a thread/coroutine? The time to join them? Neither of these will tell you how "fast" coroutines/threads are, but just those specific things. — Marko Topolnik, May 12 '21 at 10:28
@MarkoTopolnik someone said Coroutine is faster because of context switching. And I wanted to know what makes the difference and does context switching really take so much? — c-an, May 12 '21 at 11:02
To see that effect, you have to start at least 1000 threads or coroutines, so that the OS/Kotlin has some serious context switching to do. And you must also have some blocking (for threads) or suspending (for coroutines) code inside because otherwise there's no context switching to begin with. — Marko Topolnik, May 12 '21 at 11:06
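The points in the comments above about shared pools and the cost of creating a thread can be sketched without coroutines at all. The example below is only an illustration under stated assumptions: it uses a plain JDK `ExecutorService` as a stand-in for a coroutine dispatcher's pool (it is not how `Dispatchers.Default` is implemented), and the function names `runOnFreshThreads` and `runOnSharedPool` are made up for this sketch. Timings will vary by machine and JVM, so none are shown.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicInteger
import kotlin.concurrent.thread
import kotlin.system.measureTimeMillis

// Hypothetical helper: starts one new thread per (empty) task and waits for all of them.
// Returns how many tasks actually ran.
fun runOnFreshThreads(numTasks: Int): Int {
    val completed = AtomicInteger()
    List(numTasks) { thread { completed.incrementAndGet() } }
        .forEach { it.join() }
    return completed.get()
}

// Hypothetical helper: submits the same tasks to a fixed pool,
// so at most poolSize threads are ever created.
fun runOnSharedPool(numTasks: Int, poolSize: Int): Int {
    val completed = AtomicInteger()
    val pool = Executors.newFixedThreadPool(poolSize)
    try {
        repeat(numTasks) { pool.execute { completed.incrementAndGet() } }
    } finally {
        pool.shutdown()                             // accept no new tasks, finish queued ones
        pool.awaitTermination(1, TimeUnit.MINUTES)  // wait for the queue to drain
    }
    return completed.get()
}

fun main() {
    val numTasks = 10_000
    val poolSize = Runtime.getRuntime().availableProcessors()
    val fresh = measureTimeMillis { runOnFreshThreads(numTasks) }
    val pooled = measureTimeMillis { runOnSharedPool(numTasks, poolSize) }
    println("One thread per task: $fresh ms")
    println("Shared pool of $poolSize threads: $pooled ms")
}
```

With empty tasks, nearly all of the measured time in the first function is thread creation, start-up, and joining, which is the cost the question's benchmark is mostly measuring.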

Miles Morales · Answer 1 · 2021-05-12T13:59:10.847

0

By default, tasks run on threads, which is usually sufficient for an everyday program. Threads tend to slow down when a very large number of tasks run at once, which is where coroutines help. Each thread runs one task at a time and occupies an OS thread for its whole lifetime, while many coroutines can share a small pool of threads and release a thread whenever they suspend.

EDIT: I think this should help you.

edited May 12 '21 at 13:59

answered May 11 '21 at 16:11

So, coroutine uses the resources efficiently? It'd be great if you have some images to explain. – c-an May 12 '21 at 02:08

Marko Topolnik · Accepted Answer · 2021-05-12T11:35:24.430

I simplified your code a bit and added a loop that repeats the measurements:

import kotlinx.coroutines.Job
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

fun main() {
    repeat(5) {
        runInCoroutine(10)
    }
}

fun runInCoroutine(reps: Int) {
    runBlocking {
        measureTimeMillis {
            val jobList = ArrayList<Job>()
            repeat(reps) {
                jobList += launch { }
            }
            jobList.forEach { it.join() }
        }.also {
            println("Time: $it ms\n")
        }
    }
}

Here's my typical output:

Time: 15 ms
Time: 1 ms
Time: 1 ms
Time: 0 ms
Time: 0 ms

As you can see, the first run does not generalize to anything other than "the time to run the code for the first time". I also note that my first run was four times faster than yours. I'm on Java 16 and Kotlin 1.4.32.
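One way to keep the first-run cost out of the numbers is to run the code a few times untimed before measuring. The following is a minimal sketch of that idea using only the standard library; `measureAfterWarmup` and `median` are hypothetical helpers written for this illustration, and a dedicated benchmarking library such as JMH would be the more rigorous choice.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical helper: runs `block` warmupRuns times without timing it,
// then returns the timings (in nanoseconds) of measuredRuns further runs.
fun measureAfterWarmup(warmupRuns: Int, measuredRuns: Int, block: () -> Unit): List<Long> {
    repeat(warmupRuns) { block() }
    return List(measuredRuns) { measureNanoTime(block) }
}

// Median of a non-empty list (the upper median when the size is even).
fun median(samples: List<Long>): Long = samples.sorted()[samples.size / 2]

fun main() {
    // Workload assumed for illustration: start and join 100 empty threads.
    val timings = measureAfterWarmup(warmupRuns = 5, measuredRuns = 11) {
        List(100) { Thread { }.also { it.start() } }.forEach { it.join() }
    }
    println("Median: %,d microseconds".format(median(timings) / 1_000))
}
```

Reporting the median rather than a single run also reduces the influence of an occasional slow outlier, such as a garbage collection pause.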

EDIT

I extended the example with a bit more realistic demonstration of the advantage of coroutines in terms of "context switching". Now each task sleeps for 1 ms ten times in a row, and we use 10,000 tasks:

import kotlinx.coroutines.*
import java.lang.Thread.sleep
import java.util.concurrent.TimeUnit.NANOSECONDS
import kotlin.concurrent.thread
import kotlin.system.measureNanoTime

fun main() {
    val numTasks = 10_000
    repeat(10) { _ ->
        measureNanoTime {
            // Swap in runInThread(numTasks) to measure the thread version
            runInCoroutine(numTasks)
        }.also { tookNanos ->
            println("Took %,d ms".format(NANOSECONDS.toMillis(tookNanos)))
        }
    }
}

fun runInCoroutine(numCoroutines: Int) {
    List(numCoroutines) {
        GlobalScope.launch {
            repeat(10) { delay(1) }
        }
    }.also { jobs ->
        runBlocking {
            jobs.forEach { it.join() }
        }
    }
}

fun runInThread(numThreads: Int) {
    List(numThreads) {
        thread {
            repeat(10) { sleep(1) }
        }
    }.forEach {
        it.join()
    }
}

For runInCoroutine, I get the following:

Took 557 ms
Took 341 ms
Took 334 ms
Took 312 ms
Took 296 ms
Took 264 ms
Took 296 ms
Took 302 ms
Took 304 ms
Took 286 ms

And for runInThread, I get this:

Took 729 ms
Took 682 ms
Took 654 ms
Took 658 ms
Took 662 ms
Took 660 ms
Took 686 ms
Took 706 ms
Took 689 ms
Took 697 ms

The coroutine code took less than half the time (roughly 300 ms versus roughly 680 ms per run once warmed up). It probably also used a lot less RAM, but I didn't test that part.

So context switching is very slow, is that right? Why is that? And is it possible to measure the time spent on context switching? I think `total time / number of threads` would be a reasonable estimate. — c-an, May 12 '21 at 12:01
It would be very difficult to directly measure the time spent on context switching, and the term is used loosely anyway. It also includes, for example, the time from the requested sleep elapsing to the thread actually resuming. I guess, in our example, you could define something like (total time) - (time one task is supposed to take) since ideally, all tasks would run in parallel and complete at the same moment, exactly 10 ms later. — Marko Topolnik, May 12 '21 at 12:07
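The rough definition in the comment above, (total time) minus (time one task is supposed to take), can be written down directly. This is a sketch only: `idealMillis` and `runSleepingThreads` are names invented here, and the resulting "overhead" lumps together thread creation, scheduling, and wake-up latency rather than isolating context switching.

```kotlin
import kotlin.concurrent.thread
import kotlin.system.measureNanoTime

// Ideal wall-clock time if every task ran fully in parallel:
// each task sleeps sleepMillis, sleepsPerTask times in a row.
fun idealMillis(sleepsPerTask: Int, sleepMillis: Long): Long = sleepsPerTask * sleepMillis

// Hypothetical helper: runs numThreads sleeping threads and
// returns the elapsed wall-clock time in milliseconds.
fun runSleepingThreads(numThreads: Int, sleepsPerTask: Int, sleepMillis: Long): Long {
    val nanos = measureNanoTime {
        List(numThreads) {
            thread { repeat(sleepsPerTask) { Thread.sleep(sleepMillis) } }
        }.forEach { it.join() }
    }
    return nanos / 1_000_000
}

fun main() {
    val ideal = idealMillis(sleepsPerTask = 10, sleepMillis = 1)
    val elapsed = runSleepingThreads(numThreads = 1_000, sleepsPerTask = 10, sleepMillis = 1)
    // Everything above the ideal time is overhead in the loose sense used above.
    println("Elapsed: $elapsed ms, ideal: $ideal ms, overhead: ${elapsed - ideal} ms")
}
```

Dividing the total time by the number of threads, as suggested in the question's follow-up, would not give the same quantity, because the tasks overlap in time instead of running one after another.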

score 0 · Answer 3 · answered May 12 '21 at 11:38

The best answer is probably described here

There are multiple differences. Threads consume far more OS resources because each one is backed by an OS thread, and switching between concurrent threads goes through the OS scheduler, which is expensive. Coroutines are not tied to OS threads of their own, so switching between concurrent coroutines is cheap.
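The claim that coroutines need no OS thread of their own can be checked with a small experiment. This sketch assumes the kotlinx.coroutines library is on the classpath (as in the code earlier in this thread); `distinctThreadsUsedBy` is a name made up for the illustration. It launches many suspending coroutines inside `runBlocking` without specifying a dispatcher, so they inherit its event loop, and counts how many distinct threads they ran on.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.concurrent.ConcurrentHashMap

// Hypothetical helper: launches numCoroutines that each suspend ten times
// and returns the number of distinct threads they executed on.
fun distinctThreadsUsedBy(numCoroutines: Int): Int {
    // Store Thread objects rather than names, since debug mode can alter thread names.
    val threads = ConcurrentHashMap.newKeySet<Thread>()
    runBlocking {
        repeat(numCoroutines) {
            launch {                                // no dispatcher: inherits runBlocking's event loop
                repeat(10) { delay(1) }             // suspends without blocking the thread
                threads += Thread.currentThread()
            }
        }
    }   // runBlocking returns only after all child coroutines have completed
    return threads.size
}

fun main() {
    val numCoroutines = 10_000
    println("$numCoroutines coroutines ran on ${distinctThreadsUsedBy(numCoroutines)} thread(s)")
}
```

Because each `delay` hands the thread back to the event loop, all of the coroutines can interleave on the calling thread, whereas the thread-based version in the accepted answer needs one OS thread per task.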

score 0 · Answer 4 · answered Mar 07 '22 at 15:09

The answer above may not satisfy everyone. For those who are not satisfied:

"Coroutines lead to a model where all data is private to a thread, whereas with normal threads the data is shared between threads. So a lot of processor time is saved during execution."
