
A few days ago I created a simple benchmark (without jmh and all of the another specialized stuff, just to measure roughly).

I've found that for the same simple task (iterate through 10 million numbers, square them, keep only the even results and sum them), Java runs much faster. Here's the code:

Kotlin:

fun test() {
    println((0 .. 10_000_000L).map { it * it }
                              .filter { it % 2 == 0L }
                              .reduce { sum, it -> sum + it })
}

Java:

public void test() {
    System.out.println(LongStream.range(0, 10_000_000)
                                 .map(it -> it * it)
                                 .filter(it -> it % 2 == 0)
                                 .reduce((sum, it) -> sum + it)
                                 .getAsLong());
}

I'm using Java version 1.8.0_144 and Kotlin version 1.2.

On my hardware it takes on average 85 ms for Java and 4,470 ms for Kotlin to execute the corresponding functions. Kotlin is about 52 times slower.

I suspect that the Java compiler produces optimized bytecode, but I didn't expect to see such a huge difference. I'm wondering if I'm doing something wrong? How can I compel Kotlin to work faster? I like it because of its syntax, but 52 times is a big difference. And I just wrote Java 8-like code, not the plain old iterative version (which, I believe, will be much faster than given one).

Kirill Rakhman
the_kaba
  • That isn't the same thing. A range in Kotlin isn't lazy. – chris Jan 18 '18 at 09:24
  • The point is: to understand such things, in the end you have to look at the **bytecode** that gets generated. And as @chris points out, conceptually it makes a huge difference that Java streams are lazy and Kotlin ranges are not. – GhostCat Jan 18 '18 at 09:25
  • It has been discussed here - https://stackoverflow.com/questions/44081105/do-kotlin-provide-any-performance-boosts – vinS Jan 18 '18 at 09:26
  • Thanks for the update! So can you advise how to write this in a 'Kotlin-like' style to achieve better performance? – the_kaba Jan 18 '18 at 09:26
  • I don't know if there's any more direct way, but [`asSequence`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.collections/as-sequence.html) returns a `Sequence`, which is lazy. – chris Jan 18 '18 at 09:31
  • @GhostCat, thanks! I know, and I'll investigate the bytecode too when I have enough time. But for now I've asked colleagues here. My reasoning is that if I want to use a language, I don't expect to have to look into the bytecode myself or hack it somehow :) It's the job of the language's compiler to produce competitive bytecode, I believe. I heard that Kotlin promises performance comparable to Java's. I heard about inlining and all the other stuff, but it doesn't apply to this case. If you know how to optimize the given code for Kotlin, please share. – the_kaba Jan 18 '18 at 09:31
  • @chris thank you! It helped. Now the performance is comparable (a ratio of about 1 to 1.5, Java wins). – the_kaba Jan 18 '18 at 09:33
  • @the_kaba That's the thing: there is no such thing as a simple "let's just use that other language". When you are serious and professional about the tools you use, there is no way around *learning* all such subtle, small, minor, almost invisible details. In other words: when you intend to use Kotlin for professional software development, you really have to look at **everything** that the Kotlin language gives you with the attitude of "I have to understand at least 95% of what exactly this will do". – GhostCat Jan 18 '18 at 09:33
  • @GhostCat I agree. Thanks! I just don't have enough time right now to do that kind of research myself, but I'm definitely going to do it soon. – the_kaba Jan 18 '18 at 09:35
  • "without jmh and all of the another specialized stuff" - that's where you went wrong - JMH is a good way to do correct microbenchmarking. You didn't describe how you ran your performance test. You have probably ended up measuring the time it takes to load classes, run interpreted bytecode, perform HotSpot compilation, etc. The results of an incorrect microbenchmark are meaningless. – Erwin Bolwidt Jan 18 '18 at 09:48
  • You can just use `LongStream` in Kotlin: `LongStream.range(0, 10_000_000).map { it * it }...` – Alexey Romanov Jan 18 '18 at 09:50
  • Right, that is the **other** thing to be really aware of: writing benchmarks for the JVM isn't an easy task. Must read: https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java – GhostCat Jan 18 '18 at 09:50
  • @AlexeyRomanov Yeah, Jetbrains doesn't claim "100% interoperability with Java" for no reason ;-) – GhostCat Jan 18 '18 at 09:51
  • Remember that you're not comparing Kotlin with Java, but one API with another API, both of which Kotlin equally supports. Kotlin specifically _embraces_ the JDK and complements it with its own additions and extensions. The very functions you used are extension functions on Java's `Iterable` and are there because they are more convenient to use than the Streams API. – Marko Topolnik Jan 18 '18 at 10:10
  • I changed the title to better reflect the question. Generalizations like "Why is Kotlin slow" don't make for a good question and undeservedly hurt the language's reputation. – Kirill Rakhman Jan 24 '18 at 09:50
  • For small collections, Kotlin's non-lazy collections transformations are usually faster than Java streams. For large collections you should use Kotlin sequences which are lazy, but sequences have a boxing performance penalty for streams of primitive types compared to Java 8 specialized interfaces. – BladeCoder Dec 13 '18 at 09:42
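
To make the lazy-versus-eager and boxing points from the comments concrete, here is a rough Java sketch of three ways to compute the same sum. The class and method names are made up for illustration, and `eagerBoxed` only approximates what eager `map`/`filter` calls on a Kotlin range do (each step builds a full intermediate list of boxed values). It is not a benchmark and makes no timing claims.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class EagerVsLazy {

    // Eager and boxed: every step materializes a complete intermediate
    // List<Long>, similar in spirit to eager map/filter on a collection.
    static long eagerBoxed(long limit) {
        List<Long> squares = LongStream.range(0, limit)
                .boxed()
                .map(x -> x * x)
                .collect(Collectors.toList());
        List<Long> evens = squares.stream()
                .filter(x -> x % 2 == 0)
                .collect(Collectors.toList());
        long sum = 0;
        for (long x : evens) {
            sum += x;
        }
        return sum;
    }

    // Lazy but boxed: elements flow through the pipeline one at a time,
    // with no intermediate lists, but each value is still a Long object.
    static long lazyBoxed(long limit) {
        return LongStream.range(0, limit)
                .boxed()
                .map(x -> x * x)
                .filter(x -> x % 2 == 0)
                .reduce(0L, Long::sum);
    }

    // Lazy and primitive: the pipeline works on plain long values.
    static long lazyPrimitive(long limit) {
        return LongStream.range(0, limit)
                .map(x -> x * x)
                .filter(x -> x % 2 == 0)
                .sum();
    }

    public static void main(String[] args) {
        long limit = 1_000_000;
        System.out.println("eager, boxed:    " + eagerBoxed(limit));
        System.out.println("lazy, boxed:     " + lazyBoxed(limit));
        System.out.println("lazy, primitive: " + lazyPrimitive(limit));
    }
}
```

All three variants return the same value; they differ only in how much intermediate storage and boxing they need along the way.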

2 Answers


When you compare apples to oranges, the results don't tell you much. You compared one API to another API, each having a totally different focus and goals.

Since all of JDK is as much "Kotlin" as the Kotlin-specific additions, I wrote more of an apples-to-apples comparison, which also takes care of some of the "JVM microbenchmark" concerns.

Kotlin:

import java.util.stream.LongStream
import kotlin.system.measureTimeMillis

fun main(args: Array<String>) {
    println("Warming up Kotlin")
    test()
    test()
    test()
    println("Measuring Kotlin")
    val average = (1..10).map {
        measureTimeMillis { test() }
    }.average()
    println("An average Kotlin run took $average ms")
    println("(sum is $sum)")
}

var sum = 0L

fun test() {
    sum += LongStream.range(0L, 100_000_000L)
            .map { it * it }
            .filter { it % 2 == 0L }
            .reduce { sum, it -> sum + it }
            .asLong
}

Java:

public static void main(String[] args) {
    System.out.println("Warming up Java");
    test();
    test();
    test();
    System.out.println("Measuring Java");
    LongSummaryStatistics stats = LongStream.range(0, 10)
                                            .map(i -> measureTimeMillis(() -> test()))
                                            .summaryStatistics();
    System.out.println("An average Java run took " + stats.getAverage() + " ms");
    System.out.println("sum is " + sum);
}

private static long sum;

private static void test() {
    sum += LongStream.range(0, 100_000_000)
                     .map(it -> it * it)
                     .filter(it -> it % 2 == 0)
                     .reduce((sum, it) -> sum + it)
                     .getAsLong();
}

private static long measureTimeMillis(Runnable measured) {
    long start = System.nanoTime();
    measured.run();
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
}

My results:

Warming up Kotlin
Measuring Kotlin
An average Kotlin run took 158.5 ms
(sum is 4276489111714942720)


Warming up Java
Measuring Java
An average Java run took 357.3 ms
sum is 4276489111714942720

Surprised? I was too.

Instead of digging further to figure out this inversion of the expected results, I would like to draw this conclusion:

Kotlin's FP extensions on Iterable are there for convenience. In 95% of all use cases you don't care whether it takes 1 or 2 µs to perform a quick map-filter on a list of 10-100 elements.

Java's Stream API is focused on the performance of bulk operations on large data structures. It also offers auto-parallelization towards the same goal (although it almost never actually helps), but its API is crippled and at times awkward due to these concerns. For example, many useful operations which don't happen to parallelize well are just not there, and the whole paradigm of non-terminal vs. terminal operations adds bulk to each and every Streams expression you write.
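
For reference, opting in to the auto-parallelization mentioned above is a single extra call on the stream. The following is a minimal sketch with made-up class and method names; whether the parallel version is actually faster depends on the workload and hardware, and no measurement is claimed here.

```java
import java.util.stream.LongStream;

public class ParallelSketch {

    // Sums the even squares of 0 (inclusive) to limit (exclusive),
    // optionally switching the pipeline to parallel execution.
    static long sumEvenSquares(long limit, boolean parallel) {
        LongStream numbers = LongStream.range(0, limit);
        if (parallel) {
            // parallel() only changes how the pipeline is executed,
            // not what it computes.
            numbers = numbers.parallel();
        }
        return numbers
                .map(x -> x * x)
                .filter(x -> x % 2 == 0)
                .sum();
    }

    public static void main(String[] args) {
        long limit = 1_000_000;
        long sequential = sumEvenSquares(limit, false);
        long parallel = sumEvenSquares(limit, true);
        System.out.println("results match: " + (sequential == parallel));
    }
}
```

Because `long` addition is associative (even when it overflows), both modes produce the same sum; only the execution strategy differs.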


Let me also address a few more of your statements:

"I know that the Java compiler produces optimized bytecode"

This is a) not true and b) largely irrelevant because there is (almost) no such thing as "optimized bytecode". Interpreted execution of bytecode is always at least an order of magnitude slower than JIT-compiled native code.

"And I just wrote Java 8-like code, not the plain old iterative version (which, I believe, will be much faster than given one)."

You mean this?

Kotlin:

fun test() {
    var sum: Long = 0
    var i: Long = 0
    while (i < 100_000_000) {
        val j = i * i
        if (j % 2 == 0L) {
            sum += j
        }
        i++
    }
    // total: top-level accumulator, assumed to be declared elsewhere (declaration not shown)
    total += sum
}

Java:

private static void test() {
    long sum = 0;
    for (long i = 0; i < 100_000_000; i++) {
        long j  = i * i;
        if (j % 2 == 0) {
            sum += j;
        }
    }
    // total: static accumulator field, assumed to be declared elsewhere (declaration not shown)
    total += sum;
}

These are the results:

Warming up Kotlin
Measuring Kotlin
An average Kotlin run took 150.1 ms
(sum is 4276489111714942720)

Warming up Java
Measuring Java
An average Java run took 153.0 ms
sum is 4276489111714942720

In both languages the performance is almost the same as Kotlin + Streams API above. As said, the Streams API is optimized for performance.

Both kotlinc and javac probably produced very similar bytecode given this straightforward source code, then HotSpot did its work on both the same way.

Marko Topolnik

The premise of this question is probably not quite correct: "Why is Kotlin so slow in comparison with Java?"

According to my benchmark below (credit to Marko Topolnik), Kotlin can be as fast or slightly faster, a bit slower, or much slower, depending on which API you use.

Here is the code I tried out which tests the following implementations:

  • Java's `LongStream` called from Kotlin (about as fast)
  • a Kotlin sequence (slower by a factor of 5 or so)
  • no sequence, following the pattern used in the question (much, much slower)

...

import java.util.stream.LongStream
import kotlin.system.measureTimeMillis

var sum = 0L

val limit = 100_000_000L

val n = 10

fun main(args: Array<String>) {
    runTest(n, "LongStream", ::testLongStream)
    runTest(n, "Kotlin sequence", ::testSequence)
    runTest(n, "Kotlin no sequence", ::testNoSequence)
}

private fun runTest(n: Int, name: String, test: () -> Unit) {
    sum = 0L
    println()
    println(":: $name ::")
    println("Warming up Kotlin")
    test()
    test()
    test()
    println("Measuring Kotlin")
    val average = (1..n).map {
        measureTimeMillis { test() }
    }.average()
    println("An average Kotlin run took $average ms")
    println("(sum is $sum)")
}

fun testLongStream() {
    sum += LongStream.range(0L, limit)
            .map { it * it }
            .filter { it % 2 == 0L }
            .reduce { sum, it -> sum + it }
            .asLong
}

fun testSequence() {
    sum += (0 until limit).asSequence().map { it * it }
            .filter { it % 2 == 0L }
            .reduce { sum, it -> sum + it }
}

fun testNoSequence() {
    sum += (0 until limit).map { it * it }
            .filter { it % 2 == 0L }
            .reduce { sum, it -> sum + it }
}

Running the code above prints the following to the console, which gives an idea of the range of performance you can get with Kotlin:

:: LongStream ::
Warming up Kotlin
Measuring Kotlin
An average Kotlin run took 160.4 ms
(sum is 4276489111714942720)

:: Kotlin sequence ::
Warming up Kotlin
Measuring Kotlin
An average Kotlin run took 885.1 ms
(sum is 4276489111714942720)

:: Kotlin no sequence ::
Warming up Kotlin
Measuring Kotlin
An average Kotlin run took 16403.8 ms
(sum is 4276489111714942720)
gil.fernandes