1

In order to understand better which one out of the two is better, I wrote some code as follows

val nums = listOf(4, 5, 3, 2, 1, -1, 7, 6, -8, 9, -12)
val filterCount = measureNanoTime {
    nums.filter {e -> e > 0}.count()
}
val filterSize = measureNanoTime {
    nums.filter {e -> e >0}.size
}    
println("Time taken for filter size call: $filterSize")
println("Time taken for filter count call: $filterCount")

The output is as follows (and is consistent every time I run it

Time taken for filter size call: 4783
Time taken for filter count call: 65583

Here I have applied a simple filter to the array. It seems to contradict the discussion on this page - Reddit Page Link. If anyone knows why count() seems to take more time, please share. Thank you.

Anshul
  • 51
  • 6
  • 2
    Also, that Reddit post is saying saying that `nums.count { e -> e > 0 }` is more efficient and should be used instead of `nums.filter { e -> e > 0 }.size`. – Slaw Jul 22 '21 at 14:13
  • 6
    What you are doing there is called microbenchmarking. You'll never get any proper results with tests that small. Here is a bit more info [What is microbenchmarking?](https://stackoverflow.com/questions/2842695/what-is-microbenchmarking). – AlexT Jul 22 '21 at 15:03
  • @Slaw: Same result. – Anshul Jul 22 '21 at 16:01
  • @Slaw: Yes, if the count is more effecient how come it takes 7 times the time. Also as per Reddit `@kotlin.internal.InlineOnly public inline fun Collection.count(): Int { return size } ` This seems incorrect as per that logic. Sorry for pulling this. Just trying to understand the internals how they work. – Anshul Jul 22 '21 at 16:05
  • 2
    It’s not just microbenchmarking that is skewing your results but also that you don’t allow warm-up time. It’s simply not a valid way to compare code performance. – Tenfour04 Jul 22 '21 at 16:24
  • The Short Answer is `count()` member has *operation* option, but `size` don't. – Youn Tivoli Jul 22 '21 at 16:31

2 Answers2

4

I agree with Alex.T in the comments, the problem is probably with your measurement method, rather than a discrepancy with the code you're measuring.

Here's a version where we run the algorithm 10 million times, which should hopefully account for any unrelated differences affected by what's going on in the JVM/System.

Result:

Average time taken for filter count call: 97
Average time taken for filter size call: 99

There doesn't seem to be much in it (which makes sense, since count() is inlined to size by the compiler, so they are executing the same code).

Code:

import kotlin.system.measureNanoTime

fun main() {
    val nums = listOf(4, 5, 3, 2, 1, -1, 7, 6, -8, 9, -12)
    val (filterCount, filterSize) = anshulBenchmark(nums)
    println("Average time taken for filter count call: $filterCount")
    println("Average time taken for filter size call: $filterSize")
}

fun anshulBenchmark(nums: List<Int>, iterations: Int = 10_000_000): Pair<Long, Long> {
    var filterCount = 0L
    var filterSize = 0L
    repeat(iterations) {
        filterCount += measureNanoTime { nums.filter { e -> e > 0 }.count() }
        filterSize += measureNanoTime { nums.filter { e -> e > 0 }.size }
    }
    return Pair(filterCount / iterations, filterSize / iterations)
}
Adam Millerchip
  • 20,844
  • 5
  • 51
  • 74
  • 1
    BTW, I checked this with show bytecode -> decompile to Java in IntelliJ, and the `count()` call was indeed replaced with `size`, but it seems nothing got optimized away. – Adam Millerchip Jul 22 '21 at 16:51
  • Correct. So the optimization on the memory side that was discussed in the Reddit thread is not true. Thanks, @Adam, this helps. – Anshul Jul 22 '21 at 20:13
  • 2
    @Anshul, as Slaw pointed out in the comments to your question, the Reddit thread is comparing a different `count()` function to the one in your question. – Adam Millerchip Jul 23 '21 at 02:14
  • 1
    @Anshul To confirm the above, I ran the two versions of the code through JMH and see no difference in average execution time. As noted, the `Collection.count()` extension function is inlined to `size`. But you're focusing on the wrong part of the Reddit post. Look at the comments to said post. They're talking about the `count` function which accepts a predicate. The reason why doing something like `nums.count { e -> e > 0 }` is more efficient than `nums.filter { e -> e > 0 }.size` (or `.count()`) is because the former _does not create an intermediate collection_, whereas the latter does. – Slaw Jul 23 '21 at 15:47
1

Your microbenchmark simply finds that whichever of the two approaches you run first is slower than the one you run second. If I run your code, I can reproduce count() coming out an order magnitude slower than size... but if I reverse the order of the tests and try testing size first, then suddenly size is an order of magnitude slower than count(). Such is the danger of this kind of naive microbenchmarking.

As Adam Millerchip notes, in cases where the compiler is smart enough to notice that an object you call count() on is a Collection, the call should get inlined to size and so (I think?) compile down to completely identical bytecode, thanks to this implementation of Collection<T>.count() in the Kotlin standard library:

@kotlin.internal.InlineOnly
public inline fun <T> Collection<T>.count(): Int {
    return size
}

Certainly there's no way count() should ever come out faster than size, since count() just defers to size under the hood. At worst, the Iterable<T>.count() method will run, which checks if this is a Collection and defers to its size method if so (and thus must be slower since it's doing strictly more than a call to size). At best, the collection-specific implementation of count() will be inlined and you'll end up with identical bytecode as if you had written size. There's no way you end up faster than size, though.

Regardless, the quote from Kotlin in Action on Reddit was not claiming that the count() method is faster than the size method. Rather, it is saying that the Iterable<T>.count(predicate: (T) -> Boolean) method is more efficient (in execution time and peak memory use) than doing a filter call and checking size on the result, since the latter requires the construction of an additional List to hold the filtered results. That is, the optimal way to write the example in your question would have been this...

nums.count {e -> e > 0}

... and not either of the versions that you tried to benchmark.

Mark Amery
  • 143,130
  • 81
  • 406
  • 459