I'm trying to reduce the execution time of some small Scala routines, say, string concatenation. Since I'm too lazy to set up a local environment, I'm using online Scala compilers, but I found that the comparison result differs between Scastie and ScalaFiddle with the following code:

// routine 1
val startT1 = System.nanoTime()
(1 until 100 * 1000).foreach { _ =>
  val sb = new StringBuilder("a")
  sb.append("b").append("c").append("d").append("e").append("f")
}
println(System.nanoTime() - startT1)

// routine 2
val startT2 = System.nanoTime()
(1 until 100 * 1000).foreach { _ =>
  val arr = Array[Char]('a', 'b', 'c', 'd', 'e', 'f')
}
println(System.nanoTime() - startT2)

In ScalaFiddle routine 1 is faster, but in Scastie routine 2 is faster.
I've read this article, https://medium.com/@otto.chrons/what-makes-scalafiddle-so-fast-9a3edf33ed4d, and it seems that ScalaFiddle actually runs JavaScript instead of Scala. But the remaining question is: can I really use Scastie for execution time benchmarks?

Xintong Bian
  • 1
    The answer is you can't use `nanoTime()` as a reliable benchmark tool. [Read here](https://stackoverflow.com/q/504103/4993128) for more. – jwvh Feb 23 '21 at 03:25
  • @jwvh doesn't the link you provided say we SHOULD use nanoTime() instead of currentTimeMillis()? – Xintong Bian Feb 23 '21 at 03:49
  • 1
    Well, my point is that `nanoTime()` _by itself_ is insufficient. Just look at all the things that a **real** benchmark tool incorporates in order to control for JIT and JVM interference. – jwvh Feb 23 '21 at 04:13

1 Answer

Short answer: no, you cannot rely on ANY online tool such as Scastie or ScalaFiddle to verify performance.

There are a thousand and more reasons why a benchmark reports X milliseconds for some operation, and 99% of those reasons come down to the running environment: hardware, operating system, CPU architecture, load on the machine, the Scala compiler used, the JVM used, etc. We don't know how the environment changes between runs on Scastie, for instance, so you can get totally different numbers without knowing why, and hence the benchmark results won't be reliable.
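One way to see this variability for yourself is to time the same routine several times in a row within one JVM. The following is a minimal sketch that reuses routine 1 from the question; `RepeatedTiming`, `timeNanos`, `routine1`, and `measure` are names made up for illustration, not part of any library.

```scala
object RepeatedTiming {
  // Times a single evaluation of `body` in nanoseconds.
  def timeNanos(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    System.nanoTime() - start
  }

  // Routine 1 from the question: build "abcdef" with a StringBuilder, many times.
  def routine1(): Unit =
    (1 until 100 * 1000).foreach { _ =>
      val sb = new StringBuilder("a")
      sb.append("b").append("c").append("d").append("e").append("f")
    }

  // Evaluates `body` `rounds` times and returns the duration of each round.
  def measure(rounds: Int)(body: => Unit): Seq[Long] =
    (1 to rounds).map(_ => timeNanos(body))

  def main(args: Array[String]): Unit =
    measure(10)(routine1()).zipWithIndex.foreach { case (ns, i) =>
      println(s"round ${i + 1}: $ns ns")
    }
}
```

The early rounds are often slower because the JIT has not yet compiled the hot loop, and the spread between later rounds gives a rough feel for how noisy a single `nanoTime()` measurement is. This only illustrates the problem; it does not make numbers from an online tool trustworthy.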

If you would like results that you can rely on at least a bit, take a look at JMH (https://openjdk.java.net/projects/code-tools/jmh/) and its sbt helper plugin (https://github.com/ktoso/sbt-jmh), and run the benchmark in a known environment.
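As a rough sketch, the two routines from the question could be expressed as JMH benchmarks like this. It assumes the sbt-jmh plugin is enabled in the build so that the JMH annotations are on the classpath; the package name `bench`, the class name `ConcatBenchmark`, and the warmup and measurement counts are arbitrary choices for illustration.

```scala
// JMH rejects benchmark classes in the default package, so a package is needed.
package bench

import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._

@State(Scope.Thread)
@BenchmarkMode(Array(Mode.AverageTime))
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)      // discarded iterations that let the JIT settle
@Measurement(iterations = 5) // iterations that are actually reported
@Fork(1)                     // run in a fresh JVM, separate from sbt
class ConcatBenchmark {

  // Routine 1: returning the result keeps the JIT from eliminating the work.
  @Benchmark
  def stringBuilder(): String = {
    val sb = new StringBuilder("a")
    sb.append("b").append("c").append("d").append("e").append("f")
    sb.toString
  }

  // Routine 2: allocate the six-character array directly.
  @Benchmark
  def charArray(): Array[Char] =
    Array[Char]('a', 'b', 'c', 'd', 'e', 'f')
}
```

With sbt-jmh set up as described in its README, the benchmarks are run from sbt with the plugin's `Jmh/run` task (`jmh:run` in older sbt syntax). Note that the two methods do not produce the same type of result, just as in the question, so the comparison is only as meaningful as the original one.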

And along with posting benchmark results, please post details of the environment where they were run.
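A small helper can collect the basic environment details to post next to the numbers. This is only a sketch: `EnvReport` and `details` are hypothetical names, and the selection of properties is a suggestion, not a complete description of a machine.

```scala
object EnvReport {
  // Collects a few JVM and OS properties that commonly affect timings.
  def details(): Seq[(String, String)] = Seq(
    // Version of the scala-library on the classpath.
    "scala.library" -> scala.util.Properties.versionNumberString,
    "java.version"  -> System.getProperty("java.version"),
    "java.vm.name"  -> System.getProperty("java.vm.name"),
    "os.name"       -> System.getProperty("os.name"),
    "os.arch"       -> System.getProperty("os.arch"),
    "cpu.count"     -> Runtime.getRuntime.availableProcessors().toString,
    "max.heap.mb"   -> (Runtime.getRuntime.maxMemory() / (1024 * 1024)).toString
  )

  def main(args: Array[String]): Unit =
    details().foreach { case (key, value) => println(s"$key = $value") }
}
```

Hardware details such as the CPU model and the load on the machine are not visible through these properties and have to be reported separately.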

Ivan Kurchenko
  • My point is more to get a rule of thumb than rigorous benchmarks. Since my algorithm will be running on a heterogeneous Hadoop cluster whose exact environment I don't know, I really just want to know which algorithm most likely runs faster. But, anyway, can we say that even a rule of thumb is impossible to get? – Xintong Bian Feb 23 '21 at 16:42
  • @XintongBian I'm afraid there is no single rule of thumb or any other easy answer. Benchmarking, especially at a low level, is a science in itself. One suggestion I can give: try a larger amount of data, so that the difference between the approaches outweighs the measurement noise. For instance, in your Array vs StringBuilder comparison, try a string size of e.g. 100k chars, so the difference will be significant. For more information about JMH and benchmarking, see Aleksey Shipilev's presentations, such as https://www.youtube.com/watch?v=SKPdqgD1I2U; he's one of the JMH authors. – Ivan Kurchenko Feb 23 '21 at 16:48