27

Here is my benchmark code:

def bm(duration: Long)(f: => Unit)={
  val end = System.currentTimeMillis + duration
  var count = 0
  while(System.currentTimeMillis < end) { f; count += 1 }
  count
}

val array = new scala.util.Random().alphanumeric.take(1000).toArray

(1 to 20).map { _ => bm(1000) { array.slice(100,200) } }.sum / 20

Running this several times, I consistently get numbers in the ballpark of about 1.5 million slices per second. Between 1.4 and 1.6.

Now, I do this:

 implicit class FastSlicing(val a: Array[Char]) extends AnyVal {
   def fastSlice(from: Int, until: Int)  = Arrays.copyOfRange(a, from, until)
 }
 (1 to 20).map { _ => bm(1000) { array.fastSlice(100,200) } }.sum / 20

And the result I get is between 16 and 18 million of slices per second. This is more than 10 times faster.

Now, I know all the usual reasoning about the trade-offs that scala makes to provide functional idioms and type safety sometimes at the cost of performance ... But in this case, I think they all fail to answer a simple question: why is ArrayOps.slice not implemented this way??? I realize, there would be multiple identical implementations needed, because of the way java deals with primitive arrays, but that's at most a minor annoyance, not really a deal-breaker kind of problem to justify a 10x performance hit.

The .slice is only one example, most of other array ops seem to suffer from the same problem too. Why does it have to be this way?

Update now, here is something that I find even more shocking:

val seq = new scala.util.Random().alphanumeric.take(1000).toIndexedSeq
(1 to 20).map { _ => bm(1000) { seq.slice(100,200) } }.sum / 20

This does about 5-6 million slices per second for me. But this:

import scala.collections.JavaConversions._
(1 to 20).map { _ => bm(1000) { seq.subList(100,200) } }.sum / 20

does between 12 and 15 million! Granted, this is not order of magnitude difference, like in the arrays case, but (1) there is no special handling of primitives involved here, so this would be completely trivial to just implement using java standard tooling, and (2) the collection is immutable ... how hard can it be to return a reference to a range of indices???

Dima
  • 39,570
  • 6
  • 44
  • 70
  • 1
    Have you already seen the code behind them? – Vale Jun 22 '16 at 13:21
  • @Vale yes, I have. Have you read the question? ;) – Dima Jun 22 '16 at 13:22
  • 1
    I did, but cannot find the line where you say you did. I may be bad at reading comprehension. – Vale Jun 22 '16 at 13:28
  • @Vale, how about the line, that says "_why is ArrayOps.slice not implemented this way_???" It kinda implies that I know how it is implemented, and the very question I am asking is about why it is implemented so poorly, no? – Dima Jun 22 '16 at 13:32
  • I guess =) Here's the kick: I can't find where the source code is: I was intrigued by your question and went [to their documentation](http://www.scala-lang.org/api/2.10.3/#scala.collection.mutable.ArrayOps) I read about the method, but not so much info was available; [so I followed the github link](https://github.com/scala/scala/blob/v2.10.3/src/library/scala/collection/mutable/ArrayOps.scala#L1) and was expecting to find the slice() method definition, as it is listed in the concrete members, but it's not there. I admit I am a newbie in serious programming: please help me. – Vale Jun 22 '16 at 13:39
  • 4
    @Vale the implementation is here: https://github.com/scala/scala/blob/v2.11.5/src/library/scala/collection/IndexedSeqOptimized.scala#L110. I find the file name really ironic :) – Dima Jun 22 '16 at 13:41
  • I don't know... It all comes down to this: http://stackoverflow.com/questions/2589741/what-is-more-efficient-system-arraycopy-vs-arrays-copyof Since you are using the native arraycopy (which also works in batch, not element by element), I guess that's what makes the difference. Java.lang also uses it, I see, so I'd say Oracle had a longer sight than the Scala programmers. If you see then second answer, though, you could imagine that Scala expects a different base type to be copied. I can only guess, as you did. Interesting question, though. – Vale Jun 22 '16 at 13:54
  • one aspect to consider is the potential for "memory leaks" if slices were implemented as references. hanging on to a tiny slice may keep a gigantic superset in memory. i know this is why String.substring behaviour was changed in java at some point (to copy characters over instead of referencing the superstring) – radai Jun 22 '16 at 13:56
  • Scala uses a generic implementation for all `IndexedSeq`s, and it's slower because it goes through `Builder`, copies the elements one by one and checks the size on every step. It's of course possible to add efficient implementations by having `override def slice` in every `WrappedArray` but probably nobody bothered doing that for all the `WrappedArray`s and all the methods that can be delegated to java `Arrays` methods. – Kolmar Jun 22 '16 at 15:01
  • 6
    My guess is: because nobody felt like overriding `slice` in `ArrayOps` with a more efficient implementation. You can submit a pull request. – Jasper-M Jun 22 '16 at 15:03
  • @Jasper-M thanks for suggestion. I did submit a pull request. You are welcome to take a look and comment if you'd like: https://github.com/scala/scala/pull/5250 – Dima Jun 29 '16 at 18:58

1 Answers1

2

It has been fixed in scala 2.12.

Zang MingJie
  • 5,164
  • 1
  • 14
  • 27