43

I've searched for the use of @specialized in the source code of the standard library of Scala 2.8.1. It looks like only a handful of traits and classes use this annotation: Function0, Function1, Function2, Tuple1, Tuple2, Product1, Product2, AbstractFunction0, AbstractFunction1, AbstractFunction2.

None of the collection classes are @specialized. Why not? Would this generate too many classes?

This means that using collection classes with primitive types is very inefficient, because there will be a lot of unnecessary boxing and unboxing going on.

What's the most efficient way to have an immutable list or sequence (with IndexedSeq characteristics) of Ints, avoiding boxing and unboxing?

Charles
  • 50,943
  • 13
  • 104
  • 142
Jesper
  • 202,709
  • 46
  • 318
  • 350
  • 1
    Wow, cheez, i assumed that `List` and `IndexedSeq` are specialized. Doing some `:javap -c` with the new scala 2.9 REPL shows they still do boxing all the time. The constructors are optimized though it seems (`List(1,2,3)` uses a `wrapIntArray`). – 0__ Mar 29 '11 at 23:48
  • That's really unwanted overhead when dealing with huge datasets of numbers. – Raphael Mar 31 '11 at 07:20

3 Answers3

18

Specialization has a high cost on the size of classes, so it must be added with careful consideration. In the particular case of collections, I imagine the impact will be huge.

Still, it is an on-going effort -- Scala library has barely started to be specialized.

Daniel C. Sobral
  • 295,120
  • 86
  • 501
  • 681
  • 8
    This is correct. Video from Scala Days 2010 on @Specialized, where Iulian Dragos explains why it's not everywhere: http://days2010.scala-lang.org/node/138/151 – Matthew Farwell Mar 30 '11 at 07:53
  • 2
    I've also grepped through the source of the Scala 2.9 RC1 library and unfortunately it looks like nothing has changed yet! – Jesper Mar 30 '11 at 17:29
  • 1
    @Daniel `Specialization has a high cost on the size of classes, so it must be added with careful consideration` As a naive, non-compiler engineer, why does the size of classes matter? I am **not** making an argument, but looking to learn why. – Kevin Meredith Apr 21 '15 at 15:46
  • 4
    @KevinMeredith Well, for one thing, up to Java 7, it was [limited to 64 KB](https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.9.1). It can also present problems when coding for Android. – Daniel C. Sobral Apr 21 '15 at 19:46
17

Specialized can be expensive ( exponential ) in both size of classes and compile time. Its not just the size like the accepted answer says.

Open your scala REPL and type this.

import scala.{specialized => sp}
trait S1[@sp A, @sp B, @sp C, @sp D] { def f(p1:A): Unit }

Sorry :-). Its like a compiler bomb.

Now, lets take a simple trait

trait Foo[Int]{ }

The above will result in two compiled classes. Foo, the pure interface and Foo$1, the class implementation.

Now,

trait Foo[@specialized A] { }

A specialized template parameter here gets expanded/rewritten for 9 different primitive types ( void, boolean, byte, char, int, long, short, double, float ). So, basically you end up with 20 classes instead of 2.

Going back to the trait with 5 specialized template parameters, the classes get generated for every combination of possible primitive types. i.e its exponential in complexity.

2 * 10 ^ (no of specialized parameters)

If you are defining a class for a specific primitive type, you should be more explicit about it such as

trait Foo[@specialized(Int) A, @specialized(Int,Double) B] { }

Understandably one has to be frugal using specialized when building general purpose libraries.

Here is Paul Phillips ranting about it.

smartnut007
  • 6,324
  • 6
  • 45
  • 52
  • 3
    But you don't always need to specialze for all eight primitive types plus `void`; you can also just specialize for `Int`, for example, using `@specialized(Int)`. – Jesper Mar 08 '13 at 19:21
  • yes, thats right. Added more notes to my answer. But, thats the case when you are doing something specific to a data type. Building general purpose libraries like the the scala language api needs to be more general purpose. – smartnut007 Mar 08 '13 at 20:54
6

Partial answer to my own question: I can wrap an array in an IndexedSeq like this:

import scala.collection.immutable.IndexedSeq

def arrayToIndexedSeq[@specialized(Int) T](array: Array[T]): IndexedSeq[T] = new IndexedSeq[T] {
  def apply(idx: Int): T = array(idx)
  def length: Int = array.length
}

(Ofcourse you could still modify the contents if you have access to the underlying array, but I would make sure that the array isn't passed to other parts of my program).

Jesper
  • 202,709
  • 46
  • 318
  • 350
  • 3
    Since you are hiding behind IndexedSeq that still doesn't convince the compiler not to use autoboxing. For instance, if i run the `:javap -c` on the following: `object Test { val a = arrayToIndexedSeq( Array(1,2,3)); val i: Int = a(0)}` -- you will see that val i = a(0) _does_ boxing (in `SeqLike`).... – 0__ Mar 29 '11 at 23:53
  • @Sciss Ok... grrr! Is there another way to get rid of the unnecessary autoboxing? Is directly using mutable arrays the only way to avoid it? – Jesper Mar 30 '11 at 06:38
  • 2
    I don't know. I guess we need to make a petition to add more specializations to the collections. Martin Odersky kind of promised that here: http://www.scala-lang.org/node/7285 ; the problem i guess is due to the heavy factoring out of traits in the new collections that if you want Vector to be specialized, you will end up having to specialize a dozen involved traits, which in turn creates high bloat on the size of the library. Still, I think we should vote for that, it definitely sucks to have weak performance for Vector which is really the number crunching swiss army knife in scala. – 0__ Mar 30 '11 at 13:29
  • @Sciss I've carefully looked at the output of `scalac -print ...` and following the sequence of methods being called, it looks like it actually does avoid boxing when I call `apply()` on it. (Using scala 2.8.1). There are some calls to `scala.Int.box()` but those are not called for that code path. – Jesper Mar 31 '11 at 21:34
  • Your collection however does have one advantage: it does not box stored elements, and that's a huge memory saving if the collection is (moderately) large. An integer is 4 bytes, boxing will probably add at least other 16 bytes to that (for object headers and the pointers to objects). When you extract one int, only that is boxed. – Blaisorblade Jul 19 '11 at 10:21