2

If we want to define a case class that holds a single object, say a tuple, we can do it easily:

sealed case class A(x: (Int, Int))

In this case, retrieving the "x" value will take a small constant amount of time, and this class will only take a small constant amount of space, regardless of how it was created.

Now, let's assume we want to hold a sequence of values instead; we could it like this:

sealed final case class A(x: Seq[Int])

This might seem to work as before, except that now storage and time to read all of x is proportional to x.length.

However, this is not actually the case, because someone could do something like this:

val hugeList = (1 to 1000000000).toList
val a = A(hugeList.view.filter(_ == 500000000))

In this case, the a object looks like an innocent case class holding a single int in a sequence, but in fact it requires gigabytes of memory, and it will take on the order of seconds to access that single element every time.

This could be fixed by specifying something like List[T] as the type instead of Seq[T]; however, this seems ugly since it adds a reference to a specific implementation, while in fact other well behaved implementations, like Vector[T], would also do.

Another worrying issue is that one could pass a mutable Seq[T], so it seems that one should at least use immutable.Seq instead of scala.collection.Seq (although the compiler can't actually enforce the immutability at the moment).

Looking at most libraries it seems that the common pattern is to use scala.collection.Seq[T], but is this really a good idea?

Or perhaps Seq is being used just because it's the shortest to type, and in fact it would be best to use immutable.Seq[T], List[T], Vector[T] or something else?

New text added in edit

Looking at the class library, some of the most core functionality like scala.reflect.api.Trees does in fact use List[T], and in general using a concrete class seems a good idea.

But then, why use List and not Vector?

Vector has O(1)/O(log(n)) length, prepend, append and random access, is asymptotically smaller (List is ~3-4 times bigger due to vtable and next pointers), and supports cache efficient and parallelized computation, while List has none of those properties except O(1) prepend.

So, personally I'm leaning towards Vector[T] being the correct choice for something exposed in a library data structure, where one doesn't know what operations the library user will need, despite the fact that it seems less popular.

  • Note that you should not qualify a case class as `sealed`. Sealed means that you are going to have a defined number of sub classes in that compile unit, but actually you should not sub class case classes at all. If you want to be sure, use `final`. I personally hope that in a future version of Scala, case class will automatically be final, because that extra keyword all over the place is pure clutter. – 0__ Mar 29 '13 at 22:24
  • There are various entries on SO regarding choice of collections. There is also discussion on mailing lists whether Vector should be generally preferred to List. List is still a good choice if you do not need random access and size, because it is an extremely light weight structure, and it goes well with Lisp still recursive pattern matching destructing (`case head :: tail` vs. `case Nil`). See for example [this entry](http://stackoverflow.com/questions/6928327/when-should-i-choose-vector-in-scala) – 0__ Mar 29 '13 at 23:07

1 Answers1

1

First of all, you talk both about space and time requirements. In terms of space, your object will always be as large as the collection. It doesn't matter whether you wrap a mutable or immutable collection, that collection for obvious reasons needs to be in memory, and the case class wrapping it doesn't take any additional space (except its own small object reference). So if your collection takes "gigabytes of memory", that's a problem of your collection, not whether you wrap it in a case class or not.

You then go on to argue that a problem arises when using views instead of eager collections. But again the question is what the problem actually is? You use the example of lazily filtering a collection. In general running a filter will be an O(n) operation just as if you were iterating over the original list. In that example it would be O(1) for successive calls if that collection was made strict. But that's a problem of the calling site of your case class, not the definition of your case class.

The only valid point I see is with respect to mutable collections. Given the defining semantics of case classes, you should really only use effectively immutable objects as arguments, so either pure immutable collections or collections to which no instance has any more write access.

There is a design error in Scala in that scala.Seq is not aliased to collection.immutable.Seq but a general seq which can be either mutable or immutable. I advise against any use of unqualified Seq. It is really wrong and should be rectified in the Scala standard library. Use collection.immutable.Seq instead, or if the collection doesn't need to be ordered, collection.immutable.Traversable.

So I agree with your suspicion:

Looking at most libraries it seems that the common pattern is to use scala.collection.Seq[T], but is this really a good idea?

No! Not good. It might be convenient, because you can pass in an Array for example without explicit conversion, but I think a cleaner design is to require immutability.

0__
  • 66,707
  • 21
  • 171
  • 266