
VectorBuilder is defined in the same source file as Vector. Vector is immutable and lives in the scala.collection.immutable package, so the builder ends up in the same package.

As far as I can tell, CanBuildFrom uses a VectorBuilder by default when the return type is not explicitly specified.

  • Is there a reason for not having the builder in a separate file in the mutable package?
  • Is the builder not meant to be used directly? If so, which builder or buffer is to be used to create a Seq?
Beryllium

2 Answers

3

VectorBuilder is not meant to be used directly. If you want to get a builder for a Vector, you only need to call Vector.newBuilder[T], which returns a Builder[T, Vector[T]] (with the underlying instance being a VectorBuilder).
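As a minimal sketch of what that looks like in practice (the object name NewBuilderSketch and the helper squares are hypothetical, chosen only for illustration), a builder obtained from Vector.newBuilder can be filled with += and then turned into a Vector with result():

```scala
import scala.collection.mutable.Builder

object NewBuilderSketch {
  // Hypothetical helper: collects the squares 1*1 .. n*n into a Vector.
  def squares(n: Int): Vector[Int] = {
    // The static type is Builder; the concrete class behind it is a library detail.
    val b: Builder[Int, Vector[Int]] = Vector.newBuilder[Int]
    var i = 1
    while (i <= n) {
      b += i * i // append one element
      i += 1
    }
    b.result() // produce the immutable Vector
  }
}
```

Only the Builder interface is used here, so the code never needs to name VectorBuilder.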

So if you want the default builder that would be used to create a Seq, you only need to call Seq.newBuilder:

scala> Seq(1,2,3)
res0: Seq[Int] = List(1, 2, 3)

scala> Seq.newBuilder[Int]
res1: scala.collection.mutable.Builder[Int,Seq[Int]] = ListBuffer()

scala> Seq.newBuilder[Int].result
res2: Seq[Int] = List()

The above shows that the default implementation of Seq is List and that, logically, the default builder for a Seq is a mutable.ListBuffer.

ListBuffer is more than just a builder for List, which is why it lives in collection.mutable. VectorBuilder, by contrast, is not a Buffer: it cannot be used for anything other than building a Vector. That's probably why it is defined next to Vector. I am not sure why it isn't private, since I cannot see it referenced anywhere in the public API of Vector itself. Maybe it should be private.
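To make the Buffer/Builder distinction concrete, here is a small sketch (the object and method names are hypothetical). A ListBuffer can be prepended to, updated in place and have elements removed, while a builder is essentially append-then-result:

```scala
import scala.collection.mutable.ListBuffer

object BufferVersusBuilder {
  // A ListBuffer is a full mutable sequence.
  def bufferDemo(): List[Int] = {
    val buf = ListBuffer(1, 2, 3)
    buf.prepend(0) // 0, 1, 2, 3
    buf(1) = 10    // 0, 10, 2, 3
    buf.remove(2)  // 0, 10, 3
    buf.toList
  }

  // A builder only accumulates elements and then yields the result.
  def builderDemo(): Vector[Int] = {
    val b = Vector.newBuilder[Int]
    b += 1
    b ++= List(2, 3)
    b.result()
  }
}
```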


Just for reference, there is no hidden magic happening with CanBuildFrom; it almost always just goes through the newBuilder above:

  1. When you do not specify the expected collection type, as in Seq(1,2).map(_+1), the only available CanBuildFrom comes from the companion object of Seq, and is of type CanBuildFrom[Seq[_], T, Seq[T]]. That means the result will be a Seq too.

  2. Like most companion objects of collections, the CanBuildFrom instance Seq provides only does one thing: call Seq.newBuilder (that's defined in GenTraversableFactory ...)

That's why CanBuildFrom for Vector uses a VectorBuilder. For example, in this:

scala> Vector(1,2,3).map(_+1)
res12: scala.collection.immutable.Vector[Int] = Vector(2, 3, 4)

The builder that was used is:

scala> implicitly[CanBuildFrom[Vector[Int], Int, Vector[Int]]].apply()
res13: scala.collection.mutable.Builder[Int,Vector[Int]] = 
  scala.collection.immutable.VectorBuilder@43efdf93
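The delegation described above can be sketched with a simplified stand-in. Note that SimpleCBF, vectorCBF and mapInto are hypothetical names invented for this illustration and are not the library's actual CanBuildFrom; the sketch only assumes that Vector.newBuilder exists, as shown earlier:

```scala
import scala.collection.mutable.Builder

object CbfSketch {
  // Hypothetical, simplified stand-in for CanBuildFrom: a factory of builders.
  trait SimpleCBF[Elem, To] {
    def apply(): Builder[Elem, To]
  }

  object SimpleCBF {
    // The only thing this instance does is call Vector.newBuilder.
    implicit def vectorCBF[A]: SimpleCBF[A, Vector[A]] =
      new SimpleCBF[A, Vector[A]] {
        def apply(): Builder[A, Vector[A]] = Vector.newBuilder[A]
      }
  }

  // A map-like operation that asks the implicit factory for a builder.
  def mapInto[A, B, To](xs: Iterable[A])(f: A => B)(implicit cbf: SimpleCBF[B, To]): To = {
    val b = cbf()
    xs.foreach(x => b += f(x))
    b.result()
  }
}
```

With this in place, CbfSketch.mapInto(Vector(1, 2, 3))(_ + 1) resolves the implicit instance, which in turn supplies the builder from Vector.newBuilder.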
gourlaysama
  • OK, there is a `newBuilder` method, and there is a difference between `Builder` and `Buffer`. As there is another fine answer, I'll leave it open for some time; currently I cannot really say which one is better, thanks for your insight. But you've raised another question: Why is `Seq.newBuilder` not returning a `VectorBuilder`? It's not even returning a `Builder`, and it's even of a more sophisticated collection? But that feels like another question, so if you agree, I'd rather post another one. – Beryllium Jul 16 '13 at 13:41
  • To me, it sounds like you're suggesting that people use Seq.newBuilder, but does anyone do that? If I needed to build up a Map bit by bit, I think I'd use a mutable.Map and then .toMap, not MapBuilder. Rarely, I'll use CanBuildFrom in a special case. But maybe that's your point, you're taking the point of view of someone hooking into collections. – som-snytt Jul 16 '13 at 15:36
  • Someone was asking about MVx patterns, and I looked at my first scala code and found my newbie confusion over builder: https://github.com/som-snytt/House-of-Mirrors-Fork/blob/act/src/main/scala/hom/LightBox.scala#L356 I hope you find that as amusing as I do. – som-snytt Jul 17 '13 at 10:48
1

There is no mutable Vector. By analogy to ListSetBuilder, the builder stays close to what it is building.

mutable.StringBuilder is anomalous because it is a frequently used type that is more first-class, and tends to get passed through API. (It receives mention in the collections overview and doesn't require an import. In fact, it plays a role in this puzzler because it is subtly different from the Java StringBuilder but you might forget that you're using it.)

VectorBuilder is public API. The newBuilder on companion objects has more to do with the collection factory framework and CanBuildFrom. Neither is mentioned in the overview where it explains how to make things. I don't recall any snippets that suggest:

(ListBuffer.newBuilder[String] += "hello" += "world").result.toList

How you create a Vector depends on what you're doing, but the overview suggests the factory methods on the companion. If you already have a collection (maybe non-strict), then toVector. How do you make any collection of anything?
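For illustration, a few of the builder-free routes mentioned above might look like this (the object name MakeVectors and the value names are hypothetical):

```scala
object MakeVectors {
  // Factory methods on the companion object.
  val literal: Vector[Int] = Vector(1, 2, 3)
  val filled: Vector[String] = Vector.fill(2)("x")
  val tabulated: Vector[Int] = Vector.tabulate(3)(i => i * 2)

  // Converting an existing collection, strict or not, with toVector.
  val fromList: Vector[Int] = List(1, 2, 3).toVector
  val fromIterator: Vector[Int] = Iterator(1, 2, 3).map(_ * 10).toVector
}
```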

This is a good Q&A on choosing Vector over List for your Seq.

As an implementation detail, that VectorPointer thing you see in the scaladoc includes a big comment:

// USED BY BUILDER

Since VectorPointer is private[immutable], the builder must stay in that package as a practical matter. That also suggests some thought was given as to what to expose as API.

In addition, Vector bears a similar comment:

// in principle, most members should be private. however, access privileges must
// be carefully chosen to not prevent method inlining

These comments suggest that access levels are not haphazard.

This question is an interesting opportunity to think about the design of the package. It may be that Scala collections are a Rorschach inkblot test, revealing great beauty and symmetry, while at the same time, to the disturbed mind, a great imbalance in the universe. (Kind of kidding there.)

som-snytt
  • So the main points are: The `Builder` is close to what it's building, one should use `Vector.newBuilder`, and access privileges are possibly chosen so that they do not prevent inlining. As there is another fine answer, I'll leave it open for some time; currently I cannot really say which one is better, thanks for your insight. – Beryllium Jul 16 '13 at 13:40
  • No, I buried my main point in the middle: you don't normally use a builder (see link to overview). If you need one (in a performance sensitive function), it doesn't much matter which way you obtain it: implicit CanBuildFrom, explicit by new or newBuilder. An interesting question, it turns out. – som-snytt Jul 16 '13 at 15:23