How do I write a custom set of integers in Scala? Specifically I want a class with the following properties:
- It is immutable.
- It extends the
Set
trait. - All collection operations return another object of this type as appropriate.
- It takes a variable list of integer arguments in its constructor.
- Its string representation is a comma-delimited list of elements surrounded by curly braces.
- It defines a method
mean
that returns the mean value of the elements.
For example:
CustomIntSet(1,2,3) & CustomIntSet(2,3,4) // returns CustomIntSet(2, 3)
CustomIntSet(1,2,3).toString // returns {1, 2, 3}
CustomIntSet(2,3).mean // returns 2.5
(1) and (2) ensure that this object does things in the proper Scala way. (3) requires that the builder code be written correctly. (4) ensures that the constructor can be customized. (5) is an example of how to override an existing toString
implementation. (6) is an example of how to add new functionality.
This should be done with a minimum of source code and boilerplate, utilizing functionality already present in the Scala language as much as possible.
I've asked a couple questions getting at aspects of the tasks, but I think this one covers the whole issue. The best response I've gotten so far is to use SetProxy
, which is helpful but fails (3) above. I've studied the chapter "The Architecture of Scala Collections" in the second edition of Programming in Scala extensively and consulted various online examples, but remain flummoxed.
My goal in doing this is to write a blog post comparing the design tradeoffs in the way Scala and Java approach this problem, but before I do that I have to actually write the Scala code. I didn't think this would be that difficult, but it has been, and I'm admitting defeat.
After futzing around for a few days I came up with the following solution.
package example
import scala.collection.{SetLike, mutable}
import scala.collection.immutable.HashSet
import scala.collection.generic.CanBuildFrom
case class CustomSet(self: Set[Int] = new HashSet[Int].empty) extends Set[Int] with SetLike[Int, CustomSet] {
lazy val mean: Float = sum / size
override def toString() = mkString("{", ",", "}")
protected[this] override def newBuilder = CustomSet.newBuilder
override def empty = CustomSet.empty
def contains(elem: Int) = self.contains(elem)
def +(elem: Int) = CustomSet(self + elem)
def -(elem: Int) = CustomSet(self - elem)
def iterator = self.iterator
}
object CustomSet {
def apply(values: Int*): CustomSet = new CustomSet ++ values
def empty = new CustomSet
def newBuilder: mutable.Builder[Int, CustomSet] = new mutable.SetBuilder[Int, CustomSet](empty)
implicit def canBuildFrom: CanBuildFrom[CustomSet, Int, CustomSet] = new CanBuildFrom[CustomSet, Int, CustomSet] {
def apply(from: CustomSet) = newBuilder
def apply() = newBuilder
}
def main(args: Array[String]) {
val s = CustomSet(2, 3, 5, 7) & CustomSet(5, 7, 11, 13)
println(s + " has mean " + s.mean)
}
}
This appears to meet all the criteria above, but it's got an awful lot of boilerplate. I find the following Java version much easier to understand.
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
public class CustomSet extends HashSet<Integer> {
public CustomSet(Integer... elements) {
Collections.addAll(this, elements);
}
public float mean() {
int s = 0;
for (int i : this)
s += i;
return (float) s / size();
}
@Override
public String toString() {
StringBuilder sb = new StringBuilder();
for (Iterator<Integer> i = iterator(); i.hasNext(); ) {
sb.append(i.next());
if (i.hasNext())
sb.append(", ");
}
return "{" + sb + "}";
}
public static void main(String[] args) {
CustomSet s1 = new CustomSet(2, 3, 5, 7, 11);
CustomSet s2 = new CustomSet(5, 7, 11, 13, 17);
s1.retainAll(s2);
System.out.println("The intersection " + s1 + " has mean " + s1.mean());
}
}
This is bad since one of Scala's selling points is that it is more terse and clean than Java.
There's a lot of opaque code in the Scala version. SetLike
, newBuilder
, and canBuildFrom
are all language boilerplate: they have nothing to do with writing sets with curly braces or taking a mean. I can almost accept them as the price you pay for Scala's immutable collection class library (for the moment accepting immutability as an unqualified good), but that leaves still leaves contains
, +
, -
, and iterator
which are just boilerplate passthrough code.
They are at least as ugly as getter and setter functions.
It seems like Scala should provide a way not to write Set
interface boilerplate, but I can't figure it out. I tried both using SetProxy
and extending the concrete HashSet
class instead of the abstract Set
but both of these gave perplexing compiler errors.
Is there a way to write this code without the contains
, +
, -
, and iterator
definitions, or is the above the best I can do?
Following the advice of axel22 below I wrote a simple implementation that utilizes the extremely useful if unfortunately-named pimp my library pattern.
package example
class CustomSet(s: Set[Int]) {
lazy val mean: Float = s.sum / s.size
}
object CustomSet {
implicit def setToCustomSet(s: Set[Int]) = new CustomSet(s)
}
With this you just instantiate Set
s instead of CustomSet
s and do implicit conversion as needed to take the mean.
scala> (Set(1,2,3) & Set(2,3,5)).mean
res4: Float = 2.0
This satisfies most of my original wish list but still fails item (5).
Something axel22 said in the comments below gets at the heart of why I am asking this question.
As for inheritance, immutable (collection) classes are not easy to inherit in general…
This squares with my experience, but from a language design perspective something seems wrong here. Scala is an object oriented language. (When I saw Martin Odersky give a talk last year, that was the selling point over Haskell he highlighted.) Immutability is the explicitly preferred mode of operation. Scala's collection classes are touted as the crown jewel of its object library. And yet when you want to extend an immutable collection class, you run into all this informal lore to the effect of "don't do that" or "don't try it unless you really know what you're doing." Usually the point of classes is to make them easily extendible. (After all, the collection classes are not marked final
.) I'm trying to decide whether this is a design flaw in Scala or a design trade-off that I'm just not seeing.