1

How do I write a custom set of integers in Scala? Specifically I want a class with the following properties:

  1. It is immutable.
  2. It extends the Set trait.
  3. All collection operations return another object of this type as appropriate.
  4. It takes a variable list of integer arguments in its constructor.
  5. Its string representation is a comma-delimited list of elements surrounded by curly braces.
  6. It defines a method mean that returns the mean value of the elements.

For example:

CustomIntSet(1,2,3) & CustomIntSet(2,3,4) // returns CustomIntSet(2, 3)
CustomIntSet(1,2,3).toString // returns {1, 2, 3}
CustomIntSet(2,3).mean // returns 2.5

(1) and (2) ensure that this object does things in the proper Scala way. (3) requires that the builder code be written correctly. (4) ensures that the constructor can be customized. (5) is an example of how to override an existing toString implementation. (6) is an example of how to add new functionality.

This should be done with a minimum of source code and boilerplate, utilizing functionality already present in the Scala language as much as possible.

I've asked a couple questions getting at aspects of the tasks, but I think this one covers the whole issue. The best response I've gotten so far is to use SetProxy, which is helpful but fails (3) above. I've studied the chapter "The Architecture of Scala Collections" in the second edition of Programming in Scala extensively and consulted various online examples, but remain flummoxed.

My goal in doing this is to write a blog post comparing the design tradeoffs in the way Scala and Java approach this problem, but before I do that I have to actually write the Scala code. I didn't think this would be that difficult, but it has been, and I'm admitting defeat.


After futzing around for a few days I came up with the following solution.

package example

import scala.collection.{SetLike, mutable}
import scala.collection.immutable.HashSet
import scala.collection.generic.CanBuildFrom

case class CustomSet(self: Set[Int] = new HashSet[Int].empty) extends Set[Int] with SetLike[Int, CustomSet] {
  lazy val mean: Float = sum / size

  override def toString() = mkString("{", ",", "}")

  protected[this] override def newBuilder = CustomSet.newBuilder

  override def empty = CustomSet.empty

  def contains(elem: Int) = self.contains(elem)

  def +(elem: Int) = CustomSet(self + elem)

  def -(elem: Int) = CustomSet(self - elem)

  def iterator = self.iterator
}

object CustomSet {
  def apply(values: Int*): CustomSet = new CustomSet ++ values

  def empty = new CustomSet

  def newBuilder: mutable.Builder[Int, CustomSet] = new mutable.SetBuilder[Int, CustomSet](empty)

  implicit def canBuildFrom: CanBuildFrom[CustomSet, Int, CustomSet] = new CanBuildFrom[CustomSet, Int, CustomSet] {
    def apply(from: CustomSet) = newBuilder

    def apply() = newBuilder
  }

  def main(args: Array[String]) {
    val s = CustomSet(2, 3, 5, 7) & CustomSet(5, 7, 11, 13)
    println(s + " has mean " + s.mean)
  }
}

This appears to meet all the criteria above, but it's got an awful lot of boilerplate. I find the following Java version much easier to understand.

import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;

public class CustomSet extends HashSet<Integer> {
    public CustomSet(Integer... elements) {
        Collections.addAll(this, elements);
    }

    public float mean() {
        int s = 0;
        for (int i : this)
            s += i;
        return (float) s / size();
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Iterator<Integer> i = iterator(); i.hasNext(); ) {
            sb.append(i.next());
            if (i.hasNext())
                sb.append(", ");
        }
        return "{" + sb + "}";
    }

    public static void main(String[] args) {
        CustomSet s1 = new CustomSet(2, 3, 5, 7, 11);
        CustomSet s2 = new CustomSet(5, 7, 11, 13, 17);

        s1.retainAll(s2);

        System.out.println("The intersection " + s1 + " has mean " + s1.mean());
    }
}

This is bad since one of Scala's selling points is that it is more terse and clean than Java.

There's a lot of opaque code in the Scala version. SetLike, newBuilder, and canBuildFrom are all language boilerplate: they have nothing to do with writing sets with curly braces or taking a mean. I can almost accept them as the price you pay for Scala's immutable collection class library (for the moment accepting immutability as an unqualified good), but that leaves still leaves contains, +, -, and iterator which are just boilerplate passthrough code. They are at least as ugly as getter and setter functions.

It seems like Scala should provide a way not to write Set interface boilerplate, but I can't figure it out. I tried both using SetProxy and extending the concrete HashSet class instead of the abstract Set but both of these gave perplexing compiler errors.

Is there a way to write this code without the contains, +, -, and iterator definitions, or is the above the best I can do?


Following the advice of axel22 below I wrote a simple implementation that utilizes the extremely useful if unfortunately-named pimp my library pattern.

package example

class CustomSet(s: Set[Int]) {
  lazy val mean: Float = s.sum / s.size
}

object CustomSet {
  implicit def setToCustomSet(s: Set[Int]) = new CustomSet(s)
}

With this you just instantiate Sets instead of CustomSets and do implicit conversion as needed to take the mean.

scala> (Set(1,2,3) & Set(2,3,5)).mean
res4: Float = 2.0

This satisfies most of my original wish list but still fails item (5).


Something axel22 said in the comments below gets at the heart of why I am asking this question.

As for inheritance, immutable (collection) classes are not easy to inherit in general…

This squares with my experience, but from a language design perspective something seems wrong here. Scala is an object oriented language. (When I saw Martin Odersky give a talk last year, that was the selling point over Haskell he highlighted.) Immutability is the explicitly preferred mode of operation. Scala's collection classes are touted as the crown jewel of its object library. And yet when you want to extend an immutable collection class, you run into all this informal lore to the effect of "don't do that" or "don't try it unless you really know what you're doing." Usually the point of classes is to make them easily extendible. (After all, the collection classes are not marked final.) I'm trying to decide whether this is a design flaw in Scala or a design trade-off that I'm just not seeing.

Community
  • 1
  • 1
W.P. McNeill
  • 16,336
  • 12
  • 75
  • 111
  • possible duplicate of [Extend Scala Set with concrete type](http://stackoverflow.com/questions/4416885/extend-scala-set-with-concrete-type) – Suma Feb 04 '15 at 15:43

1 Answers1

2

Aside from the mean which you could add as an extension method using implicit classes and value classes, all the properties you list should be supported by the immutable.BitSet class in the standard library. Perhaps you could find some hints in that implementation, particularly for efficiency purposes.

You wrote a lot of code to achieve delegation above, but you could achieve a similar thing using class inheritance as in Java -- note that writing a delegating version of the custom set would require much more boilerplate in Java too.

Perhaps macros will in the future allow you to write code which generates the delegation boilerplate automatically -- until then, there is the old AutoProxy compiler plugin.

axel22
  • 32,045
  • 9
  • 125
  • 137
  • Thanks for the pointer to `BitSet`. That's a useful model for writing collection classes, but it implements `+` and `-` which is something I'm trying to avoid doing here because Scala already has an implementation of those operations for `Set`. – W.P. McNeill Apr 22 '13 at 18:37
  • Another way of phrasing the question I closed my original post with is "Can I do this with inheritance instead of delegation?" I haven't been able to figure out how. Implicit classes seem promising though. I hadn't heard of them before, but I'm going to upgrade to Scala 2.10 and take a look. – W.P. McNeill Apr 22 '13 at 18:39
  • You don't really have to use implicit classes and value classes -- the extension method pattern with an implicit conversion and a wrapper that has the method is enough (but you incur the overhead of object allocation). As for inheritance, immutable (collection) classes are not easy to inherit in general -- the problem is that they usually use more than a single implementation class, and you would need to inherit all of them and override all methods that create new objects. As for mutable, you could inherit `HashSet` and the `SetLike` interface, and override the `newCombiner`. Should work. – axel22 Apr 22 '13 at 18:44
  • Is "extension method pattern" the same as "pimp my library"? – W.P. McNeill Apr 22 '13 at 22:49
  • Yes, that's the same pattern. – axel22 Apr 23 '13 at 06:07