abstracting over a collection

Question

Recently, I wrote an iterator for a cartesian product of Anys, and started with a List of List, but recognized, that I can easily switch to the more abstract trait Seq.

I know, you like to see the code. :)

class Cartesian (val ll: Seq[Seq[_]]) extends Iterator [Seq[_]] {

  def combicount: Int = (1 /: ll) (_ * _.length)

  val last = combicount
  var iter = 0

  override def hasNext (): Boolean = iter < last
  override def next (): Seq[_] = {
    val res = combination (ll, iter)
    iter += 1
    res
  }

  def combination (xx: Seq [Seq[_]], i: Int): List[_] = xx match {
      case Nil     => Nil
      case x :: xs => x (i % x.length) :: combination (xs, i / x.length) 
  }
}

And a client of that class:

object Main extends Application {
  val illi = new Cartesian (List ("abc".toList, "xy".toList, "AB".toList))
  // val ivvi = new Cartesian (Vector (Vector (1, 2, 3), Vector (10, 20)))
  val issi = new Cartesian (Seq (Seq (1, 2, 3), Seq (10, 20)))
  // val iaai = new Cartesian (Array (Array (1, 2, 3), Array (10, 20)))

  (0 to 5).foreach (dummy => println (illi.next ()))
  // (0 to 5).foreach (dummy => println (issi.next ()))
}
/*
List(a, x, A)
List(b, x, A)
List(c, x, A)
List(a, y, A)
List(b, y, A)
List(c, y, A)
*/

The code works well for Seq and Lists (which are Seqs), but of course not for Arrays or Vector, which aren't of type Seq, and don't have a cons-method '::'.

But the logic could be used for such collections too.

I could try to write an implicit conversion to and from Seq for Vector, Array, and such, or try to write an own, similar implementation, or write an Wrapper, which transforms the collection to a Seq of Seq, and calls 'hasNext' and 'next' for the inner collection, and converts the result to an Array, Vector or whatever. (I tried to implement such workarounds, but I have to recognize: it's not that easy. For a real world problem I would probably rewrite the Iterator independently.)

However, the whole thing get's a bit out of control if I have to deal with Arrays of Lists or Lists of Arrays and other mixed cases.

What would be the most elegant way to write the algorithm in the broadest, possible way?

You'll probably want to rethink your approach and start from scratch following Rex Kerr's excellent advice in [that question](http://stackoverflow.com/questions/5410846/how-do-i-apply-the-pimp-my-library-pattern-to-scala-collections). — Jean-Philippe Pellet, May 26 '11 at 14:34
Thank you for the link. I hope I find time to read it in depth and to try it all out over the weekend, or maybe today. Looks promising. — user unknown, May 27 '11 at 12:19

score 2 · Answer 1 · edited Aug 24 '11 at 04:18

There are two solutions. The first is to not require the containers to be a subclass of some generic super class, but to be convertible to one (by using implicit function arguments). If the container is already a subclass of the required type, there's a predefined identity conversion which only returns it.

import collection.mutable.Builder
import collection.TraversableLike
import collection.generic.CanBuildFrom
import collection.mutable.SeqLike

class Cartesian[T, ST[T], TT[S]](val ll: TT[ST[T]])(implicit cbf: CanBuildFrom[Nothing, T, ST[T]], seqLike: ST[T] =>  SeqLike[T, ST[T]], traversableLike: TT[ST[T]] => TraversableLike[ST[T], TT[ST[T]]] ) extends Iterator[ST[T]] {

  def combicount (): Int = (1 /: ll) (_ * _.length)

  val last = combicount - 1 
  var iter = 0

  override def hasNext (): Boolean = iter < last
  override def next (): ST[T] = {
    val res = combination (ll, iter, cbf())
    iter += 1
    res
  }

  def combination (xx: TT[ST[T]], i: Int, builder: Builder[T, ST[T]]): ST[T] = 
    if (xx.isEmpty) builder.result
    else  combination (xx.tail, i / xx.head.length, builder += xx.head (i % xx.head.length) ) 
}

This sort of works:

scala> new Cartesian[String, Vector, Vector](Vector(Vector("a"), Vector("xy"), Vector("AB")))
res0: Cartesian[String,Vector,Vector] = empty iterator

scala> new Cartesian[String, Array, Array](Array(Array("a"), Array("xy"), Array("AB")))
res1: Cartesian[String,Array,Array] = empty iterator

I needed to explicitly pass the types because of bug https://issues.scala-lang.org/browse/SI-3343

One thing to note is that this is better than using existential types, because calling next on the iterator returns the right type, and not Seq[Any].

There are several drawbacks here:

If the container is not a subclass of the required type, it is converted to one, which costs in performance
The algorithm is not completely generic. We need types to be converted to SeqLike or TraversableLike only to use a subset of functionality these types offer. So making a conversion function can be tricky.
What if some capabilities can be interpreted differently in different contexts? For example, a rectangle has two 'length' properties (width and height)

Now for the alternative solution. We note that we don't actually care about the types of collections, just their capabilities:

TT should have foldLeft, get(i: Int) (to get head/tail)
ST should have length, get(i: Int) and a Builder

So we can encode these:

trait HasGet[T, CC[_]]  {
  def get(cc: CC[T], i: Int): T
}

object HasGet {
  implicit def seqLikeHasGet[T, CC[X] <: SeqLike[X, _]] = new HasGet[T, CC] {
    def get(cc: CC[T], i: Int): T = cc(i)
  }

  implicit def arrayHasGet[T] = new HasGet[T, Array] {
    def get(cc: Array[T], i: Int): T = cc(i)
  }
}

trait HasLength[CC] {
  def length(cc: CC): Int 
}

object HasLength {
  implicit def seqLikeHasLength[CC <: SeqLike[_, _]] = new HasLength[CC] {
    def length(cc: CC) = cc.length
  }

  implicit def arrayHasLength[T] = new HasLength[Array[T]] {
    def length(cc: Array[T]) = cc.length
  }

}   

trait HasFold[T, CC[_]] {
  def foldLeft[A](cc: CC[T], zero: A)(op: (A, T) => A): A
}

object HasFold {
  implicit def seqLikeHasFold[T, CC[X] <: SeqLike[X, _]] = new HasFold[T, CC] {
    def foldLeft[A](cc: CC[T], zero: A)(op: (A, T) => A): A = cc.foldLeft(zero)(op)
  }
  implicit def arrayHasFold[T] = new HasFold[T, Array] {
    def foldLeft[A](cc: Array[T], zero: A)(op: (A, T) => A): A =  {
      var i = 0
      var result = zero
      while (i < cc.length) {
        result = op(result, cc(i))
        i += 1
      }
      result
    }
  }
}

(strictly speaking, HasFold is not required since its implementation is in terms of length and get, but i added it here so the algorithm will translate more cleanly)

now the algorithm is:

class Cartesian[T, ST[_], TT[Y]](val ll: TT[ST[T]])(implicit cbf: CanBuildFrom[Nothing, T, ST[T]], stHasLength: HasLength[ST[T]], stHasGet: HasGet[T, ST], ttHasFold: HasFold[ST[T], TT], ttHasGet: HasGet[ST[T], TT], ttHasLength: HasLength[TT[ST[T]]]) extends Iterator[ST[T]] {

  def combicount (): Int = ttHasFold.foldLeft(ll, 1)((a,l) => a * stHasLength.length(l))

  val last = combicount - 1 
  var iter = 0

  override def hasNext (): Boolean = iter < last
  override def next (): ST[T] = {
    val res = combination (ll, 0, iter, cbf())
    iter += 1
    res
  }

  def combination (xx: TT[ST[T]], j: Int,  i: Int, builder: Builder[T, ST[T]]): ST[T] = 
    if (ttHasLength.length(xx) == j) builder.result
    else  {
      val head = ttHasGet.get(xx, j)
      val headLength = stHasLength.length(head)
      combination (xx, j + 1, i / headLength, builder += stHasGet.get(head, (i % headLength) )) 
    }
}

And use:

scala> new Cartesian[String, Vector, List](List(Vector("a"), Vector("xy"), Vector("AB")))
res6: Cartesian[String,Vector,List] = empty iterator

scala> new Cartesian[String, Array, Array](Array(Array("a"), Array("xy"), Array("AB")))
res7: Cartesian[String,Array,Array] = empty iterator

Scalaz probably has all of this predefined for you, unfortunately, I don't know it well.

(again I need to pass the types because inference doesn't infer the right kind)

The benefit is that the algorithm is now completely generic and that there is no need for implicit conversions from Array to WrappedArray in order for it to work

Excercise: define for tuples ;-)

I have to apologize for answering that late to such a substantial post, but when you answered, I was still sitting on scala-2.8, and couldn't migrate to 2.9. Now I migrated, and I hadn't forgotten to test your code, and to try to understand it. Unfortunately there are two or 3 problems, I have to mention. Problem 1 is, that the first code doesn't compile (any more?). (I allow myself, to insert the necessary imports into your code.) I'm using 2.9.0.1, and get the error...: — user unknown, Aug 24 '11 at 04:01
The second code compiles fine, but the code to test it doesn't: `GenericCartesian.scala:118: could not find implicit value for parameter stHasLength: HasLength[Vector[String]] new Cartesian [String, Vector, List] (List (Vector ("a"), Vector ("xy"), Vector ("AB")))` (errormark below 'new'). When I tried to find out, how to tell 'Cartesian', what the length is, I mentioned, that you submit a Vector of 3 Vectors of 1 String each. My example used a List of List of Characters of different lengths; I'm not sure what your code is supposed to produce, I guess a single Element ("a", "xy", "AB"). — user unknown, Aug 24 '11 at 04:31

abstracting over a collection

1 Answers1

Linked