12

I want an infinite non-strict series x1, x2, x3... that I can work with one element at a time, not memoizing the results in order to keep my memory usage constant. For the sake of specificity let's say it's a series of integers (e.g. the natural numbers, the odd numbers, the primes), though this question might apply to more general data types.

The easiest way to work with infinite lists is to use Scala's Stream object. A common idiom is to write a function that returns a Stream, uses the #:: operator to add a term to the series, and then calls itself recursively. For example, the following generates an infinite stream of integers given a starting value and a successor function.

  def infiniteList(n: Int, f: Int => Int): Stream[Int] = {
      n #:: infiniteList(f(n), f)
  }
  infiniteList(2, _*2+3).take(10) print
  // returns 2, 7, 17, 37, 77, 157, 317, 637, 1277, 2557, empty

(I realize that the above is equivalent to the library call Stream.iterate(2)(_*2+3). I wrote it here as example of this infinite Stream idiom.)

However, streams memoize their results, making their memory requirements non-constant, and potentially very large. The documentation states that memoization is avoided if you don't hold on to the head of a Stream, but in practice this can be tricky. I may implement infinite list code in which I don't think I'm holding on to any stream heads, but if it still has unbounded memory requirements I have to figure out if the problem is that I'm handling my streams in some way that does cause memoization, or if it is something else. This can be a difficult debugging task and has code smell because I'm trying to trick an explicitly memoized data structure into returning a non-memoized result.

What I want is something with the semantics of Stream expect without memoization. It doesn't appear that such an object exists in Scala. I've been experimenting with using iterators to implement infinite numeric series, but the mutability of iterators makes this tricky when you start wanting to do comprehension operations on them. I've also tried to write my own code from scratch, but it's not clear where I should start (do I subclass Traversable?) or how to avoid reimplementing the functionality in map, fold, etc.

Does someone have good example Scala code of an implementation of a non-strict, immutable, non-memoized infinite list?

More generally, I understand the semantics of traversable, iterable, sequence, stream, and view, but the fact that I find this question so vexing makes me feel like I'm misunderstanding something. It seems to me that non-strictness and non-memoization are completely orthogonal properties, but Scala seems to have made a design decision to unify them in Stream and given no easy way to peel them apart. Is this an oversight on Scala's part, or is there some deep connection between non-strictness and non-memoization that I am overlooking?


I realize the question is fairly abstract. Here is some additional context that ties it to a specific problem.

I am encountering this issue in the course of implementing a prime number generator as described in Meissa O'Niell's paper "The Genuine Sieve of Eratosthenes", and it's difficult to give a simple example of where an Iterator object is inadequate without pulling in a lot of the details from that paper. Here is a complete implementation using streams that is very elegant but has suspiciously large memory consumption.

Here is a simplified implementation with iterators that does not compile but gives you an idea of what I want.

import scala.collection.mutable

object ONeillSieve {

  class NumericSeries extends BufferedIterator[Int] with Ordered[NumericSeries] {
    def hasNext = true

    def compare(that: NumericSeries) = that.head.compare(head)

    override def toString() = head + "..."

    var head = 3

    def next() = {
      val r = head
      head += 2
      r
   }
 }

def main(args: Array[String]) {
    val q = mutable.PriorityQueue[NumericSeries]()

    val odds = new NumericSeries

    q += odds.map(odds.head * _)
    odds.next()
    q += odds.map(odds.head * _)

    println("Sieve = %s\nInput = %s".format(q, odds))
  }
}

I need to build a PriorityQueue of infinite numeric series keyed by their smallest element. (Hence I use a BufferedIterator instead of just a plain Iterator.) Also note that here the basis for infinite series is the odd integers but the most general solution involves more complicated series. At the end of the main function I want the queue to contain two infinite series:

  1. 3x(odds starting from 3) (i.e. 9,12,15...)
  2. 5x(odds starting from 5) (i.e. 25,30,35...)

The above does not compile because odds.map(...) returns an Iterator, not a NumericSeries and hence cannot be added to the priority queue.

At this point it looks like I'm wading into collections classes extensions, and that's tricky so I want to make sure I'm not having to do it unless absolutely necessary.

Community
  • 1
  • 1
W.P. McNeill
  • 16,336
  • 12
  • 75
  • 111
  • 2
    Can you please show an example of where `Iterator` falls on its face? I do this with `Iterator`, but you may need a couple of tricks to get it to work in all the cases you want. So if we can see an example of where the naive approach fails you, maybe we can help. – Rex Kerr Jan 12 '13 at 21:40
  • In my actual application (an implementation of Melissa O'Neill's sieve of Eratosthenes algorithm) I'm passing these series around and calling `map` and `tail` operations that eventually force me to write boilerplate code like copy constructors. I have a `Stream` version with very clean source, but it runs out of memory. – W.P. McNeill Jan 12 '13 at 21:44
  • Can you show an minimized example of the kind of boilerplate you want to avoid? Also, memoization of _some_ sort is the whole reason it's an efficient algorithm. – Rex Kerr Jan 12 '13 at 21:46
  • I'm trying to come up with a simplified example right now. It's tricky. In order to implement O'Niell's algorithm you need a infinite list of integers generated by a successor function, you need to be able to transform that series with `map(_ * x)` into multiple other series with different scaling factors that can all be advanced independently of each other. – W.P. McNeill Jan 12 '13 at 22:00

4 Answers4

3

EDIT: The below doesn't preserve the Generator type when using filter or map; indeed trying to implement a full 'MyType' for the generator is more or less impossible (look into IndexedSeqView source code to see the mess).

But there is an even easier way around (see my third answer)


Ok, my second try. In order to keep the lazy behaviour for map, filter, etc., the best might be to subclass SeqView or StreamView:

import collection.immutable.StreamView

final case class Generator[A](override val head: A, fun: A => A)
extends StreamView[A, Generator[A]] {
  protected def underlying = this
  def length: Int = Int.MaxValue  // ?
  def iterator = Iterator.iterate(head)(fun)
  def apply(idx: Int): A = {
    if(idx < 0) throw new IndexOutOfBoundsException(idx.toString)
    var res = head
    var i = idx; while(i > 0) {
      res = fun(res)
      i -= 1
    }
    res
  }
}

(I took Rex' suggestion to call this "Generator").

val i = Generator[Int](2, _ * 2 + 3)
i.take(4).foreach(println)
val j = i.map(_ * 0.5)
j.take(4).foreach(println)
0__
  • 66,707
  • 21
  • 171
  • 266
  • Do you want `Generator` to extend `SeqView` or `StreamView`? (You have the latter in the code snippet above?) – W.P. McNeill Jan 12 '13 at 22:58
  • I think you're onto a good start (+1), but you'll likely need to add quite a few more traits to avoid random methods throwing back `SeqView` (or `StreamView`) back at you when you want a `Generator`. – Rex Kerr Jan 12 '13 at 23:12
  • `StreamView` is a subclass of `SeqView`. Despite the name it doesn't memoise. It looked the most straightforward approach. The only thing the above doesn't do yet, is return a `Generator` when you call map, but a `StreamView`. – 0__ Jan 12 '13 at 23:12
  • I tried your `map`, `filter` to return a `Generator` object for the approach you outline here, and it's not clear what the builder semantics should be since it seems like builders want to be finite mutable objects (`ArrayBuffer` or whatnot) and my series is infinite by construction. – W.P. McNeill Jan 13 '13 at 03:33
1

If you merely need to be able to recurse your list several times, try working with Unit => Iterator[A] instead of the original, try this restructuring:

// Old way
val i = Iterator.tabulate(5)(_ + 2)
val j = i.map(_*5)
val k = i.map(_*3)
println(j.mkString(" "))  // Prints 10, 15, 20, 25, 30 as it should
println(k.mkString(" "))  // Prints nothing!  (i was used up!)

// New way
val f = (u: Unit) => Iterator.tabulate(5)(_+2)
val g = f andThen (_.map(_*5))
val h = f andThen (_.map(_*3))
println(g(()).mkString(" "))  // 10, 15, 20, 25, 30
println(h(()).mkString(" "))  // 6, 9, 12, 15, 18

But this does all work over again from the beginning. If you need to spawn off new work from the middle, there is also a way to do that, as long as you're willing to store all the intermediate elements between how far you've advanced:

val a = Iterator.tabulate(5)(_+2)
val (a1,a2) = a.duplicate
val c = a1.map(_*5)
val d = a2.map(_*3)
println(c.mkString(" "))  // 10, 15, 20, 25, 30...but stores a=2, 3, 4, 5, 6
println(d.mkString(" "))  // 6, 9, 12, 15, 18

If neither this nor the other pattern are good enough, then you will have to create a class in the collections library--let's call it Generator maybe?--that will do exactly what you want. I'd have it inherit from Iterator or Iterable, overriding or creating a duplicate method that will create two new copies with the internal generating function and data in the same state.

Rex Kerr
  • 166,841
  • 26
  • 322
  • 407
  • It seems like I need to take a roll-your-own approach that inherits from `Iterator`. Extending collection classes is the one aspect of Scala I still find confusing. I never know where to start. – W.P. McNeill Jan 12 '13 at 23:15
  • @W.P.McNeill - I think you'll have to bite the bullet and start (with guidance!). I doubt you'll find that this works the way you want it to given your application (but I'll leave the answer here regardless for those who may have less demanding tasks). – Rex Kerr Jan 12 '13 at 23:24
1

This is hopefully the most straight forward approach. Just create a lazy Iterable:

object Generator {
  def apply[A](head: A)(next: A => A): Generator[A] = {
    val _head = head
    new collection.IterableView[A, Nothing] {
      override def head = _head
      def underlying = sys.error("No underlying structure")
      def iterator = Iterator.iterate(head)(next)
    }
  }
}
type Generator[A] = Iterable[A]

Here is the use case:

val q = collection.mutable.PriorityQueue[Generator[Int]]()
val odds = Generator(3)(_ + 2)
q += odds.map(odds.head * _)
val next = odds.tail
q += next.map(next.head * _)
q.last.take(3).mkString(",") // -> 9,12,21
q.head.take(3).mkString(",") // -> 25,35,45
0__
  • 66,707
  • 21
  • 171
  • 266
0

EDIT: I leave this answer here for reference, but I found that not to run into stack overflows, it's best to use a collection which is lazy by default: SeqView -> see my other answer.


If you want to define a new collection type, this is what I would imagine:

import collection.generic.{GenericTraversableTemplate, GenericCompanion}
import collection.immutable.LinearSeq

final case class InfSeq[A](override val head: A, fun: A => A)
extends LinearSeq[A] with GenericTraversableTemplate[A, List] {
  override def companion: GenericCompanion[List] = List

  def apply(idx: Int): A = {
    if(idx < 0) throw new IndexOutOfBoundsException(idx.toString)
    var res = head
    var i   = idx
    while(i > 0) {
      res = fun(res)
      i  -= 1
    }
    res
  }

  def length            = Int.MaxValue  // ?
  override def isEmpty  = false  
  override def tail     = InfSeq(fun(head), fun)
  override def toString = take(4).mkString("InfSeq(", ",", ",...)")
}

Ex.

val i = InfSeq[Int](2, _ * 2 + 3)
i.take(4).foreach(println)

Obviously that doesn't solve the map, filter, etc. functionality yet. But if you are careful to use .view, you should be fine:

val j = i.view.map(_ * 0.5)
j.take(4).foreach(println)
0__
  • 66,707
  • 21
  • 171
  • 266