4

I'm migrating a project from Scala 2.12.1 to 2.13.6, and find that SeqView#flatMap now returns a View, which doesn't have a distinct method. I thus have one bit of code that does not compile anymore:

val nodes = debts.view
      .flatMap { case Debt(from, to, _) => List(from, to) }
      .distinct
      .map(name => (name, new Node(name)))
      .toMap

There's a dumb way to fix it, by converting the view to a seq and then back to a view:

val nodes = debts.view
      .flatMap { case Debt(from, to, _) => List(from, to) }.toSeq.view
      .distinct
      .map(name => (name, new Node(name)))
      .toMap

However, this is obviously not great because it forces the view to be collected, and also it's just super inelegant to have to go back-and-forth between types. I found another way to fix it, with is to use a LazyList:

val nodes = debts.to(LazyList)
      .flatMap { case Debt(from, to, _) => List(from, to) }
      .distinct
      .map(name => (name, new Node(name)))
      .toMap

Now that's what I want, it basically behaves like a Java stream. Sure, some operations have O(n) memory usage like distinct, but at least all operations after it get to be streamed, without reconstructing the data structure.

With this, it gets me thinking about why we should ever need a view, given that they're much less powerful than before (even if I can believe 2.13 has fixed some other issues this power was introducing). I looked for the answer and found hints, but nothing that I found comprehensive enough. Below is my research:

It might be me, but even after reading these references, I don't find a clear upside in using views, for most if not all use cases. Anyone more enlightened than me?

Dici
  • 25,226
  • 7
  • 41
  • 82
  • This looks like an oversight. SeqView#flatMap should return a SeqView, like Seq#flatMap returns a Seq. You should open a bug in the GitHub – SwiftMango Nov 17 '21 at 02:53
  • Mm I assumed it was because they had to limit how precise the types were to simplify the library compared to before where it had gotten complicated. I'll consider filing a bug just to see! – Dici Nov 17 '21 at 10:35
  • About the bug report, it's sort of similar to this: https://github.com/scala/bug/issues/11159 – Dici Nov 17 '21 at 14:21

1 Answers1

5

There are actually 3 basic possibilities for lazy sequences in Scala 2.13: View, Iterator and LazyList.

View is the simplest lazy sequence with very little additional costs. It's good to use by default in general case to avoid allocations for intermediate results when working with large sequences.

It's possible to traverse the View several times (using foreach, foldLeft, toMap, etc.). Transformations (map, flatMap, filter, etc.) will be executed separately for every traversal. So care has to be applied either to avoid time-consuming transformations, or to traverse the View only once.

Iterator can be traversed only once. It's similar to Java Streams or Python generators. Most transformation methods on Iterator require that you only use the returned Iterator and discard the original object.

It's also fast like a View and supports more operations, including distinct.

LazyList is basically a real strict structure, which can be expanded automatically on the fly. LazyList memoizes all the generated elements. If you have a val with a LazyList, the memory will be allocated for all the generated elements. But if you traverse it on the fly and don't store in a val, the garbage collector can clean up the traversed elements.

Stream in Scala 2.12 was considerably slower than Views or Iterators. I'm not sure if this applies to LazyList in Scala 2.13.


So every lazy sequence has some caveat:

  • View can execute transformations multiple times.
  • Iterator can be consumed only once.
  • LazyList can allocate the memory for all the sequence elements.

In your use case I believe, it's Iterator that's the most appropriate:

val nodes = debts.iterator
      .flatMap { case Debt(from, to, _) => List(from, to) }
      .distinct
      .map(name => (name, new Node(name)))
      .toMap
Kolmar
  • 14,086
  • 1
  • 22
  • 25
  • Right, that makes sense! I didn't think about using an iterator here because of how cluttered my mind is with Java, but it is indeed the closest thing to it conceptually. I didn't quite understand your comment about `val` and `LazyList`, could you elaborate on this? – Dici Nov 17 '21 at 10:43
  • @Dici It's from the LazyList Javadoc: "One must be cautious of memoization; it can eat up memory if you're not careful. That's because memoization of the `LazyList` creates a structure much like `List`. As long as something is holding on to the head, the head holds on to the tail, and so on recursively. If, on the other hand, there is nothing holding on to the head (e.g. if we used def to define the `LazyList`) then once it is no longer being used directly, it disappears." – Kolmar Nov 17 '21 at 13:11
  • Ok so by `val` you meant somebody retaining a reference to the `LazyList`. That makes sense! I'll accept the answer. – Dici Nov 17 '21 at 14:19