2

I'm working on a problem in Scala where I have two input Seq[Int], A and B, and an value function f and the goal is to find the max value of f for any pair of elements (a, b) in A and B.

We can do this naively with a for-comprehension:

val A = Seq(1, 2, 3, 4, 5)
val B = Seq(6, 7, 8, 9, 10)
def f(a: Int, b: Int): Int = a * b // f can be more complex than just a multiplication. This is just an example. 

val results = for {
  a <- A
  b <- B
} yield f(a, b)

results.max

The problem that I'm running into is that A and B can be very large and computing every combination of (a, b) is causing memory errors. I don't actually care about the entire list of (a, b) or even the entire list of results either. I only really care about the maximum value.

Is is possible to stream the list of values produced by the for-comprehension into f and then into the max function so that I don't have to hold the entire list in memory?

Notes

  • The problem that I'm working on is Leetcode #11 - Container with Most Water. I know about the linear time solution, but what I'm really curious about here is whether we can stream a logical sequence of values into an aggregator function in Scala.
  • In my research, I found scala Streams, which look like the previous implementation of the current LazyList class, but these don't look like they will give me the behavior that I want. I don't think that 'stream' is the right word to describe the behavior I want in scala, but I'm not sure what to search for next.
Brad Carter
  • 45
  • 1
  • 5
  • 1
    Is [11. Container With Most Water](https://leetcode.com/problems/container-with-most-water/) the problem that you are trying to solve? It doesn't look like the one you are detailing here. Both problems looks that they can be solved using [fold](https://en.wikipedia.org/wiki/Fold_(higher-order_function)). The article [Folding Lists in Scala](https://www.baeldung.com/scala/folding-lists) might help you. – Gastón Schabas Jul 07 '23 at 15:12
  • 1
    Does [this here](https://stackoverflow.com/questions/6799648/in-scala-what-does-view-do) answer your question? Just use `A.view` and `B.view`. – Andrey Tyukin Jul 07 '23 at 15:48
  • 1
    Yeah use a lazy collection like `View`, `Iterator`, `LazyList` – Luis Miguel Mejía Suárez Jul 07 '23 at 18:48
  • Not `LazyList` (the OP is correct, it doesn't help this case), but a view or an iterator would do it – Dima Jul 08 '23 at 12:12
  • for comprehension here is just like `flatMap` and your operation doesn't match `flatMap`'s semantics, you should follow @GastónSchabas advice – michaJlS Jul 10 '23 at 17:14
  • To expand on @Dima's comment, `LazyList` doesn't just evaluate the collection lazily, but also memoizes it, meaning that it's not a good fit for this case. I haven't tested `View` but it might have the same behavior. Worth checking maybe. `Iterator` instead works as OP wanted, since the object doesn't also memoize the items which are evaluated lazily. – stefanobaghino Jul 11 '23 at 06:23
  • 1
    @stefanobaghino I have tested `View` :) Was surprised to see, that it does actually work as intended here :D. Personally, I prefer `Iterator` myself - view feels like a bit too much black magic to my taste . The advantage of a `View` here though is that you can iterate through it repeatedly if needed. – Dima Jul 11 '23 at 11:39
  • @GastónSchabas Yeah, the naive approach I was taking was ``` def maxArea(heights: Array[Int]): Int = { def surfaceArea(a: Int, b: Int): Int = { val base = (a - b).abs val height = heights(a).min(heights(b)) base * height } val indices = 0 to heights.size - 1 val volumeOfEveryPossibleContainer = for { first <- indices second <- indices if second >= first } yield surfaceArea(first, second) volumeOfEveryPossibleContainer.max } ``` – Brad Carter Jul 12 '23 at 15:51
  • This obviously isn't efficient or the correct solution, but I'm more so just curious about computing an aggregate (in this case max) of a large logical collection without having to physically instantiate it. – Brad Carter Jul 12 '23 at 15:53
  • The rest of you, thanks! It sounds like I need to go do more research into Iterators and Views. – Brad Carter Jul 12 '23 at 15:54

1 Answers1

2

I had a look at the comments and took the experimental approach. I booted up a Scala shell and ran the two following snippets of code.

Using LazyList:

val a = LazyList.continually(0).take(Int.MaxValue)
val b = LazyList.continually(0).take(Int.MaxValue)
val results = for (a <- a; b <- b) yield (a, b)
results.max

LazyList doesn't just lazily evaluate the collection, but also tries to hold on onto it, meaning that the collection will "leak" even though that's not the behavior that you were looking for. What happened in my test run was that the process never ran out of memory (although I believe it eventually would have) but there was always a little bit of memory to be reclaimed, so that there's so much garbage collection happening that the process practically stops making meaningful process. See the following VisualVM memory monitoring chart:

using a lazy list

Using Iterator:

val a = Iterator.continually(0).take(Int.MaxValue)
val b = Iterator.continually(0).take(Int.MaxValue)
val results = for (a <- a; b <- b) yield (a, b)
results.max

Iterator is consumed (i.e. there's a side-effect) lazily, but doesn't try to memoize the collection, meaning that evaluated items can be collected. This ran in a few minutes on my laptop. See in the VisualVM chart how memory is progressively consumed and then reclaimed once the "garbage" start to pile up:

using an iterator

stefanobaghino
  • 11,253
  • 4
  • 35
  • 63