I have written a Scala (2.9.1-1) application that needs to process several million rows from a database query. I am converting the ResultSet
to a Stream
using the technique shown in the answer to one of my previous questions:
class Record(...)
val resultSet = statement.executeQuery(...)
new Iterator[Record] {
def hasNext = resultSet.next()
def next = new Record(resultSet.getString(1), resultSet.getInt(2), ...)
}.toStream.foreach { record => ... }
and this has worked very well.
Since the body of the foreach
closure is very CPU intensive, and as a testament to the practicality of functional programming, if I add a .par
before the foreach
, the closures get run in parallel with no other effort, except to make sure that the body of the closure is thread safe (it is written in a functional style with no mutable data except printing to a thread-safe log).
However, I am worried about memory consumption. Is the .par
causing the entire result set to load in RAM, or does the parallel operation load only as many rows as it has active threads? I've allocated 4G to the JVM (64-bit with -Xmx4g
) but in the future I will be running it on even more rows and worry that I'll eventually get an out-of-memory.
Is there a better pattern for doing this kind of parallel processing in a functional manner? I've been showing this application to my co-workers as an example of the value of functional programming and multi-core machines.