Treating an SQL ResultSet like a Scala Stream

Question

When I query a database and receive a (forward-only, read-only) ResultSet back, the ResultSet acts like a list of database rows.

I am trying to find some way to treat this ResultSet like a Scala Stream. This will allow such operations as filter, map, etc., while not consuming large amounts of RAM.

I implemented a tail-recursive method to extract the individual items, but this requires that all items be in memory at the same time, a problem if the ResultSet is very large:

import java.sql.ResultSet
import scala.annotation.tailrec

// Iterate through the result set and gather all of the String values into a list,
// then return that list
@tailrec
def loop(resultSet: ResultSet,
         accumulator: List[String] = List()): List[String] = {
  if (!resultSet.next()) accumulator.reverse
  else {
    val value = resultSet.getString(1)
    loop(resultSet, value +: accumulator)
  }
}

Could you use an Iterable instead of a Stream to do what you want? — Leif Wickland, Mar 09 '12 at 17:19
Also, a stream will retain the values in memory anyway (as long as something holds a reference to its head), so you won't actually save memory by the time you reach the end of the list. — Richard Todd, Jul 30 '13 at 13:16
I think that without a JDBC flag/option that makes JDBC itself stream the results, you still have one full copy of the data in memory, built by your JDBC API. — matanster, Mar 08 '16 at 20:57

score 78 · Accepted Answer · answered Mar 09 '12 at 17:50


I didn't test it, but why wouldn't it work?

new Iterator[String] {
  def hasNext = resultSet.next()
  def next() = resultSet.getString(1)
}.toStream

elbowich

That looks perfect. I'll test it as soon as I get my database set up. I don't even think I need to convert it to a `Stream`. I can apply `map`, `filter`, etc. directly to it. – Ralph Mar 10 '12 at 13:43

I would like to give you a second up-vote. I've added this code fragment to my Scala snippets library. It's quickly becoming one of my favorites. – Ralph Mar 22 '12 at 11:56

It's a cool solution but I worry. I think the usual contract of `Iterator` is that `hasNext` is side-effect-free. It could be called any number of times between two calls to `next`. Is there something preventing this from becoming an issue? – Daniel Darabos Jan 25 '16 at 18:01
Good answer, but what is the actual implementation? – Yordan Georgiev Oct 14 '16 at 09:52

Didn't work for me with `mysql-connector-java` version 6. Not sure if I did anything wrong, but my `ResultSet` got closed on the second `next()` call, so I could only retrieve one result row. The only way it's not auto-closed before I got all rows seems to be using `while (rs.next()) {...}`, so I add items individually to a `scala.collection.mutable.ListBuffer` within the `while`. Doesn't seem pretty, but couldn't figure out any other way. – Nick Apr 07 '17 at 11:16

@Nick Using `new Iterator[String]{ ... }.toList` instead of `.toStream` will fetch the entire set of results immediately, instead of just the first row. – steinar Aug 22 '17 at 18:22
This converts one column of the rs into a stream. Is there a way of converting an rs with multiple columns into one multi-dimensional array/list? – davidzxc574 Jul 29 '20 at 11:41
Same here: the ResultSet got closed on the second next() call, so it only lets me get the first record. – Capacytron Nov 14 '22 at 21:26
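Both concerns raised in these comments can be reproduced without a database. The sketch below is an illustration under stated assumptions: `fakeResultSet` is a hypothetical stand-in built with `java.lang.reflect.Proxy` (it supports only `next`, `getInt`, `getString` and `close`), and `scala.util.Using` requires Scala 2.13 or later. It shows that an extra `hasNext` call on the accepted answer's iterator skips a row, and that rows read lazily fail once the `ResultSet` behind them has been closed, whereas rows materialized with `toList` inside the resource block do not. Note that `Iterator.toStream` reads the first row eagerly and the rest on demand, which would fit the "closed on the second `next()`" symptom if the stream outlives its connection; that this is what the commenters hit is an assumption, since they did not post their code.

```scala
import java.lang.reflect.{InvocationHandler, Method, Proxy}
import java.sql.{ResultSet, SQLException}
import scala.util.{Try, Using}

// Hypothetical in-memory stand-in for a forward-only ResultSet over (id, text) rows,
// built with java.lang.reflect.Proxy so that no database or JDBC driver is needed.
// Only next(), getInt(_), getString(_) and close() are supported; column arguments are ignored.
def fakeResultSet(data: Seq[(Int, String)]): ResultSet = {
  var cursor = -1 // JDBC cursors start before the first row
  var closed = false
  val handler = new InvocationHandler {
    def invoke(proxy: AnyRef, m: Method, args: Array[AnyRef]): AnyRef = m.getName match {
      case "toString"  => "FakeResultSet"
      case "hashCode"  => Int.box(System.identityHashCode(proxy))
      case "equals"    => Boolean.box(proxy eq args(0))
      case "close"     => closed = true; null
      case _ if closed => throw new SQLException("ResultSet is closed")
      case "next"      => cursor += 1; Boolean.box(cursor < data.length)
      case "getInt" | "getString" if cursor < 0 || cursor >= data.length =>
        throw new SQLException("No current row")
      case "getInt"    => Int.box(data(cursor)._1)
      case "getString" => data(cursor)._2
      case other       => throw new UnsupportedOperationException(other)
    }
  }
  Proxy
    .newProxyInstance(Thread.currentThread().getContextClassLoader, Array[Class[_]](classOf[ResultSet]), handler)
    .asInstanceOf[ResultSet]
}

// The accepted answer's iterator: hasNext advances the cursor as a side effect.
def strings(resultSet: ResultSet): Iterator[String] = new Iterator[String] {
  def hasNext = resultSet.next()
  def next() = resultSet.getString(1)
}

// Concern 1: an extra hasNext call before next() silently skips a row.
val it = strings(fakeResultSet(Seq(1 -> "a", 2 -> "b", 3 -> "c")))
it.hasNext            // cursor moves to row "a"
it.hasNext            // cursor moves to row "b"
val first = it.next() // "b": row "a" was skipped
val rest = it.toList  // List("c")

// Concern 2: rows are read lazily, so they must be consumed before the ResultSet is closed.
val eager = Using.resource(fakeResultSet(Seq(1 -> "a", 2 -> "b")))(rs => strings(rs).toList)
val escaped = Using.resource(fakeResultSet(Seq(1 -> "a", 2 -> "b")))(rs => strings(rs))
val lateRead = Try(escaped.toList) // fails: the fake throws SQLException("ResultSet is closed")
```

The same reasoning applies to a real connection: keep the `Connection`, `Statement` and `ResultSet` open until the iterator (or stream) has been fully consumed, or materialize it while they are open.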

hraban · Answer 2 · 2016-08-25T16:07:18.353

Utility function for @elbowich's answer:

import java.sql.ResultSet

def results[T](resultSet: ResultSet)(f: ResultSet => T): Iterator[T] = {
  new Iterator[T] {
    def hasNext = resultSet.next()
    def next() = f(resultSet)
  }
}

Allows you to use type inference. E.g.:

stmt.execute("SELECT mystr, myint FROM mytable")

// Example 1:
val it = results(stmt.getResultSet) {
  case rs => rs.getString(1) -> 100 * rs.getInt(2)
}
val m = it.toMap // Map[String, Int]

// Example 2 (an alternative to Example 1: each one consumes the result set):
val it2 = results(stmt.getResultSet)(_.getString(1))

score 10 · Answer 3 · answered Sep 29 '16 at 16:31

This sounds like a great opportunity for an implicit class. First define the implicit class somewhere:

import java.sql.ResultSet

object Implicits {

    implicit class ResultSetStream(resultSet: ResultSet) {

        def toStream: Stream[ResultSet] = {
            new Iterator[ResultSet] {
                def hasNext = resultSet.next()

                def next() = resultSet
            }.toStream
        }
    }
}

Next, simply import this implicit class wherever you have executed your query and defined the ResultSet object:

import com.company.Implicits._

Finally, get the data out using the toStream method. For example, get all the ids as shown below:

val allIds = resultSet.toStream.map(result => result.getInt("id"))

Are you sure it works? It fails on DB2 with the ResultSet being closed. If this worked in your case, perhaps it depends on the specific database brand and/or configuration? — Sergio Pelin, Aug 31 '18 at 09:32
It does but you can only use the stream as long as your connection remains open. If you close your connection, the stream will fail, as will the iterator. — Jeroen Minnaert, Sep 05 '18 at 02:25

score 3 · Answer 4 · answered Aug 19 '14 at 19:40

I needed something similar. Building on elbowich's very cool answer, I wrapped it a bit, and instead of the string I return the ResultSet itself (so you can read any column):

import java.sql.ResultSet

def resultSetItr(resultSet: ResultSet): Stream[ResultSet] = {
  new Iterator[ResultSet] {
    def hasNext = resultSet.next()
    def next() = resultSet
  }.toStream
}

I needed to access table metadata, but this also works for table rows (use stmt.executeQuery(sql) instead of md.getColumns):

val md = connection.getMetaData()
val columnItr = resultSetItr(md.getColumns(null, null, "MyTable", null))
val columns = columnItr.map(col => {
  val columnType = col.getString("TYPE_NAME")
  val columnName = col.getString("COLUMN_NAME")
  val columnSize = col.getString("COLUMN_SIZE")
  new Column(columnName, columnType, columnSize.toInt, false) // Column is the author's own class
})

If you don't need to go back on the stream (e.g., forward iteration only), you can just use an iterator. This greatly reduces the memory overhead of using a stream (return an `Iterator[ResultSet]`, and drop the `toStream`) — Greg, Sep 15 '14 at 17:32
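Greg's suggestion could look like the minimal sketch below, run against `fakeResultSet`, a hypothetical `java.lang.reflect.Proxy` stand-in for a forward-only `ResultSet` (not a real driver). Because every element the iterator yields is the same mutable cursor, column values have to be read inside `map`, before the iterator advances; collecting the elements first (`toList`) leaves several references to a cursor that has already moved past the last row.

```scala
import java.lang.reflect.{InvocationHandler, Method, Proxy}
import java.sql.{ResultSet, SQLException}
import scala.util.Try

// Hypothetical in-memory stand-in for a forward-only ResultSet over (id, text) rows,
// built with java.lang.reflect.Proxy so that no database or JDBC driver is needed.
// Only next(), getInt(_), getString(_) and close() are supported; column arguments are ignored.
def fakeResultSet(data: Seq[(Int, String)]): ResultSet = {
  var cursor = -1 // JDBC cursors start before the first row
  var closed = false
  val handler = new InvocationHandler {
    def invoke(proxy: AnyRef, m: Method, args: Array[AnyRef]): AnyRef = m.getName match {
      case "toString"  => "FakeResultSet"
      case "hashCode"  => Int.box(System.identityHashCode(proxy))
      case "equals"    => Boolean.box(proxy eq args(0))
      case "close"     => closed = true; null
      case _ if closed => throw new SQLException("ResultSet is closed")
      case "next"      => cursor += 1; Boolean.box(cursor < data.length)
      case "getInt" | "getString" if cursor < 0 || cursor >= data.length =>
        throw new SQLException("No current row")
      case "getInt"    => Int.box(data(cursor)._1)
      case "getString" => data(cursor)._2
      case other       => throw new UnsupportedOperationException(other)
    }
  }
  Proxy
    .newProxyInstance(Thread.currentThread().getContextClassLoader, Array[Class[_]](classOf[ResultSet]), handler)
    .asInstanceOf[ResultSet]
}

// resultSetItr returning an Iterator, with the toStream call dropped.
def resultSetItr(resultSet: ResultSet): Iterator[ResultSet] = new Iterator[ResultSet] {
  def hasNext = resultSet.next()
  def next() = resultSet
}

// Works: each column is read inside map, while the cursor is still on that row.
val ids = resultSetItr(fakeResultSet(Seq(1 -> "a", 2 -> "b"))).map(_.getInt("id")).toList

// Pitfall: collecting first yields two references to one cursor that is now past the last row.
val rs = fakeResultSet(Seq(1 -> "a", 2 -> "b"))
val cursors = resultSetItr(rs).toList
val lateRead = Try(cursors.map(_.getInt("id"))) // fails: there is no current row any more
```

The same caveat applies to the `Stream[ResultSet]` versions: mapping to plain values before the stream is forced further is what keeps them correct.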

Brendan · Answer 5 · 2016-08-15T19:20:10.757

Because ResultSet is just a mutable object being navigated by next, we need to define our own concept of a next row. We can do so with an input function as follows:

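import java.sql.ResultSet

class ResultSetIterator[T](rs: ResultSet, nextRowFunc: ResultSet => T) 
extends Iterator[T] {

  // Row fetched by hasNext but not yet returned by next()
  private var nextVal: Option[T] = None
  // Set once rs.next() has returned false, so the cursor is never moved past the end again
  private var exhausted = false

  override def hasNext: Boolean = {
    // Advance only when no row is buffered, so repeated hasNext calls do not skip rows
    if (nextVal.isEmpty && !exhausted) {
      if (rs.next()) nextVal = Some(nextRowFunc(rs))
      else exhausted = true
    }
    nextVal.isDefined
  }

  override def next(): T = {
    if (!hasNext) throw new ResultSetIteratorOutOfBoundsException
    val value = nextVal.get
    nextVal = None // consume the buffered row
    value
  }

  class ResultSetIteratorOutOfBoundsException extends Exception("ResultSetIterator reached end of list and next can no longer be called. hasNext should return false.")
}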

EDIT: Translate to stream or something else as per above.

score 2 · Answer 6 · answered Mar 13 '19 at 20:38


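// Model is the caller's own row type; a hypothetical definition:
case class Model(id: Int, text: String)

Iterator.continually(rs.next()) // advance the cursor; true while it is on a row
  .takeWhile(identity)          // stop at the first false
  .map(_ => Model(              // read the row the cursor is on
    id = rs.getInt("id"),
    text = rs.getString("text")
  ))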

Sergey Alaev

score 1 · Answer 7 · answered Oct 30 '19 at 09:31

Here is an alternative, similar to Sergey Alaev's and thoredge's solutions, for when we need an iterator that honors the Iterator contract, under which hasNext is side-effect-free. It requires Scala 2.13 or later, where Iterator.unfold and Option.when were introduced.

Assuming a function f: ResultSet => T:

Iterator.unfold(resultSet.next()) { hasNext =>
  // emit the current row, then advance the cursor to produce the next state
  Option.when(hasNext)((f(resultSet), resultSet.next()))
}

I've found it useful to expose this as a map "extension method" on ResultSet:

implicit class ResultSetOps(resultSet: ResultSet) {
  def map[T](f: ResultSet => T): Iterator[T] = {
    Iterator.unfold(resultSet.next()) { hasNext =>
      Option.when(hasNext)((f(resultSet), resultSet.next()))
    }
  }
}

score 0 · Answer 8 · answered May 22 '18 at 22:15

This implementation, although longer and clumsier, corresponds better to the Iterator contract: the side effect has been removed from hasNext and moved into next(). Note that it advances the cursor once, eagerly, when the iterator is created.

new Iterator[String] {
  private var available = resultSet.next()
  override def hasNext: Boolean = available
  override def next(): String = {
    val string = resultSet.getString(1)
    available = resultSet.next()
    string
  }
}

score 0 · Answer 9 · answered Dec 07 '18 at 15:08

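I think most of the above implementations have a hasNext method with a side effect: calling it twice moves the cursor forward twice, skipping a row. I would advise using something like the following. Caveats: per the JDBC Javadoc, support for isLast is optional for TYPE_FORWARD_ONLY result sets and may require the driver to fetch one row ahead; and on an empty result set the cursor is never on the last row, so isLast is false and hasNext returns true once.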

  new Iterator[ResultSet] {
    def hasNext = {
      !resultSet.isLast
    }
    def next() = {
      resultSet.next()
      resultSet
    }
  }

satyagraha · Answer 10 · 2020-07-19T17:02:47.647


Another variant on the above, which also works with Scala 2.12 (where Iterator.unfold and Option.when are not available):

implicit class ResultSetOps(resultSet: ResultSet) {
  def map[T](f: ResultSet => T): Iterator[T] =
    Iterator.continually(resultSet).takeWhile(_.next()).map(f)
}
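One way to check this variant's behaviour is the sketch below. It runs the extension method against `fakeResultSet`, a hypothetical `java.lang.reflect.Proxy` stand-in (not a real driver), and maps each row to `Row`, a hypothetical case class, which also covers the multi-column question asked under the accepted answer. Because `takeWhile` buffers the element it has just tested, calling `hasNext` more than once does not move the cursor again, and an empty result set yields an empty iterator. Wrapping the implicit class in an object is a choice of this sketch, so that it can be imported.

```scala
import java.lang.reflect.{InvocationHandler, Method, Proxy}
import java.sql.{ResultSet, SQLException}

// Hypothetical in-memory stand-in for a forward-only ResultSet over (id, text) rows,
// built with java.lang.reflect.Proxy so that no database or JDBC driver is needed.
// Only next(), getInt(_), getString(_) and close() are supported; column arguments are ignored.
def fakeResultSet(data: Seq[(Int, String)]): ResultSet = {
  var cursor = -1 // JDBC cursors start before the first row
  var closed = false
  val handler = new InvocationHandler {
    def invoke(proxy: AnyRef, m: Method, args: Array[AnyRef]): AnyRef = m.getName match {
      case "toString"  => "FakeResultSet"
      case "hashCode"  => Int.box(System.identityHashCode(proxy))
      case "equals"    => Boolean.box(proxy eq args(0))
      case "close"     => closed = true; null
      case _ if closed => throw new SQLException("ResultSet is closed")
      case "next"      => cursor += 1; Boolean.box(cursor < data.length)
      case "getInt" | "getString" if cursor < 0 || cursor >= data.length =>
        throw new SQLException("No current row")
      case "getInt"    => Int.box(data(cursor)._1)
      case "getString" => data(cursor)._2
      case other       => throw new UnsupportedOperationException(other)
    }
  }
  Proxy
    .newProxyInstance(Thread.currentThread().getContextClassLoader, Array[Class[_]](classOf[ResultSet]), handler)
    .asInstanceOf[ResultSet]
}

// The variant above, wrapped in an object so the implicit class can be imported.
object ResultSetSyntax {
  implicit class ResultSetOps(resultSet: ResultSet) {
    def map[T](f: ResultSet => T): Iterator[T] =
      Iterator.continually(resultSet).takeWhile(_.next()).map(f)
  }
}
import ResultSetSyntax._

// Hypothetical row type, for illustration only.
case class Row(id: Int, text: String)

val it = fakeResultSet(Seq(1 -> "a", 2 -> "b")).map(rs => Row(rs.getInt("id"), rs.getString("text")))
it.hasNext
it.hasNext // takeWhile has buffered the tested element, so the cursor does not move again
val all = it.toList // both rows, none skipped
val none = fakeResultSet(Seq.empty).map(_.getInt("id")).toList // empty result set, empty list
```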

edited Jul 19 '20 at 17:02

answered Jul 19 '20 at 16:11
