
What is the recommended way to work with a large number of rows in DbClient?

I am currently using the execute() API on DbStatement, which returns a Multi. Will this load the entire result set into JVM memory, or does it stream internally in batches? If it does page/batch the result set, is there an API to hint the fetch size?

--

jOOQ exposes Settings.setFetchSize to globally hint the fetch size for all jOOQ queries, which I believe is directly bound to JDBC Statement’s setFetchSize API.
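For reference, this is roughly how that looks in jOOQ (a sketch based on my reading of the jOOQ Settings API; assumes a java.sql.Connection named connection is in scope):

    // Global fetch-size hint; jOOQ forwards it to java.sql.Statement.setFetchSize()
    Settings settings = new Settings().withFetchSize(100);
    DSLContext ctx = DSL.using(connection, SQLDialect.POSTGRES, settings);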

Does DbClient have a similar setting?

Ashwin Prabhu

2 Answers


DbClient does not expose an API to tweak the JDBC fetchSize on statements and result sets, so the batch size used for fetching is left to the underlying JDBC driver's defaults.
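For reference, this is the plain JDBC knob in question; DbClient simply has no pass-through for it (a sketch, assuming a java.sql.Connection named connection is in scope):

    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        ps.setFetchSize(100);  // hint: fetch ~100 rows per driver round trip
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // rows arrive from the driver in fetch-size batches
            }
        }
    }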

Digging deeper, there is much more to this than just setting the fetchSize. The memory characteristics are influenced more by how you consume the values returned from DbExecute than by how DbClient manages its result set internally.

Multi implements Flow.Publisher and hence processes the result set reactively, as requested by the subscriber. Internally, the DbClient publisher iterates over the ResultSet, calling onNext() for each row.

Now it all comes down to how the returned rows are consumed downstream.

    dbClient.execute(dbExecute -> dbExecute.createQuery(sql).execute())
            .map(dbRow -> <doSomething>)

The above example is safe and does not lead to linear memory buildup, as long as all the operators are intermediate and operate only on the items passed down to them.
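For instance, a fully streaming pipeline might look like this (a sketch assuming the Helidon 2.x reactive DbClient API; NAME is a hypothetical column):

    dbClient.execute(dbExecute -> dbExecute.createQuery(sql).execute())
            .map(dbRow -> dbRow.column("NAME").as(String.class))  // transforms one row at a time
            .forEach(System.out::println);                        // consumes items as they arrive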

But using terminal operators such as:

    dbClient.execute(dbExecute -> dbExecute.createQuery(sql).execute())
            .map(dbRow -> <doSomething>)
            .collectList()

can obviously blow up memory, as it terminates the stream by accumulating every row into a single collection. So DbClient is not the deciding factor in the above case; rather, it is the downstream operators.
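If you genuinely need a materialized collection, one mitigation is to bound the stream before the terminal operator (a sketch using Multi's limit operator; process is a hypothetical handler):

    dbClient.execute(dbExecute -> dbExecute.createQuery(sql).execute())
            .limit(1000)                          // caps how many rows can ever be buffered
            .collectList()
            .thenAccept(rows -> process(rows));   // Single implements CompletionStage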

From the OJDBC docs, it appears the default fetchSize is 10 rows, but given the above, that is a moot point.

Ashwin Prabhu

As stated above, this depends on the terminal operation and also on exactly what doSomething does. If doSomething processes each row synchronously, you get back pressure and no unbounded resource consumption. On the other hand, if doSomething does the work asynchronously (by dumping it into a thread pool or adding it to a queue), you have the potential to exhaust resources.

If the terminal operation is subscribe(), then the user code has full control over back pressure and could conceivably do the work asynchronously by metering the flow of data with Subscription.request().
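For illustration, a minimal sketch of such metering with a raw java.util.concurrent.Flow.Subscriber (handleRow is a hypothetical per-row handler):

    dbClient.execute(dbExecute -> dbExecute.createQuery(sql).execute())
            .subscribe(new Flow.Subscriber<DbRow>() {
                private Flow.Subscription subscription;

                @Override
                public void onSubscribe(Flow.Subscription subscription) {
                    this.subscription = subscription;
                    subscription.request(1);  // ask for exactly one row up front
                }

                @Override
                public void onNext(DbRow row) {
                    handleRow(row);           // hypothetical handler; may hand work off asynchronously
                    subscription.request(1);  // pull the next row only when ready for it
                }

                @Override
                public void onError(Throwable t) {
                    t.printStackTrace();
                }

                @Override
                public void onComplete() {
                }
            });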

Barchetta