So it turns out the crux of the problem is that, by default, the Postgres JDBC driver runs in "autoCommit" mode, and it needs/uses cursors to be able to "page" through data (e.g. read the first 10K results, then the next 10K, and so on), but cursors can only exist within a transaction. So the default is to read all rows, always, into RAM, and only then let your program start processing "the first result row, then the second", for two reasons: you're not in a transaction (so cursors don't work), and no fetch size has been set.
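In other words, under the defaults, even an innocent-looking query like the sketch below buffers the entire result set in client memory before the first rs.next() returns. (A minimal illustration; the table name is made up.)

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

static void readWholeQueryDefaultWay(Connection conn) throws SQLException {
    // autoCommit is on (the default) and no fetch size is set, so the
    // driver pulls EVERY row into RAM during executeQuery()...
    try (Statement statement = conn.createStatement();
         ResultSet rs = statement.executeQuery("SELECT * FROM big_table")) {
        while (rs.next()) {
            // ...and this loop just walks the already-in-memory rows
        }
    }
}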
So how the psql command line tool achieves batched responses for queries (its FETCH_COUNT setting) is to "wrap" its select queries within a short-term transaction (if a transaction isn't already open), so that cursors can work. You can do something like that also with JDBC:
static void readLargeQueryInChunksJdbcWay(Connection conn, String originalQuery, int fetchCount,
        ConsumerWithException<ResultSet, SQLException> consumer) throws SQLException {
    boolean originalAutoCommit = conn.getAutoCommit();
    if (originalAutoCommit) {
        conn.setAutoCommit(false); // start temp transaction
    }
    try (Statement statement = conn.createStatement()) {
        statement.setFetchSize(fetchCount);
        try (ResultSet rs = statement.executeQuery(originalQuery)) {
            while (rs.next()) {
                consumer.accept(rs); // or just do your work here
            }
        }
    } finally {
        if (originalAutoCommit) {
            conn.setAutoCommit(true); // reset it, which also ends (commits) the temp transaction
        }
    }
}
@FunctionalInterface
public interface ConsumerWithException<T, E extends Exception> {
    void accept(T t) throws E;
}
This gives the benefit of requiring less RAM, and, in my results, seemed to run overall faster, even if you don't need to save the RAM. Weird. It also gives the benefit that your processing of the first row "starts faster" (since it process it a page at a time).
And here's how to do it the "raw postgres cursor" way, along with full demo code, though in my experiments it seemed the JDBC way, above, was slightly faster for whatever reason.
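Roughly, that approach declares a server-side cursor and then pulls pages from it explicitly. Here's a minimal sketch of the idea (the cursor name is arbitrary, and it reuses the imports and ConsumerWithException from above):

static void readLargeQueryInChunksCursorWay(Connection conn, String originalQuery, int fetchCount,
        ConsumerWithException<ResultSet, SQLException> consumer) throws SQLException {
    conn.setAutoCommit(false); // cursors only live inside a transaction
    try (Statement statement = conn.createStatement()) {
        // bind a server-side cursor to the query
        statement.execute("DECLARE my_cursor NO SCROLL CURSOR FOR " + originalQuery);
        while (true) {
            // pull the next page of rows from the cursor
            try (ResultSet rs = statement.executeQuery(
                    "FETCH FORWARD " + fetchCount + " FROM my_cursor")) {
                if (!rs.next()) {
                    break; // cursor exhausted
                }
                do {
                    consumer.accept(rs);
                } while (rs.next());
            }
        }
    } finally {
        conn.rollback(); // read-only work, so rolling back is safe; it also closes the cursor
        conn.setAutoCommit(true);
    }
}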
Another option would be to turn autoCommit mode off everywhere, though you still have to manually specify a fetchSize for each new Statement (or you can set a default fetch size in the connection URL string).
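That URL parameter is the pgjdbc driver's defaultRowFetchSize; for example (host, database, and credentials made up):

String url = "jdbc:postgresql://localhost:5432/mydb?defaultRowFetchSize=10000";
Connection conn = DriverManager.getConnection(url, "user", "password");
// note: cursor-based fetching still only kicks in while autoCommit is off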