
I have a Spring Boot application that reads from a database table with potentially millions of rows and thus uses the queryForStream method from Spring Data. This is the code:

Stream<MyResultDto> result = jdbcTemplate.queryForStream("select * from table", myRowMapper);

This runs well for smaller tables, but from about 500 MB of table size the application dies with a stacktrace like this:

Exception in thread "http-nio-8080-Acceptor" java.lang.OutOfMemoryError: Java heap space
        at java.base/java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:64)
        at java.base/java.nio.ByteBuffer.allocate(ByteBuffer.java:363)
        at org.apache.tomcat.util.net.SocketBufferHandler.<init>(SocketBufferHandler.java:58)
        at org.apache.tomcat.util.net.NioEndpoint.setSocketOptions(NioEndpoint.java:486)
        at org.apache.tomcat.util.net.NioEndpoint.setSocketOptions(NioEndpoint.java:79)
        at org.apache.tomcat.util.net.Acceptor.run(Acceptor.java:149)
        at java.base/java.lang.Thread.run(Thread.java:833)
2023-01-28 00:37:23.862 ERROR 1 --- [nio-8080-exec-3] o.a.c.h.Http11NioProtocol                : Failed to complete processing of a request

java.lang.OutOfMemoryError: Java heap space

2023-01-28 00:37:30.548 ERROR 1 --- [nio-8080-exec-6] o.a.c.c.C.[.[.[.[dispatcherServlet]      : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space] with root cause

java.lang.OutOfMemoryError: Java heap space

Exception in thread "http-nio-8080-Poller" java.lang.OutOfMemoryError: Java heap space

As you can probably guess from the stack trace, I am streaming the database results out via an HTTP REST interface. The stack is PostgreSQL 15, the standard PostgreSQL JDBC driver 42.3.8, and spring-boot-starter-data-jpa 2.6.14, which pulls in spring-jdbc 5.3.24.

It's worth noting that the table has no primary key, which I assume should be no problem for the above query. I have not posted the RowMapper because it never gets to run: the heap is exhausted right after the query is sent to the database, before a result set ever comes back for the row mapper to work on.

I have tried jdbcTemplate.setFetchSize(1000) and also omitting the fetch size entirely, which I believe falls back to the default (100, I think). In both cases the same thing happens: large result sets are not streamed but somehow exhaust the Java heap before streaming even starts. What could be the reason for this? Isn't the queryForStream method meant to avoid exactly this situation?


1 Answer

I was on the right track with the fetch size: that is exactly what prevents the JDBC driver from loading the entire result set into memory. In my case the setting was silently ignored, and that is behavior specific to the PostgreSQL JDBC driver: it ignores the fetch size when autocommit is enabled, which is the default in Spring JDBC.

The solution was therefore to define a datasource in Spring JDBC that sets autocommit to false and use that datasource for the streaming query. The fetch size was then honored, and I ended up setting it to 10000, which in my case gave the best performance / memory usage ratio.
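For illustration, here is a minimal sketch of such a configuration, assuming HikariCP as the connection pool (Spring Boot's default); the bean names, JDBC URL, and credentials are placeholders, not taken from the question:

```java
import javax.sql.DataSource;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

@Configuration
public class StreamingJdbcConfig {

    // A separate pool for streaming queries. autoCommit=false is the key:
    // only then does the PostgreSQL driver honor the fetch size and fetch
    // the result set incrementally via a cursor instead of all at once.
    @Bean
    public DataSource streamingDataSource() {
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb"); // placeholder
        cfg.setUsername("user");                                 // placeholder
        cfg.setPassword("secret");                               // placeholder
        cfg.setAutoCommit(false);
        return new HikariDataSource(cfg);
    }

    @Bean
    public JdbcTemplate streamingJdbcTemplate(
            @Qualifier("streamingDataSource") DataSource ds) {
        JdbcTemplate template = new JdbcTemplate(ds);
        template.setFetchSize(10_000); // rows per round trip; tune as needed
        return template;
    }
}
```

With this in place, the original queryForStream call can be run against the streamingJdbcTemplate bean. Note that the streaming query must also run inside a transaction for the cursor to stay open, and the returned Stream should be closed (e.g. with try-with-resources) so the connection is released back to the pool.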
