0

A select * on a large (14 million row) table issued by our hand-crafted Java object relational manager often hangs. Basically, the code issues this query:

import org.mountsinai.foo.Entity;
import org.springframework.jdbc.core.RowMapper;
import java.sql.ResultSet;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

// aDataSource is defined ...
NamedParameterJdbcTemplate jdbcTemplate = new NamedParameterJdbcTemplate(aDataSource);

RowMapper<Entity> entityRowMapper = (ResultSet rs, int rowNum) -> {
    Entity entity = new Entity();
    entity.setDatabaseId(rs.getInt("id"));
    entity.setEntityId(rs.getString("entity_id"));
    entity.setFound(rs.getBoolean("found"));
        // Set 10 more fields ...
    return entity;
};
jdbcTemplate.query("select * from table;", entityRowMapper);

Thus we're using NamedParameterJdbcTemplate's query(String sql, RowMapper rowMapper) method. We use the com.microsoft.sqlserver:mssql-jdbc JDBC library.

The query goes to an SLQ Server database. Sometimes everything works, especially when the table is small. But when the table is large (14 million row), jdbcTemplate.query often hangs. In this circumstance, an SQL Server Profiler trace of the database shows that the query finishes, as we see

SP:StmtCompleted
RPC:Completed

EventClasses for the it in the trace. No exception is thrown at the client.

I'm thinking of trying to fix this problem by replacing the code which fails intermittently with a findAll call that uses Pageable in the PagingAndSortingRepository interface. Would you expect that to solve the problem?

Arthur

Jens Schauder
  • 77,657
  • 34
  • 181
  • 348
Arthur
  • 525
  • 7
  • 18
  • What is it you are doing with the result? Do you really need all results in memory? – M. Deinum Feb 23 '23 at 15:15
  • 1
    It's probably hangs when fetching the data. Or perhaps on GC if you have memory pressure. You can run JMC to see what your application threads are busy with. 10 mils rows x 10 fields is quite a lot of data, considering java objects usually take 24 bytes each and you have some strings there as well. – siggemannen Feb 23 '23 at 15:34

1 Answers1

1

Doing pagination might solve the problem, but you don't have to use JPA for this, nor would it help.

As discussed in the comments your problem is most likely that storing 14 million instances in memory needs a lot of memory. And of course, constructing those takes a lot of time.

Therefore pagination would help by breaking the process down into chunks. But you can do that easily with the current approach by simply incorporating offset and limit into the query. See https://stackoverflow.com/a/9261762/66686

But pagination can become slow on large tables and it might be more efficient to use key based pagination. Where you select n rows per batch ordered by primary key and on the next batch select the first n rows after the last id value of the previous batch.

An alternative that might work even better is to use a RowCallbackHandler instead of a row mapper and fully process a row, before handling the next one. This minimises the data kept in memory, while at the same time runs only a single query. Of course, it depends on what you try to do with all the data if this is a feasible approach.

If you go for pagination you might want to look into Spring Batch which is intended for this kind of processing.

JPA would likely just increase the cost in memory and performance per row. It is generally not well suited for such large amounts of data.

Jens Schauder
  • 77,657
  • 34
  • 181
  • 348