I am using jdbcTemplate to query to an ETL tool and fetch the results and write to an Excel Worksheet using Apache POI library. I have somewhere between 700,000 and 1,000,000 rows to fetch and write.
Earlier I had the following setup:
jdbcTemplate.query(new CustomPreparedStatementCreator(arg), new ResultSetExtractor<Void>() {
@Override
public Void extractData(ResultSet rs) {
while (rs.next()) {... process sequentially ...}
// caused the connection to remain open for 650 seconds
// setFetchSize was not set so it was default
}
Now I have the following setup:
jdbcTemplate.setFetchSize(200);
SqlRowSet srs = jdbcTemplate.queryForRowSet(sql);
// all the rows are fetched in 60 seconds into memory now
while (srs.next()) {... process sequentially ...} // still takes 600 seconds to write to the worksheet
Now my problems are:
- How to I leverage the Java 8 parallelStream API to process all the rows inside a SqlRowSet object? Is there any alternative?
- How can I write faster to the worksheet (an org.apache.poi.xssf.streaming.SXSSFSheet) ?